Compare commits

6248 Commits

1eb6146d96 Add manual simple retry to ECR login (#71287)
Summary:
The current retry with AWS_MAX_ATTEMPTS does not seem to work, as we still get failures: https://github.com/pytorch/pytorch/runs/4806177738?check_suite_focus=true

This should hopefully alleviate the problem.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71287

Reviewed By: malfet, seemethere

Differential Revision: D33573788

Pulled By: janeyx99

fbshipit-source-id: 300fde9a9fa5a2da3e9d18b7989a3676500d8011
2022-01-18 10:56:53 -08:00
2bb6a4f437 Generate aten_interned_strings.h automatically (#69407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69407

This generates `aten_interned_strings.h` from `native_functions.yaml`,
which is closer to how it was originally done. The items deleted from
`interned_strings.h` are duplicates that need to be removed in order
for the code to compile; some of the remaining items may still be out
of date, but even then the effect is fairly benign.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32923636

Pulled By: albanD

fbshipit-source-id: a0fd6b3714e70454c5f4ea9b19da5e047d2a4687
2022-01-18 08:29:54 -08:00
d665097cad allow Bazel to build without glog and gflags (#70850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70850

We support both, so we want to ensure both continue to work.
ghstack-source-id: 146960552

Test Plan: Tested manually. A subsequent diff adds this test configuration to CI.

Reviewed By: malfet

Differential Revision: D33297464

fbshipit-source-id: 70e1431d0907d480c576239af93ef57036d5e4d7
2022-01-18 08:08:46 -08:00
ffdc6b4994 extract //c10/macros to its own package (#70849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70849

ghstack-source-id: 146960563

Test Plan: Bazel CI tests will protect this.

Reviewed By: malfet

Differential Revision: D33297235

fbshipit-source-id: 6504a977e82ad2f2232a74233b96cdea8bf94a20
2022-01-18 08:08:42 -08:00
8d0e354191 fix CAFFE2_BUILD_MAIN_LIB to the correct C10_BUILD_MAIN_LIB (#70848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70848

This is the C10 library, so that's the main lib we are building
here. While here, use `local_defines` instead of `copts` for this
definition. Both `copts` and `local_defines` apply only to the
compilation units in the library, and not transitively.
ghstack-source-id: 146998039

Test Plan: We are relying on CI to verify this doesn't cause any problems.

Reviewed By: malfet

Differential Revision: D33429420

fbshipit-source-id: b3fc84c0588bd43346e3f9f77e851d293bde9428
2022-01-18 08:05:20 -08:00
fd9e08df5d Make Demux serializable with lambda function (#71311)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71311

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D33584552

Pulled By: ejguan

fbshipit-source-id: 52324faf5547f9f77582ec170ec91ce3114cfc61
2022-01-18 06:47:54 -08:00
f0db15122f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33629127

fbshipit-source-id: 47befcd98cfa544a4d822161d8bfbe8d7a788e4d
2022-01-18 01:50:08 -08:00
d17f340a2e The Cacherator (#71350)
Summary:
This PR adds a persistent filesystem cache for jitted kernels. The cache is disabled on Windows because it relies on POSIX headers.

The cache writes, by default, to `~/.cache/torch/kernels`, but the location can be controlled by setting the `PYTORCH_KERNEL_CACHE_PATH` environment variable. A separate environment variable, `USE_PYTORCH_KERNEL_CACHE`, will disable all caching logic when set to zero.

The use of a persistent filesystem cache dramatically lowers the "first call time" for an operator AFTER it has been compiled, because it skips (most of) the jit compilation process. On systems where we're compiling only to ptx, that ptx still has to be just-in-time compiled by the driver API, so an additional latency of around 10 milliseconds is expected at first call time. On systems which compile to SASS, the additional first call time latency is about one millisecond. This compares with times of 150+ milliseconds for just-in-time kernel compilation.

Files in the cache use a mostly human-readable name that includes an SHA1 hash of the CUDA C string used to generate them. Note that this is not an SHA1 hash of the file's contents, because the contents are the compiled ptx or SASS. No verification is done when the file is loaded to ensure the kernel is what's expected, but it's far more likely you'll be struck by a meteor than observe two file names conflict. Using SHA1 hashes to generate unique ids this way is a common practice (GitHub does it, too).

This cache design could be reused by other fusion systems and should allow us to jiterate more operations without fear of regressing the "incremental development" scenario where users are tweaking or extending programs slightly, rerunning them, and then repeating that process again and again. Without a cache, each run of the program would have to recompile every jitted kernel, but with this cache we expect a negligible impact to the user experience.
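
A rough Python sketch of the lookup scheme described above (the actual implementation is C++ inside the jit compilation path; the function names, file-name layout, and `compile_fn` hook here are illustrative assumptions):

```
import hashlib
import os

def kernel_cache_path(cuda_source, kernel_name):
    # Hypothetical naming scheme: a readable kernel name plus an SHA1
    # hash of the CUDA C source (not of the compiled ptx/SASS contents).
    cache_dir = os.environ.get(
        "PYTORCH_KERNEL_CACHE_PATH",
        os.path.expanduser("~/.cache/torch/kernels"),
    )
    digest = hashlib.sha1(cuda_source.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, "%s_%s" % (kernel_name, digest))

def load_or_compile(cuda_source, kernel_name, compile_fn):
    if os.environ.get("USE_PYTORCH_KERNEL_CACHE") == "0":
        return compile_fn(cuda_source)          # caching disabled entirely
    path = kernel_cache_path(cuda_source, kernel_name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()                     # cached ptx or SASS
    binary = compile_fn(cuda_source)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:                 # no content verification on load
        f.write(binary)
    return binary
```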

cc kshitij12345, xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71350

Reviewed By: ngimel

Differential Revision: D33626671

Pulled By: mruberry

fbshipit-source-id: d55df53416fbe46348623846f699f9b998e6c318
2022-01-17 23:52:14 -08:00
7b9fff90d2 empty_generic: Remove redundant device argument (#70612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70612

The device information is embedded in the `DataPtr` returned from the
allocator, so this argument is completely ignored.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33623681

Pulled By: ngimel

fbshipit-source-id: bea64707bb17d46debb0ed7c1175493df56fee77
2022-01-17 20:18:43 -08:00
f93ffc9ea8 Sparse CSR: Handle zero matrix consistently for triangular_solve (#71304)
Summary:
This PR enables `test_block_triangular` tests on the CPU.
These tests revealed a problem with how the nnz == 0 case was handled. We now return a tensor filled with NaNs on both CUDA and CPU.

cc nikitaved pearu cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71304

Reviewed By: davidberard98

Differential Revision: D33600482

Pulled By: cpuhrsch

fbshipit-source-id: d09cb619f8b6e54b9f07eb16765ad1c183c42487
2022-01-17 13:47:49 -08:00
17540c5c80 [warnings][Caffe2] Suppress warnings in non-c10 headers (#71370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71370

Round out suppressing warnings in `caffe2` headers

Test Plan: CI check

Reviewed By: r-barnes

Differential Revision: D33613084

fbshipit-source-id: 9306d480bd796aeae4d887ad26b6ddc2c571c9e4
2022-01-17 10:09:31 -08:00
cf47338191 [Caffe2][warnings] Suppress -Wimplicit-int-float-conversion in TypeSafeSignMath.h for clang (#71369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71369

Suppress `-Wimplicit-int-float-conversion` in `TypeSafeSignMath.h` when building with clang

Test Plan: CI check

Reviewed By: r-barnes

Differential Revision: D33612983

fbshipit-source-id: cff1239bc252d4a2f54a50a2bbcd48aeb8bf31ca
2022-01-17 10:05:21 -08:00
ddf97a59ca Remove the dependency of pytorch nightly. (#71323)
Summary:
This PR removes the PyTorch nightly dependencies of TorchBench CI. Instead, it relies on the bisection script to install TorchBench dependencies (https://github.com/pytorch/benchmark/pull/694).
This will unblock TorchBench CI users when the nightly build fails (e.g., https://github.com/pytorch/pytorch/issues/71260)

RUN_TORCHBENCH: resnet18
TORCHBENCH_BRANCH: xz9/optimize-bisection

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71323

Reviewed By: wconstab

Differential Revision: D33591713

Pulled By: xuzhao9

fbshipit-source-id: f1308ea33ece1f18196c993b40978351160ccc0c
2022-01-17 09:52:36 -08:00
a383d01774 [fbcode][warnings] Suppress warnings in caffe2/c10 (#71356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71356

Suppress remaining header based warnings in `caffe2/c10` when building with `clang`

Test Plan: CI pass

Reviewed By: r-barnes

Differential Revision: D33600097

fbshipit-source-id: e1c0d84a0bad768eb03e047d62b5379cf28b48e2
2022-01-15 18:34:08 -08:00
1ecfa1d61a Load zip file in deploy interpreter (#71072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71072

This PR replaces the old logic of loading frozen torch through cpython by directly loading zipped torch modules directly onto deploy interpreter. We use elf file to load the zip file as its' section and load it back in the interpreter executable. Then, we directly insert the zip file into sys.path of the each initialized interpreter. Python has implicit ZipImporter module that can load modules from zip file as long as they are inside sys.path.
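
For reference, a minimal sketch of the standard-library behavior this relies on: once an archive is on `sys.path`, Python's zipimport machinery resolves imports from inside it (the archive path and module below are hypothetical):

```
import sys
import zipfile

# Build a toy archive containing one module (hypothetical path).
with zipfile.ZipFile("/tmp/frozen_modules.zip", "w") as zf:
    zf.writestr("mymod.py", "VALUE = 42\n")

# With the archive on sys.path, the import system loads from it.
sys.path.insert(0, "/tmp/frozen_modules.zip")
import mymod
print(mymod.VALUE)  # 42
```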

Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy

Reviewed By: shunting314

Differential Revision: D32442552

fbshipit-source-id: 627f0e91e40e72217f3ceac79002e1d8308735d5
2022-01-15 14:39:59 -08:00
08d8f81704 [quant][fix][fx][graphmode] Fix qconfig setting for fused modules (#71254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71254

When we configure linear and relu with the same qconfig, we currently have utility functions to also
generate a qconfig for the fused linear-relu module, but this code was not called in the correct order,
which resulted in unexpected behaviors. This PR fixes the issue. Please see the test case for more details.
(Test case is from Supriya.)

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_fused_module_qat_swap

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33558321

fbshipit-source-id: d95114dc4b77264e603c262c2da02a3de4acba69
2022-01-14 23:31:11 -08:00
bb49352354 caffe2/torch/csrc/jit/frontend/tree_views: workaround nvcc compiler error
Test Plan:
Move it outside the header so it's not seen by nvcc

```
$ buck2 build -c fbcode.platform=platform010 fbcode//accelerators/pytorch/lib/cuda:ngram_repeat_block_cuda
Downloading buck2...
[======================================================================]

watchman fresh instance event, clearing cache
Using disallowed linker flag 'arvr/third-party/toolchains/platform009/build/mesa/lib/libGL.so' in library rule 'fbsource//third-party/toolchains:opengl'
Using disallowed linker flag 'arvr/third-party/freeglut/3.0.0/libs/x64-linux/libglut.a' in library rule 'fbsource//third-party/toolchains:GLUT'
Action Failed for fbcode//accelerators/pytorch/lib/cuda:ngram_repeat_block_cuda (ovr_config//platform/linux:x86_64-fbcode-platform010-clang-6dbc4bb1b9a32829)#5:
cxx_compile ngram_repeat_block_cuda_kernel.cu (pic) failed with non-zero exit code 1
debug information: action_digest=b2bda91d24dad53e960c740ef9a412cee1902d86:94
stdout:
stderr:
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h: In instantiation of 'static torch::jit::Maybe<T> torch::jit::Maybe<T>::create(const torch::jit::SourceRange&, const T&) [with T = torch::jit::List<torch::jit::Property>]':
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:505:117:   required from here
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:220:33: error: cannot convert 'const torch::jit::List<torch::jit::Property>' to 'torch::jit::TreeList&&' {aka 'c10::SmallVector<c10::intrusive_ptr<torch::jit::Tree>, 4>&&'}
  220 |     return Maybe<T>(Compound::create(TK_OPTION, range, {value}));
      |                ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
fbcode/caffe2/torch/csrc/jit/frontend/tree.h:144:1: note:   initializing argument 3 of 'static torch::jit::TreeRef torch::jit::Compound::create(int, const torch::jit::SourceRange&, torch::jit::TreeList&&)'
  143 |       const SourceRange& range_,
      |         ~~~~~~~~~~~~~~~~~~~~~~~~
  144 |       TreeList&& trees_) {
      | ^
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h: In instantiation of 'static torch::jit::Maybe<T> torch::jit::Maybe<T>::create(const torch::jit::SourceRange&, const T&) [with T = torch::jit::List<torch::jit::Assign>]':
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:505:171:   required from here
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:220:33: error: cannot convert 'const torch::jit::List<torch::jit::Assign>' to 'torch::jit::TreeList&&' {aka 'c10::SmallVector<c10::intrusive_ptr<torch::jit::Tree>, 4>&&'}
  220 |     return Maybe<T>(Compound::create(TK_OPTION, range, {value}));
      |                ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
fbcode/caffe2/torch/csrc/jit/frontend/tree.h:144:1: note:   initializing argument 3 of 'static torch::jit::TreeRef torch::jit::Compound::create(int, const torch::jit::SourceRange&, torch::jit::TreeList&&)'
  143 |       const SourceRange& range_,
      |         ~~~~~~~~~~~~~~~~~~~~~~~~
  144 |       TreeList&& trees_) {
      | ^
cc1plus: note: unrecognized command-line option '-Wno-ignored-optimization-argument' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ambiguous-reversed-operator' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ignored-optimization-argument' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ambiguous-reversed-operator' may have been intended to silence earlier diagnostics
command: buck-out/v2/gen/fbcode/999b02f9444004c1/tools/build/__wrap_nvcc.py__/wrap_nvcc.py -_NVCC_BIN_ fbcode ...<omitted>... ors/pytorch/lib/cuda/__ngram_repeat_block_cuda__/__objects__/ngram_repeat_block_cuda_kernel.cu.pic.o (rerun with -v to view the untruncated command)

```

Reviewed By: zhxchen17

Differential Revision: D33592885

fbshipit-source-id: a36dcb3c8265d009b2287f0a479695d1ddbf85aa
2022-01-14 21:58:31 -08:00
4bf1be898d caffe: fix warning: overloaded virtual function "torch::jit::Function::call" is only partially overridden in class "torch::jit::GraphFunction"
Summary:
Need to bring in all signatures

https://www.internalfb.com/code/fbsource/[36035b9e4e41813e215ffd5f4377d65b7259237e]/fbcode/caffe2/aten/src/ATen/core/function.h?lines=91-101

Test Plan:
```
Action Failed for fbcode//accelerators/pytorch/lib/cuda:ngram_repeat_block_cuda (ovr_config//platform/linux:x86_64-fbcode-platform010-clang-6dbc4bb1b9a32829)#5:
cxx_compile ngram_repeat_block_cuda_kernel.cu (pic) failed with non-zero exit code 1
debug information: action_digest=988629a726bc4eabcaf334db2317a969958d5fd2:94
stdout:
stderr:
fbcode/caffe2/torch/csrc/jit/api/function_impl.h(11): warning: overloaded virtual function "torch::jit::Function::call" is only partially overridden in class "torch::jit::GraphFunction"

fbcode/caffe2/torch/csrc/jit/api/function_impl.h(11): warning: overloaded virtual function "torch::jit::Function::call" is only partially overridden in class "torch::jit::GraphFunction"

fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h: In instantiation of 'static torch::jit::Maybe<T> torch::jit::Maybe<T>::create(const torch::jit::SourceRange&, const T&) [with T = torch::jit::List<torch::jit::Property>]':
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:505:117:   required from here
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:220:33: error: cannot convert 'const torch::jit::List<torch::jit::Property>' to 'torch::jit::TreeList&&' {aka 'c10::SmallVector<c10::intrusive_ptr<torch::jit::Tree>, 4>&&'}
  220 |     return Maybe<T>(Compound::create(TK_OPTION, range, {value}));
      |                ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
fbcode/caffe2/torch/csrc/jit/frontend/tree.h:144:1: note:   initializing argument 3 of 'static torch::jit::TreeRef torch::jit::Compound::create(int, const torch::jit::SourceRange&, torch::jit::TreeList&&)'
  143 |       const SourceRange& range_,
      |         ~~~~~~~~~~~~~~~~~~~~~~~~
  144 |       TreeList&& trees_) {
      | ^
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h: In instantiation of 'static torch::jit::Maybe<T> torch::jit::Maybe<T>::create(const torch::jit::SourceRange&, const T&) [with T = torch::jit::List<torch::jit::Assign>]':
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:505:171:   required from here
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:220:33: error: cannot convert 'const torch::jit::List<torch::jit::Assign>' to 'torch::jit::TreeList&&' {aka 'c10::SmallVector<c10::intrusive_ptr<torch::jit::Tree>, 4>&&'}
  220 |     return Maybe<T>(Compound::create(TK_OPTION, range, {value}));
      |                ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
fbcode/caffe2/torch/csrc/jit/frontend/tree.h:144:1: note:   initializing argument 3 of 'static torch::jit::TreeRef torch::jit::Compound::create(int, const torch::jit::SourceRange&, torch::jit::TreeList&&)'
  143 |       const SourceRange& range_,
      |         ~~~~~~~~~~~~~~~~~~~~~~~~
  144 |       TreeList&& trees_) {
      | ^
cc1plus: note: unrecognized command-line option '-Wno-ignored-optimization-argument' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ambiguous-reversed-operator' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ignored-optimization-argument' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ambiguous-reversed-operator' may have been intended to silence earlier diagnostics
command: buck-out/v2/gen/fbcode/999b02f9444004c1/tools/build/__wrap_nvcc.py__/wrap_nvcc.py -_NVCC_BIN_ fbcode ...<omitted>... ors/pytorch/lib/cuda/__ngram_repeat_block_cuda__/__objects__/ngram_repeat_block_cuda_kernel.cu.pic.o (rerun with -v to view the untruncated command)
```

Differential Revision: D33579670

fbshipit-source-id: 9acb443732feb3e921ce0fa5f38f21ed44f64114
2022-01-14 20:27:09 -08:00
3ed27a96ed [BE] Refactor repetitions into `TorchVersion._cmp_wrapper` (#71344)
Summary:
First step towards https://github.com/pytorch/pytorch/issues/71280

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71344

Reviewed By: b0noI

Differential Revision: D33594463

Pulled By: malfet

fbshipit-source-id: 0295f0d9f0342f05a390b2bd4aa0a5958c76579b
2022-01-14 19:57:55 -08:00
c43e0286a9 [PyTorch][Lazy] Make hashing null optionals cheap (#71290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71290

The existing code called an out-of-line hash function on a constant. This is just going to get the same random-looking 64-bit integer every time, so I just changed the constant to an integer I generated with `hex(random.randint(0x1000000000000000, 0xFFFFFFFFFFFFFFFF))` to get the same effect but without the runtime hashing.
ghstack-source-id: 146991945

Test Plan: CI

Reviewed By: wconstab

Differential Revision: D33574676

fbshipit-source-id: d6ce1e1cc0db67dfede148b7e3173508ec311ea8
2022-01-14 17:13:50 -08:00
a138aad6e6 [jit][edge] Return a no-op nullptr for UnionType on mobile for backward compatibility. (#71341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71341

Old models containing UnionType need to be loaded even if they don't actually use Unions.
This is not the best solution, we need to catch this error on the compiler side instead, but before doing that we can land this first to at least mitigate model loading crash issues.
ghstack-source-id: 147056684

Test Plan:
CI
Verified with jaebong on his device locally.

Differential Revision: D33593276

fbshipit-source-id: fac4bc85c652974c7c10186a29f36e3e411865ad
2022-01-14 17:06:13 -08:00
b7222e15b6 [fix] max_pool1d: composite compliance (#70900)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/69991

Not sure if this is a good idea as this increases the number of operators.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70900

Reviewed By: wenleix

Differential Revision: D33585964

Pulled By: zou3519

fbshipit-source-id: 11bfa2e00ee123a6d36f7d4cccdf0c1a3e664d8c
2022-01-14 15:36:27 -08:00
fcbc34a5eb [PyTorch][Static Runtime] Avoid recomputing input size in dict_unpack (#71252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71252

Same old problem, same old solution.

Interestingly, I tried using c10::irange instead, but that caused really bad assembly to be generated -- we lost inlining for lots of the loop body!
ghstack-source-id: 146939573

Test Plan:
CI

Spot-checked assembly before/after and confirmed that loop termination value was recomputed before and not after

Reviewed By: mikeiovine

Differential Revision: D33558118

fbshipit-source-id: 9fda2f1f89bacba2e8b5e61ba432871e973201fe
2022-01-14 14:33:56 -08:00
bf82d2012e [PyTorch] Add IValue::toDimVector & mostly replace toIntVector with it (#71247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71247

Most uses of toIntVector() were for a Tensor shape. We have DimVector to avoid heap allocations in those cases, so let's use it.
ghstack-source-id: 146933314

Test Plan: CI -- if we think DimVector is good in general then I think we have to think this change is good?

Reviewed By: mikeiovine

Differential Revision: D33556198

fbshipit-source-id: cf2ad92c2d0b99ab1df4da0f6843e6ccb9a6320b
2022-01-14 14:32:40 -08:00
94ed61eb5c Pin numba to 0.54.1 (#71327)
Summary:
Not sure what is going on, but numba==0.55.0, currently installed in (for example) 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.7-clang9:0d18ad2827487386d2a7864b11fec5bc83de6545, is built against a newer version of numpy; this was apparently silently fixed on the PyPI side (the latest numba download is numba-0.55.0-1-cp37-cp37m-manylinux2014_x86_64.manylinux_2_17_x86_64.whl).
Fixes https://github.com/pytorch/pytorch/issues/71320

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71327

Reviewed By: suo, seemethere, atalman, janeyx99

Differential Revision: D33589002

Pulled By: malfet

fbshipit-source-id: d362a2b2fd045bc1720cd7fdc4c7b18b7d607fc4
2022-01-14 14:06:15 -08:00
d74bb42f7a Add a missing precondition to DistributedSampler docstring (#70104)
Summary:
DistributedSampler assigns different indices to different processes. In doing so, it assumes that the data is the same across the board and in the same order. This may seem trivial; however, there are times when users don't guarantee the order their items will have, because they rely on something such as the order in which the filesystem lists a directory (which is not guaranteed and may vary across computers), or the order in which a `set` is iterated.

I think it's better to make it clearer.
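
An illustrative sketch (not from the PR) of satisfying this precondition by forcing a deterministic ordering; the dataset class, directory path, and explicit `num_replicas`/`rank` values are assumptions made so the snippet runs standalone:

```
import os
from torch.utils.data import DataLoader, Dataset, DistributedSampler

class FileDataset(Dataset):
    def __init__(self, root):
        # os.listdir order is filesystem-dependent; sort so that every
        # process builds the dataset in exactly the same order.
        self.paths = sorted(os.listdir(root))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        return self.paths[i]

dataset = FileDataset("/data/images")  # hypothetical directory
# num_replicas/rank are normally inferred from the process group;
# passed explicitly here so the sketch runs without init_process_group.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0)
loader = DataLoader(dataset, sampler=sampler)
```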

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70104

Reviewed By: bdhirsh

Differential Revision: D33569539

Pulled By: rohan-varma

fbshipit-source-id: 68ff028cb360cadaee8c441256c1b027a57c7089
2022-01-14 13:55:12 -08:00
2faccc2f5d [quant] Remove some redundant entries in backend_config_dict for TensorRT (#70971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70971

"root_module" and "reference_quantized_module_for_root" are only used in convert, removed
them for fused module and qat module swapping configurations
We may be able to remove some other fields as well.

Test Plan:
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D33470739

fbshipit-source-id: 67e6d58d7a3ec9fbd8c13527e701c06119aeb219
2022-01-14 12:43:25 -08:00
d793cc1993 Revert "Pin numba to 0.54.1"
This reverts commit ac7f188c64805f2f9dd134f5781d3b584688e677 that was
landed accidentally.
2022-01-14 12:32:39 -08:00
ac7f188c64 Pin numba to 0.54.1
As the newer one is incompatible with the numpy version we are using.
Fixes https://github.com/pytorch/pytorch/issues/71320
2022-01-14 12:25:47 -08:00
680d61daab [LT] Remove torch::lazy::convertShapes (#71291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71291

This commit removes torch::lazy::convertShapes since it's no longer used.
In addition, it replaces the numel logic within LTCTensorImpl.

Test Plan:
./build/bin/test_lazy
CI in lazy_tensor_staging branch

Reviewed By: wconstab

Differential Revision: D33575084

Pulled By: alanwaketan

fbshipit-source-id: b104ef39fd552822e1f4069eab2cb942d48423a6
2022-01-14 12:06:39 -08:00
c7d1501e4d fractional_maxpool3d: port to structured kernel (#70414)
Summary:
Port fractional maxpool 3d to structured kernel

Fixes https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70414

Reviewed By: zdevito, wenleix

Differential Revision: D33572110

Pulled By: bdhirsh

fbshipit-source-id: 1f89eb511335f51cc7abbb0230e165da8752f9fc
2022-01-14 12:01:16 -08:00
a4196a9abf Remove unused optimizers variable in test (#70668)
Summary:
In `TestLRScheduler._test()`, an unused variable `optimizers` is created. This PR is a minor refactoring that removes the variable and the loop block that populates the set.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70668

Reviewed By: wenleix

Differential Revision: D33586236

Pulled By: albanD

fbshipit-source-id: cabf870a8221f144df9d3e2f2b564cdc5c255f5a
2022-01-14 11:59:49 -08:00
054b90f0d6 add channels last support for ChannelShuffle (#50247)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50247

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26007052

Pulled By: VitalyFedyunin

fbshipit-source-id: 08f737d64a65791c8002ffd56b79b02cf14d6159
2022-01-14 11:55:21 -08:00
e531646955 Fix docstring for nn.MultiHeadAttention (#71100)
Summary:
Fixes nn.MultiHeadAttention's docstring problem reported at https://github.com/pytorch/pytorch/issues/70498.

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71100

Reviewed By: mruberry

Differential Revision: D33531726

Pulled By: albanD

fbshipit-source-id: d2aa8fa44d0f6b166a809b7e5ceee26efcbccf36
2022-01-14 10:29:18 -08:00
17bb68618f Copy: Fix CPU transpose path ignoring neg and conj bits (#69026)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69026

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33064533

Pulled By: anjali411

fbshipit-source-id: 98c25586a1707ac2324f69f652ce5a14dd59c0ad
2022-01-14 10:13:33 -08:00
84b1c9798c add BFloat16 support for AvgPool2d on CPU (#66927)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66927

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33353198

Pulled By: VitalyFedyunin

fbshipit-source-id: 1aeaa4bb90ac99210b8f6051c09d6995d06ce3a1
2022-01-14 07:59:10 -08:00
88012c7daf [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33577744

fbshipit-source-id: 7ecc8367998ee1dffde54c2f4dd3cfafe19a53c9
2022-01-14 06:10:57 -08:00
3a0c680a14 Jiterates exp2, erfc, erfinv and entr and refactors code_template.h to ATen (#71295)
Summary:
Per title.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71295

Reviewed By: ngimel

Differential Revision: D33575885

Pulled By: mruberry

fbshipit-source-id: bc841b46fc0b5458a26a4d4465b18a7a54cd5a5b
2022-01-13 23:58:51 -08:00
d068849cc0 - Fixed memory leak in ir_simplifier.cpp (#71285)
Summary:
The leak was causing long-running inference loops to exhaust system memory. I tracked down the issue and noted that ModRound can be copied by value without worrying about a performance hit.

I originally branched from release/1.10 and made these changes. This commit includes the same changes but from master as requested in the original PR https://github.com/pytorch/pytorch/pull/71077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71285

Reviewed By: wenleix

Differential Revision: D33575821

Pulled By: ZolotukhinM

fbshipit-source-id: 64333f6cbb2c222f05481499c9cae4c7e0116af6
2022-01-13 22:29:06 -08:00
910c01020e add BFloat16 support for AdaptiveMaxPool2d on CPU (#66929)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66929

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33353199

Pulled By: VitalyFedyunin

fbshipit-source-id: d402d5deb7ca766259ca42118ddc16625e134c4c
2022-01-13 20:00:42 -08:00
9e45c89891 remove skips from determinant tests (#70034)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67512.

The accuracy requirement for non-contiguous inputs when using complex64 was too high, so I reduced it to up to 1e-3.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70034

Reviewed By: anjali411

Differential Revision: D33530382

Pulled By: mruberry

fbshipit-source-id: 057daf75dc5feca5bb2f4428922eb7489435da60
2022-01-13 19:13:28 -08:00
356af8f857 Do not use ssize_t in python_arg_parser.[cpp|h] (#71250)
Summary:
Use `Py_ssize_t` when calling the Python API
Use `c10::irange` to automatically infer the loop type
Use `size_t` or `unsigned` for unsigned types

Partially addresses https://github.com/pytorch/pytorch/issues/69948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71250

Reviewed By: atalman

Differential Revision: D33569724

Pulled By: malfet

fbshipit-source-id: c9eb75be9859d586c00db2f824c68840488a2822
2022-01-13 19:10:30 -08:00
675acfc1f4 Remove unwanted comma (#71193)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70611

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71193

Reviewed By: ngimel

Differential Revision: D33542841

Pulled By: mruberry

fbshipit-source-id: 0f2f1218c056aea7ecf86ba4036cfb10df6e8614
2022-01-13 19:09:05 -08:00
558622642b Fix torch.dsplit docs dim specification (#70557)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70445.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70557

Reviewed By: ngimel

Differential Revision: D33542864

Pulled By: mruberry

fbshipit-source-id: c3a7929bfcd964da99225ad715f4546f1fc8002a
2022-01-13 19:04:51 -08:00
5f2b4be3b9 [jit] Split DynamicType conformance test into smaller pieces. (#71275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71275

Currently it takes more than 10 minutes to run the conformance test. Instead, we should use a parametrized test to shard it into segments so that they can run in parallel.
ghstack-source-id: 146990608

Test Plan:
```
[zhxchen17@devbig560.ftw3 /data/users/zhxchen17/fbsource/fbcode] buck test mode/dev-tsan //caffe2/test/cpp/jit:jit -- -r 'LiteInterpreterDynamicTypeTestFixture'
Building... 34.9 sec (99%) 12110/12111 jobs, 0/12111 updated
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: ebea52b3-7c7f-46be-9f69-18e2e7b040cc
Trace available for this run at /tmp/tpx-20220113-113635.717778/trace.log
RemoteExecution session id: reSessionID-ebea52b3-7c7f-46be-9f69-18e2e7b040cc-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 431 tests discovered (11.173)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/0 (51.331)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/1 (65.614)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/3 (76.875)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/5 (77.271)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/4 (78.871)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/6 (78.984)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/7 (84.068)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/2 (85.198)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/8 (88.815)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/9 (90.332)
Summary
  Pass: 10
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748
```

Reviewed By: qihqi

Differential Revision: D33570442

fbshipit-source-id: 5c49e03b0f88068d444c84b4adeaaf45433ce1fa
2022-01-13 18:22:55 -08:00
81f693d509 [ONNX] minor clarifications of docstrings (#69260) (#69549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69549

[ONNX] minor clarifications of docstrings

1. Make the description of ONNX_ATEN_FALLBACK more accurate (after #67460).
2. Specify minimum and maximum values for opset_version. This is pretty
   important information and we shouldn't make users dig through source
   code to find it.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32994267

Pulled By: msaroufim

fbshipit-source-id: ba641404107baa23506d337eca742fc1fe9f0772
2022-01-13 18:03:27 -08:00
d555d3f0d0 Update generated header to use flatbuffer v1.12; (#71279)
Summary:
Update the generated header to use flatbuffer v1.12;
also pin the flatbuffer repo to v1.12.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71279

Test Plan:
unittest

Reviewed By: gmagogsfm

Differential Revision: D33572140

Pulled By: qihqi

fbshipit-source-id: 319efc70f6c491c66a3dfcd7cad1f7defe69916b
2022-01-13 17:23:30 -08:00
e47771cca0 [ao] Removing unused allow list arguments from propagate_qconfig and helper (#71104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71104

This shouldn't change any functionality given that those
variables were not used. It should be noted that a similar variable is
used in add_observer which is why it wasn't removed from there.
ghstack-source-id: 146940043

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33510352

fbshipit-source-id: c66ed72c2b71a6e1822f9311467adaa1f4b730d0
2022-01-13 16:07:29 -08:00
e7c87e8b44 [quant] fix dropout in FX graph mode quantization (#71043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71043

Fixes issue #68250: dropout breaks FX graph mode quantization.
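
A minimal repro sketch of that pattern (illustrative only; the FX quantization API shown matches roughly this era of PyTorch, and newer releases also take an `example_inputs` argument in `prepare_fx`):

```
import torch
import torch.nn as nn
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import convert_fx, prepare_fx

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        return self.dropout(self.linear(x))

m = M().eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
prepared = prepare_fx(m, qconfig_dict)   # insert observers
prepared(torch.randn(2, 4))              # calibrate
quantized = convert_fx(prepared)         # previously broke in dropout's presence
print(quantized)
```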

Test Plan:
python test/test_quantization.py TestStaticQuantizedModule

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33490176

fbshipit-source-id: 155546505b28ffc635ada65a1464b9d622dbc235
2022-01-13 15:59:59 -08:00
eac3decf93 ModuleList concatenation (#70887)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70441.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70887

Reviewed By: ejguan

Differential Revision: D33555431

Pulled By: albanD

fbshipit-source-id: ce42459ee46a611e98e89f02686acbac16b6b668
2022-01-13 15:31:07 -08:00
2981534f54 [nn] cross_entropy: no batch dim support (#71055)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71055

Reviewed By: anjali411

Differential Revision: D33567403

Pulled By: jbschlosser

fbshipit-source-id: 4d0a311ad7419387c4547e43e533840c8b6d09d8
2022-01-13 14:48:51 -08:00
e4d522a3cf More informative messages for None types comparisons (#69802)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69802

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33555886

Pulled By: Gamrix

fbshipit-source-id: 3045cbe04de22f05db41a99ad3dda90c5271aa0f
2022-01-13 13:59:28 -08:00
ed9804088a Adding support for loops (#70209)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70209

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33555889

Pulled By: Gamrix

fbshipit-source-id: f6c0c9d517849e3679e07ac1c8cf3bf367e91882
2022-01-13 13:59:25 -08:00
18d91a97e4 Adding custom device type change rules (#69051)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69051

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33555884

Pulled By: Gamrix

fbshipit-source-id: c38812277d0e2aa008903a4328cb72e34bc6e1e6
2022-01-13 13:59:21 -08:00
03c4d2b9e3 Adding support for Ifs in Device Type Analysis (#69050)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69050

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33555887

Pulled By: Gamrix

fbshipit-source-id: f7f057c5985f8b6e7a9fe5702a944b2b4cc4d5b5
2022-01-13 13:59:18 -08:00
4a8aa971cc Building a TensorProperty AbstractBaseClass (#71184)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71184

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33555890

Pulled By: Gamrix

fbshipit-source-id: 694f7b5327b93257010b0abeed3310b0b816c0a8
2022-01-13 13:59:15 -08:00
dabcbb2726 Testing for Default Inference for Device Type (#69052)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69052

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33555888

Pulled By: Gamrix

fbshipit-source-id: dbd43ebfc1bea4b17a96bdd378ea730ccf5944b2
2022-01-13 13:59:12 -08:00
ade83ed90c Building Default Inference for Device Type (#69049)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69049

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33555885

Pulled By: Gamrix

fbshipit-source-id: 7364066cbc544ab8442a47c82ea89f0e73eaaa06
2022-01-13 13:57:08 -08:00
b64946cbc1 [acc_normalizer] Delete is_wrapped after normalization (#71046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71046

att

Test Plan:
Added test coverage.

yinghai verifying locally for issue.

Reviewed By: kflu, 842974287

Differential Revision: D33487868

fbshipit-source-id: 5da615f66f50500b30bae84592859305b2971e1e
2022-01-13 13:33:01 -08:00
71b274d34d [pytorch] move ATen/CUDAGeneratorImpl.h to ATen/cuda (#71224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71224

Pull Request resolved: https://github.com/facebookresearch/FBTT-Embedding/pull/19

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/860

This patch follows up D33414890 (5cae40c169).

This patch removes an alias header "`ATen/CUDAGeneratorImpl.h`" since it has been moved to `ATen/cuda/CUDAGeneratorImpl.h`. This change should have already been propagated.

Test Plan: Internal and external CI

Reviewed By: jianyuh

Differential Revision: D33534276

fbshipit-source-id: 368177784ec84f003aad911cf4dd4da4a6e8e3d4
2022-01-13 13:29:44 -08:00
1de830a985 Use ptrdiff_t rather than ssize_t (#71271)
Summary:
`diff_type` kind of naturally should be `ptrdiff_t`, as `ssize_t` is actually defined [here](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html) as :
> The type ssize_t shall be capable of storing values at least in the range [-1, {SSIZE_MAX}].

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71271

Reviewed By: atalman

Differential Revision: D33569304

Pulled By: malfet

fbshipit-source-id: 57dafed5fc42a1f91cdbed257e76cec4fdfbbebe
2022-01-13 12:41:53 -08:00
83b45fe166 [ao] disabling dynamic conv/convT ops (#71110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71110

As mentioned in https://github.com/pytorch/pytorch/issues/70480, the dynamic conv ops are currently missing a key feature needed to bring their performance in line with other dynamic ops. This diff disables conv/convT from being automatically quantized during dynamic convert.

Test Plan: buck test //caffe2/test:quantization --test-selectors test_quantized_module#TestDynamicQuantizedModule

Reviewed By: vkuzo

Differential Revision: D33511152

fbshipit-source-id: 50618fbe734c898664c390f896e70c68f1df3208
2022-01-13 11:28:02 -08:00
37eaf7640f Revert "Revert D33480077: .github: Re-enable xla test config" (#71202)
Summary:
This reverts commit 14922a136f940e2f9bc9d04d7963b8141138efa0.

Re-enable xla test config since PTXLA head is back to green -- https://app.circleci.com/pipelines/github/pytorch/xla.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71202

Reviewed By: wenleix

Differential Revision: D33569109

Pulled By: seemethere

fbshipit-source-id: ee0985768d1dfaa6c28865ae5b3dbce2a4a340f7
2022-01-13 11:19:18 -08:00
40eb004da5 Use nightly-binary instead of nightly to deduplicate refs for nightlies (#71270)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/71260

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71270

Reviewed By: seemethere

Differential Revision: D33568858

Pulled By: janeyx99

fbshipit-source-id: 03de185af987e5cb3b021d842be20c4a353b1033
2022-01-13 10:10:35 -08:00
003c94c790 [Quant] Templatize activationLimits function (#71220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71220

This is to allow using this function for uint8 as well as int8

Test Plan:
buck test caffe2/test:quantization
This primarily tests T=uint8

Reviewed By: kimishpatel

Differential Revision: D33520713

fbshipit-source-id: 9640cf0a446e4c4e76887d643d72b767945bae76
2022-01-13 09:31:16 -08:00
4a26624670 [Quant] Add a guard against shapes for qnnpack qadd (#71219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71219

qnnpack kernel does not support broadcasting

Test Plan: buck test caffe2/test:quantization

Reviewed By: kimishpatel

Differential Revision: D33520613

fbshipit-source-id: 93c5226d53cb7b90ed495ff7b14158f7171d25bf
2022-01-13 09:31:12 -08:00
e1b9d5854a [Quant] Add quantized input tensor data type checks (#71218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71218

This asserts quint8 support, and fails with a helpful error message when attempted to use with a different qdtype

Test Plan: buck test caffe2/test:quantization

Reviewed By: kimishpatel

Differential Revision: D33455785

fbshipit-source-id: 6ec728f59bb707c2d941b50e6375a698c66284c0
2022-01-13 09:29:55 -08:00
188b744390 Make docker build cron once a week and not every hour on Wed (#71255)
Summary:
Running it many times a day was probably not intentional.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71255

Reviewed By: suo, atalman

Differential Revision: D33559155

Pulled By: janeyx99

fbshipit-source-id: c8703cea6f3188c9bcb0867b895261808d3164ee
2022-01-13 08:26:57 -08:00
1e3893ecbb [DataPipe] Removing deprecated DataPipes (#71161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71161

Users should import these DataPipes from [TorchData](https://github.com/pytorch/data) if they would like to use them. We will be checking for any downstream library usage before landing this PR.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33532272

Pulled By: NivekT

fbshipit-source-id: 9dbfb21baf2d1183e0aa379049ad8304753e08a1
2022-01-13 07:37:48 -08:00
60632a00fe [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33561057

fbshipit-source-id: 79873717c45c8bbe6d0ae760e718770fd960185d
2022-01-13 03:27:06 -08:00
ff78c73286 [ONNX] Remove f arg from export_to_pretty_string (#69045) (#69546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69546

The arg is not used and was previously deprecated.

Also remove torch.onnx._export_to_pretty_string. It's redundant with the
public version.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32994270

Pulled By: msaroufim

fbshipit-source-id: f8f3933b371a0d868d9247510bcd73c31a9d6fcc
2022-01-12 21:31:36 -08:00
3cc34a4502 [PyTorch][Static Runtime] s/toObject/toObjectRef/ in native ops (#71238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71238

Saves a refcount bump for these.
ghstack-source-id: 146927203

Test Plan: CI

Reviewed By: mikeiovine

Differential Revision: D33554385

fbshipit-source-id: b2f8d5afdc0eb80c8765d88560d0e547376f28d1
2022-01-12 18:44:40 -08:00
ffdc0e23af [SR] Add various missing native ops (#71113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71113

This diff adds a variety of missing ~~out variants~~/native ops. Most of these are trivial, so I included them all in one diff.

Native ops
* `aten::mul` (list variant)
* `aten::sub` (int variant)
* `aten::add` (list variant)
* `aten::Int`

Out variants
* ~~`aten::gt`~~ (codegen will handle)
* ~~`aten::eq`~~ (codegen will handle)
ghstack-source-id: 146927552

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D33510756

fbshipit-source-id: df385958b9561955b2e866dab2e4c050abd26766
2022-01-12 18:40:31 -08:00
f6b804ba9f Fallback to server JIT type for type checking.
Summary:
T109800703
At runtime, fall back to the server JIT type if a DynamicType is parsed.

Test Plan: local headset

Reviewed By: scramsby

Differential Revision: D33557763

fbshipit-source-id: f5fe7dabf668de2f55cc26f9ebe8bcbccd570ce3
2022-01-12 17:59:54 -08:00
84d4087874 Fix trt const_fold as output use case (#71194)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71194

Reviewed By: jfix71, khabinov

Differential Revision: D33541168

fbshipit-source-id: dd5787430b272977963323a6ce38b3e15e979278
2022-01-12 16:57:19 -08:00
1bbea3c3a2 [PyTorch][JIT] Support mayContainAlias(Value*, ArrayRef<Value*>) (#69853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69853

We can implement this overload more efficiently.
ghstack-source-id: 146924693

Test Plan:
patched alias_analysis tests

Time reported to initialize a predictor by static runtime when given ctr_mobile_feed local_ro net is 9.5s instead of 10.5s.

Reviewed By: mikeiovine

Differential Revision: D33039731

fbshipit-source-id: 52559d678e9eb00e335b9e0db304e7a5840ea397
2022-01-12 16:53:54 -08:00
cd253938a9 [PyTorch][SR][easy] s/input_or_constant_aliases/external_aliases/ (#69852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69852

Looks like a stale comment.
ghstack-source-id: 146924694

Test Plan: review

Reviewed By: hlu1

Differential Revision: D33033264

fbshipit-source-id: aa0eff463c42716bdd7142d4662d8668af439f68
2022-01-12 16:52:26 -08:00
1bc3571078 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#70201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201

Included functions:
save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer module object

Compared to previous attempts, this diff only adds flatbuffer to cmake target and leaves fbcode/xplat ones unchanged.

Test Plan: unittest

Reviewed By: malfet, gmagogsfm

Differential Revision: D33239362

fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763
2022-01-12 16:30:39 -08:00
7a93d8bb2d Revert D32374542: Implement the patterns module for the multi subgraph rewriter.
Test Plan: revert-hammer

Differential Revision:
D32374542 (de62bcac66)

Original commit changeset: 4ae8da575976

Original Phabricator Diff: D32374542 (de62bcac66)

fbshipit-source-id: 901e41d6abb202c5b1c6a3a84b060b2677b5bbe1
2022-01-12 15:50:58 -08:00
9ca367d48b [nnc] Use given kernel function name while emitting code (#67781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67781

Update `LLVMCodeGen` in NNC to use the given kernel function name while emitting code.

This was earlier committed as D31445799 (c30dc52739) and got reverted as part of a stack of diffs that included a cache for `PyTorchLLVMJIT`, which was the likely culprit.

Test Plan:
```
buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - LLVM.CodeGenKernelFuncName'
```

Reviewed By: ZolotukhinM, bdhirsh

Differential Revision: D32145958

fbshipit-source-id: 5f4e0400c4fa7cabce5b91e6de2a294fa0cad88e
2022-01-12 15:49:17 -08:00
67941c8a94 Document torch.cuda.ExternalStream, torch.cuda.caching_allocator_alloc and torch.cuda.caching_allocator_delete (#70126)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67414. Fixes https://github.com/pytorch/pytorch/issues/70117.

cc brianjo mruberry ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70126

Reviewed By: mruberry

Differential Revision: D33542910

Pulled By: ngimel

fbshipit-source-id: 4b870f4dceca6ee4cc8fba58819f1cb18ac9f857
2022-01-12 15:44:40 -08:00
ad803936d1 Codegen: ADInplaceOrViewType only include operators registered (#68692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68692

ADInplaceOrViewType is a sharded file, so by only including specific
operator headers, we ensure that changing one (non-method) operator
only needs one shard to be re-compiled.

This also ports the generated code over to the `at::_ops` interface,
and the code generator itself to using `write_sharded` instead of
re-implementing its own version of sharding.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33217916

Pulled By: albanD

fbshipit-source-id: 90f1868f72644f1b5aa023cefd6a102bbbec95af
2022-01-12 15:34:45 -08:00
cc55da8a9b [caffe2/server quant] use new depthwise conv fbgemm interface (#71166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71166

Remove the use of deprecated old interface

Test Plan: CI

Reviewed By: jiyuanzFB

Differential Revision: D33533494

fbshipit-source-id: 930eb93cd67c7a9bb77708cc48914aa0c9f1c841
2022-01-12 15:29:07 -08:00
de62bcac66 Implement the patterns module for the multi subgraph rewriter. (#71181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71181

This diff introduces the patterns module that defines a pattern-replacement pair for the experimental multi subgraph rewriter.

Test Plan: Tested locally. Unit test suite forthcoming.

Reviewed By: ajauhri

Differential Revision: D32374542

fbshipit-source-id: 4ae8da575976e96b02c5c33c6ae2a0943fc7f126
2022-01-12 15:12:05 -08:00
3c0c5bde0e [cmake] Uncomment binaries (#71157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71157

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33528259

Pulled By: IvanKobzarev

fbshipit-source-id: b8c216558ca612bedd4c37205f38ed29c2c82b3c
2022-01-12 15:01:44 -08:00
e1f01d2c01 .ci: Add nightly trigger, remove CircleCI linux binary builds (#70957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70957

Adds nightly trigger for github actions using a workflow that will pull
down viable/strict and tag it as `nightly` and then re-push it up to the
repository.

Also removes CircleCI linux binary builds since they will now be
outmoded in favor of our new GHA workflow

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D33535609

Pulled By: seemethere

fbshipit-source-id: ca6402df37db46e1872ff25befe96afa12e7b1af
2022-01-12 14:31:51 -08:00
6c1be299c1 caffe2/c10/core/TensorImpl.h: adapt to clang 12 (#70973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70973

clang12 builds fail like this:

  caffe2/c10/core/TensorImpl.h:2615:1: error: static_assert failed due to requirement 'sizeof(void *) != sizeof(long) || sizeof(c10::TensorImpl) == sizeof(long) * 24' "You changed the size of TensorImpl on 64-bit arch.See Note [TensorImpl size constraints] on how to proceed."

Yet eliciting the size of that struct with this one-line addition:

  char (*__show_sizeof)[sizeof( TensorImpl )] = 1;

reports that its size is indeed 192 (aka 8 * 24):

  caffe2/c10/core/TensorImpl.h:2615:8: error: cannot initialize a variable of type 'char (*)[192]' with an rvalue of type 'int'

On closer inspection we determined that failures were occurring because TensorImpl was sometimes of size 208 and other times of size 192. The 192 size was expected and TensorImpl was hard-coded to raise an error for any other case on a 64-bit system, including the one we found where the size was 208.

Additional investigation revealed that systems using GCC 11 and CUDA 11040 with either C++ 201402 or 201703 would sometimes yield TensorImpl sizes of 208, whereas newer systems without CUDA would always yield sizes of 192.

The difference turned out to be that `std::unique_ptr` on NVCC systems is sometimes of size 16 and other times of size 8, accounting fully for the observed difference in TensorImpl sizes. We have not yet been able to find a set of preprocessor macros that predict when each size will occur.

To handle the situation, we've added extensive debugging information to the TensorImpl size-checking logic. A number of preprocessing definitions capture compiler versions and other information to help understand what changes might have affected the size of TensorImpl. The size of each member of TensorImpl is now individually checked, along with the total size. Template-based comparison functions are used to provide compile-time outputs about the system state as well as the observed and expected sizes of each item considered.

The template-based comparison functions cause the code to break if it's run on a 32-bit system because the templates and their associated static_asserts are compiled whether or not they'll ultimately be used. In C++17 we could prevent this using `if constexpr`; however, PyTorch is pinned to C++14, so we cannot. Instead, we check pointer size (`#if UINTPTR_MAX == 0xFFFFFFFF`) to determine which system we're on and provide separate checks for 32 vs 64-bit systems.

A final wrinkle is that 32-bit systems have some variations in data size as well. We handle these by checking that the relevant items are `<=` the expected values.

In summary...

Improvements over the previous situation:
* Added checks for 32-bit systems
* The sizes of individual fields are now checked
* Compile-time size results (expected versus observed) are provided
* Compile-time compiler and system info is provided
* Landing this diff will actually enable checks of TensorImpl size; they are currently disabled to expedite LLVM-12 + newer CUDA upgrade efforts.

Some work that could still be done:
* Figure out what preprocessor flags (if any) predict the size of `std::unique_ptr` for 64-bit systems and of various elements of 32-bit systems.

Test Plan: Building no longer triggers that static_assert failure.

Reviewed By: luciang

Differential Revision: D32749655

fbshipit-source-id: 481f84da6ff61b876a5aaba89b8589ec54d59fbe
2022-01-12 14:27:16 -08:00
385773cb77 add BFloat16 support for MaxPool2d on CPU (#56903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56903

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D28836791

Pulled By: VitalyFedyunin

fbshipit-source-id: e03d55cc30dfa3628f096938fbad34b1031948af
2022-01-12 14:20:20 -08:00
de902b5d02 [FX] Add a default_value arg to Graph.placeholder and fix split_module (#71016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71016

I found out that `split_module` doesn't preserve default values for arguments. In trying to fix that, I noticed that `Graph.placeholder` doesn't make it easy to add a default argument when making a placeholder. This PR addresses both of those issues.
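
A small sketch of the new argument (illustrative; a scalar default is used since it codegens straightforwardly):

```
import torch
from torch.fx import Graph, GraphModule

g = Graph()
# The placeholder carries a default, so the generated forward()
# has a defaulted parameter.
x = g.placeholder("x", default_value=3)
g.output(x)
gm = GraphModule(torch.nn.Module(), g)

print(gm())   # falls back to the default: 3
print(gm(7))  # an explicit argument wins: 7
```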

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D33482218

Pulled By: jamesr66a

fbshipit-source-id: 57ebcdab25d267333fb1034994e08fc1bdb128ee
2022-01-12 14:03:17 -08:00
5749be4678 Fix the shape inconsistency of out and elem tensor (#71065)
Summary:
See bug report  https://github.com/pytorch/pytorch/issues/71063

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71065

Reviewed By: anjali411

Differential Revision: D33549921

Pulled By: ejguan

fbshipit-source-id: bc43f5f9a88f7dcd8729d0e0f4b90d20f40b3064
2022-01-12 13:57:19 -08:00
2290976880 ci: Comment out pull_request trigger for binary builds (#71244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71244

Binary builds add a lot of skipped jobs to the default ciflow workflow,
so we're commenting out the pull_request trigger for now until the new
ciflow mechanism becomes available

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D33555049

Pulled By: seemethere

fbshipit-source-id: 2d0d4704e7297d5931b2c9705ee4dfb26760736e
2022-01-12 13:48:10 -08:00
bfe1abd3b5 torch/monitor: add pybind (#69567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69567

This exposes torch.monitor events and stats via pybind11 to the underlying C++ implementation.

* The registration interface is a tad different since it takes a lambda function in Python, whereas in C++ it's a full class.
* This has a small number of changes to the counter interfaces since there's no way to create an initializer list at runtime; they now also take a vector.
* Only double-based stats are provided in Python since it's intended more for high-level stats where float imprecision shouldn't be an issue. This can be changed down the line if the need arises.

```
from datetime import datetime

from torch.monitor import Event, log_event, register_event_handler

events = []

def handler(event):
    events.append(event)

handle = register_event_handler(handler)

log_event(Event(name="torch.monitor.TestEvent", timestamp=datetime.now(), data={"foo": 1.0}))
```

D32969391 is now included in this diff.
This cleans up the naming for events: `type` is now `name`, `message` is gone, and `metadata` is renamed to `data`.

Test Plan: buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor

Reviewed By: kiukchung

Differential Revision: D32924141

fbshipit-source-id: 563304c2e3261a4754e40cca39fc64c5a04b43e8
2022-01-12 13:35:11 -08:00
90ef54f8ea [PyTorch] Remove buggy ExclusivelyOwnedTraits<intrusive_ptr<T>> (#70647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70647

It wasn't checking for the null state and it wasn't used.
ghstack-source-id: 146819525

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D33414728

fbshipit-source-id: 7fcd648577cbfc35320c5c3ca9a19a14bd4d6858
2022-01-12 12:19:52 -08:00
479ce1c3a0 [PyTorch] Add isUndefined to ExclusivelyOwnedTraits<TensorBase> debug msg (#70638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70638

We are seeing these assertions fire infrequently. Add more information to aid in debugging when they fire.
ghstack-source-id: 146819527

Test Plan: CI

Reviewed By: bdhirsh

Differential Revision: D33412651

fbshipit-source-id: 7e35faf9f4eeaa5f2455a4392e00f62fe692811c
2022-01-12 12:18:33 -08:00
4d28cef03a Added AutocastCPU string (#70013)
Summary:
Description:
- Added "AutocastCPU" string repr into `toString` method

Before
```
std::cout << c10::DispatchKey::AutocastCPU;
> UNKNOWN_TENSOR_TYPE_ID
```
and now:
```
std::cout << c10::DispatchKey::AutocastCPU;
> AutocastCPU
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70013

Reviewed By: ejguan

Differential Revision: D33550777

Pulled By: bdhirsh

fbshipit-source-id: b31e15e6d52fc1768af085e428328117d588f283
2022-01-12 12:06:46 -08:00
7884143dff Legacy support for embedded interpreter (#71197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71197

Adds back legacy support for the embedded interpreter to use the .data section in internal use cases. Specifically, this allows for dynamic loading of Python extension files.

Test Plan: buck test mode/opt //caffe2/torch/csrc/deploy:test_deploy_gpu_legacy

Reviewed By: shunting314

Differential Revision: D33542636

fbshipit-source-id: b49f94163c91619934bc35595304b9e84d0098fc
2022-01-12 11:48:27 -08:00
a71b4dc164 Update nightly wheels to ROCm4.5.2 (#71064)
Summary:
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71064

Reviewed By: malfet, janeyx99

Differential Revision: D33552643

Pulled By: seemethere

fbshipit-source-id: 3754f69188864f6b3639818a4b9013ed255a2d7d
2022-01-12 11:41:55 -08:00
fd0d4bef03 Edit cron to make the docker jobs run hopefully (#71232)
Summary:
Our Docker builds have not been running with our previous cron; this changes it so they hopefully run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71232

Reviewed By: ejguan

Differential Revision: D33552231

Pulled By: janeyx99

fbshipit-source-id: 1a3e1607b03d37614eedf04093d73f1b96698840
2022-01-12 11:37:03 -08:00
70951884d4 Add option to load historic operators in IR when the operator is deprecated (#71148)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71148

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D33521300

Pulled By: tugsbayasgalan

fbshipit-source-id: a0607dba5e7233590384326537017eb0b18da419
2022-01-12 11:07:04 -08:00
8f4cec2231 [warnings][Caffe2] Suppress warnings in caffe2 headers (#71196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71196

`caffe2` headers contain code that can elicit warnings when built with strict compiler flags.  Rather than force downstream/consuming code to weaken their compiler flags, suppress those warnings in the header using `#pragma clang diagnostic` suppressions.

Test Plan: CI Pass

Reviewed By: malfet

Differential Revision: D33536233

fbshipit-source-id: 74404e7a5edaf244f79f7a0addd991a84442a31f
2022-01-12 10:16:35 -08:00
149f5ffa36 Fix inconsistency between new and old upgrader design (#71185)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71185

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D33539191

Pulled By: tugsbayasgalan

fbshipit-source-id: 721093793574663d56a8080c6a488024620266a1
2022-01-12 09:54:31 -08:00
54fe2741a1 [fx2trt] break down div (#71172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71172

Break down div into smaller ops to make those div ops look like all the other elementwise ops.

Use operator div instead of torch.div where possible, to avoid converting literal numbers to torch tensors (as in the following).
```
a = 1
b = 2

# `c` would be 0.5
c = a / b

# `c` would be torch.tensor([0.5])
c = torch.div(a, b)
```

The problem we saw on ShuffleNet is that there's a size op followed by a div op, which results in int64 tensors in the acc-traced graph (the acc tracer turns operator.div into acc_ops.div, which uses torch.div). The TRT splitter then splits out the reshape op that consumes the div op, because we have a rule to split out ops that take int64 tensors as inputs.

Test Plan: Unit tests.

Reviewed By: wushirong

Differential Revision: D33482231

fbshipit-source-id: 508a171520c4e5b4188cfc5c30c1370ba9db1c55
2022-01-12 09:46:46 -08:00
6a40bb0fdf [DataPipe] Update deprecation warning (#71171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71171

Editing two warnings to more accurately portray the deprecation plan for the DataPipes

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33535785

Pulled By: NivekT

fbshipit-source-id: b902aaa3637ade0886c86a57b58544ff7993fd91
2022-01-12 09:34:53 -08:00
706777bf56 Disable the output invocation in jit (#71138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71138

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33521059

Pulled By: eellison

fbshipit-source-id: eaf20eaa6e62159dff9369a7b75e6d6009fb45d0
2022-01-12 09:11:37 -08:00
5480deb183 Add support for permutting dynamic fusion group outputs to channels last format (#70656)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70656

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458650

Pulled By: eellison

fbshipit-source-id: f0c7d20743deac7a87f7c9176e60da8100aefe41
2022-01-12 09:11:34 -08:00
39be20f259 [JIT][NNC] Add handling of strides to dynamic shape support. (#70464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464

Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
  S_ONE, // STRIDE_ONE: packed
  S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
  S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
  S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) a contiguous tensor and b) a channels-last tensor. Channels-last is a common case and we should optimize for it. Additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it fast to check whether a tensor follows this pattern.
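
As a quick illustration of that last point, contiguity per memory format is recorded on the tensor itself, so the dedicated specializations can be guarded cheaply (a minimal sketch):

```python
import torch

t = torch.randn(2, 3, 4, 5).to(memory_format=torch.channels_last)
print(t.is_contiguous())                                   # False
print(t.is_contiguous(memory_format=torch.channels_last))  # True
```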

Output striding will be done in a follow up.

The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debuggability and to make use of the ability to store IValues as attributes on nodes.

As an example:

```
%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]]](%x, %24, %23, %22, %21)
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458649

Pulled By: eellison

fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
2022-01-12 09:11:31 -08:00
975e7d246e Remove ignore shapes arg (#71144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71144

This wasn't being used anywhere. It was originally intended for the SR flow but we're doing something else now.

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D33521061

Pulled By: eellison

fbshipit-source-id: 0574698a2b7409df6feb703f81e806d886225307
2022-01-12 09:09:49 -08:00
97585ae1e7 Simplify forward / backward AD for linalg.eigh and add checks (#70528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70528

This PR adds checks for the backward of `linalg.eigh`, similar to those
deduced in https://github.com/pytorch/pytorch/pull/70253

It also makes its the implementation parallel that of the (fwd/bwd) derivative of
`torch.linalg.eig` and it makes most OpInfo tests pass.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33530149

Pulled By: albanD

fbshipit-source-id: 1f368b8d450d4e9e8ae74d3881c78513c27eb956
2022-01-12 08:35:52 -08:00
061be8d600 Correct forward AD for linalg.eig and add checks (#70527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70527

This PR adds checks for the backward of `linalg.eig`, similar to those
deduced in https://github.com/pytorch/pytorch/pull/70253

It also modifies the function so that it does not save the input matrix,
as it's not necessary.

It also corrects the forward AD formula for it to be correct. Now all
the tests pass for `linalg.eig` and `linalg.eigvals`.

It also updates the docs to reflect better what's going on here.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33530148

Pulled By: albanD

fbshipit-source-id: 984521a04f81ecb28ac1c4402b0243c63dd6959d
2022-01-12 08:30:55 -08:00
e1aea9b968 Add retry to disabled tests file download (#71030)
Summary:
Helps with spotty disabling brought up in https://github.com/pytorch/pytorch/issues/70877 and https://github.com/pytorch/pytorch/issues/70875

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71030

Reviewed By: malfet, atalman

Differential Revision: D33486379

Pulled By: janeyx99

fbshipit-source-id: 56c4d56c2bd8be47a51dee19373aac6c9c5d1691
2022-01-12 08:20:44 -08:00
928ca95ff0 fix TensorLikePair origination (#70304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70304

Without this patch, `TensorLikePair` will try to instantiate everything, although it should only do so for tensor-likes. This is problematic if it is used before a different pair type that could handle the inputs but never gets the chance, because `TensorLikePair` bails out first.

```python
from torch.testing._comparison import assert_equal, TensorLikePair, ObjectPair

assert_equal("a", "a", pair_types=(TensorLikePair, ObjectPair))
```

```
ValueError: Constructing a tensor from <class 'str'> failed with
new(): invalid data type 'str'.
```

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33542995

Pulled By: mruberry

fbshipit-source-id: 77a5cc0abad44356c3ec64c7ec46e84d166ab2dd
2022-01-12 06:44:00 -08:00
49a5b33a74 add a equality comparison helper for assert_close internals (#69750)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69750

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33542993

Pulled By: mruberry

fbshipit-source-id: 0de0559c33ec0f1dad205113cb363a652140b62d
2022-01-12 06:43:57 -08:00
b0a10a709f add explanation of quantized comparison strategy in assert_close (#68911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68911

Closes #68548.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33542997

Pulled By: mruberry

fbshipit-source-id: 78accf20a83cd72254ae0036dc23f9e5376a4c65
2022-01-12 06:43:53 -08:00
802dd2b725 change sparse COO comparison strategy in assert_close (#68728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68728

This removes the ability for `assert_close` to `.coalesce()` the tensors internally. Additionally, we now also check `.sparse_dim()`. Sparse team: please make sure that is the behavior you want for all sparse COO comparisons in the future. #67796 will temporarily keep BC by always coalescing, but in the future `TestCase.assertEqual` will no longer do that.
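
A minimal sketch of a comparison that passes under the new strategy, assuming both inputs are pre-coalesced so no internal `.coalesce()` is needed:

```python
import torch
from torch.testing import assert_close

indices = torch.tensor([[0, 1], [1, 0]])
values = torch.tensor([1.0, 2.0])
a = torch.sparse_coo_tensor(indices, values, (2, 2)).coalesce()
b = torch.sparse_coo_tensor(indices, values, (2, 2)).coalesce()

# Passes: same sparse_dim, same coalesced state, equal indices and values.
assert_close(a, b)
```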

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33542996

Pulled By: mruberry

fbshipit-source-id: a8d2322c6ee1ca424e3efb14ab21787328cf28fc
2022-01-12 06:43:50 -08:00
8d05174def make meta tensor data access error message more expressive in assert_close (#68802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68802

Without this patch, the error message of comparing meta tensors looks like this after #68722 was merged:

```python
>>> t = torch.empty((), device="meta")
>>> assert_close(t, t)
NotImplementedError: Could not run 'aten::abs.out' with arguments from the 'Meta' backend. [...]
[...]
The above exception was the direct cause of the following exception:
[...]
RuntimeError: Comparing

TensorLikePair(
    id=(),
    actual=tensor(..., device='meta', size=()),
    expected=tensor(..., device='meta', size=()),
    rtol=1.3e-06,
    atol=1e-05,
    equal_nan=False,
    check_device=True,
    check_dtype=True,
    check_layout=True,
    check_stride=False,
    check_is_coalesced=True,
)

resulted in the unexpected exception above. If you are a user and see this message during normal operation please file an issue at https://github.com/pytorch/pytorch/issues. If you are a developer and working on the comparison functions, please except the previous error and raise an expressive `ErrorMeta` instead.
```

Thus, we follow our own advice and turn it into an expected exception until #68592 is resolved:

```python
>>> t = torch.empty((), device="meta")
>>> assert_close(t, t)
ValueError: Comparing meta tensors is currently not supported
```

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33542999

Pulled By: mruberry

fbshipit-source-id: 0fe1ddee15b5decdbd4c5dd84f03804ca7eac95b
2022-01-12 06:43:47 -08:00
b652887ad7 improve documentation of comparison internals (#68977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68977

Follow-up to #68722 to address the review comments that were left open before merge.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33542998

Pulled By: mruberry

fbshipit-source-id: 23c567cd328f83ae4df561ac8ee6c40c259408c9
2022-01-12 06:42:30 -08:00
523d448968 Remove deprecated cuDNN convolution ops (#71128)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71128

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33517677

Pulled By: jbschlosser

fbshipit-source-id: 1690fd38a38ee7cf16865209280a9c457c5f70ff
2022-01-12 06:34:42 -08:00
93b2399c6c [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33544281

fbshipit-source-id: 4f0b5d6d490e6fcb967550cfb1dc0111b1770f73
2022-01-12 04:16:43 -08:00
4a8d4cde65 Fix for tensor in list return added to wildcard set (#71170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71170

As with an output in a tuple return, an output in a list return will not have any further uses that would make adding it directly to the list's contained elements incorrect. This unblocks a use case in op authoring.

cc Chillee

Test Plan: Imported from OSS

Reviewed By: d1jang

Differential Revision: D33535608

Pulled By: eellison

fbshipit-source-id: 2066d28e98c2f5d1b3d7e0206c7e39a27b3884b1
2022-01-11 22:12:39 -08:00
9bccb31306 Remove precise tuple construct flag (#71121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71121

Test Plan: Imported from OSS

Reviewed By: d1jang

Differential Revision: D33515234

Pulled By: eellison

fbshipit-source-id: 57cfe171b583a6bb4d3493a34b159061e97a11b8
2022-01-11 22:12:36 -08:00
47ad6628f1 add optional refining (#69776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69776

If we have a node output which is an optional type, but both if-blocks produce a non-optional value, we can try to refine the if output type, which can open up further optimization opportunities.
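
A TorchScript-level sketch of the pattern the pass targets (the pass itself operates on the IR; this is only an illustration):

```python
import torch
from typing import Optional

@torch.jit.script
def f(x: torch.Tensor, flag: bool) -> torch.Tensor:
    y: Optional[torch.Tensor] = None
    if flag:
        y = x + 1  # both branches produce a Tensor,
    else:
        y = x - 1  # so the if-output can be refined to a non-optional Tensor
    return y

print(f(torch.ones(2), True))  # tensor([2., 2.])
```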

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33515235

Pulled By: eellison

fbshipit-source-id: 34f6ab94ac4238498f9db36a1b673c5d165e832e
2022-01-11 22:12:34 -08:00
772b3e92bf Parse symbolic shapes (#69775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69775

Adds parsing for Symbolic Shapes.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33515233

Pulled By: eellison

fbshipit-source-id: 7ebb22c0ab37d78e459ebcab67bb86f731d00376
2022-01-11 22:12:31 -08:00
97e8dcba5e Fix mis-specified device arg name (#69645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69645

As noted in the code comment:
the existing device operator is registered with input name `a`, which prevents torch.device(type="cuda") from working. Add a shim layer here.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33515231

Pulled By: eellison

fbshipit-source-id: c04af8158a9568a20cd5fbbbd573f6efab98fd60
2022-01-11 22:11:24 -08:00
9465c24245 [jit][edge] Use dynamic type instead of union types for schema parsers. (#70509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70509

TypeFactory will construct DynamicType when building on Edge platforms. We use this facility to make FunctionSchema return DynamicType all the time for OptionalType. We don't explicitly use DynamicTypeFactory everywhere because that requires too many changes and will split the entire aten codebase.
ghstack-source-id: 146818621

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33306737

fbshipit-source-id: d7ce00b438f7c03b43945d578280cfd254b1f634
2022-01-11 20:14:25 -08:00
40121456af Sparse CSR: Add torch.randn_like (#68083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68083

This PR adds support for `torch.randn_like(sparse_csr_tensor)`.
It creates a new sparse csr tensor with same indices but different values that are normally distributed.

In addition `.normal_()` and `torch.empty_like` were implemented because `randn_like` is a composite of these two functions.
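
A minimal sketch of the added behavior (the sparsity pattern is preserved while the values are redrawn):

```python
import torch

crow = torch.tensor([0, 2, 4])
col = torch.tensor([0, 1, 0, 1])
vals = torch.tensor([1.0, 2.0, 3.0, 4.0])
csr = torch.sparse_csr_tensor(crow, col, vals)

r = torch.randn_like(csr)  # same indices, normally distributed values
assert torch.equal(r.crow_indices(), csr.crow_indices())
assert torch.equal(r.col_indices(), csr.col_indices())
```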

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33511280

Pulled By: cpuhrsch

fbshipit-source-id: 6129083e8bc6cc5af2e0191294bd5e4e864f6c0e
2022-01-11 18:29:24 -08:00
831c129e85 fx quant: fix test_fx_acc_tracer::test_quantized_batch_norm2d (#71175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71175

D33330022 was landed with a Meta test failure (ghstack clobbered the fix),
resubmitting the Meta-only part to fix CI.

Test Plan:
```
buck test mode/opt //caffe2/test:test_fx_acc_tracer -- --exact 'caffe2/test:test_fx_acc_tracer - test_quantized_batch_norm2d (fx_acc.test_acc_tracer.AccTracerTest)' --run-disabled
```

Reviewed By: HDCharles

Differential Revision: D33531994

fbshipit-source-id: 39dc945c54fb9a7205c9d4114ede6b5ab99c5012
2022-01-11 17:38:00 -08:00
410e91adee Performance and memory improvements to batched torch.linalg.solve (#69752)
Summary:
Previously, for a single input matrix A and a batched matrix B, matrix A was expanded and cloned before computing the LU decomposition and solving the linear system.

With this PR, the LU decomposition is computed once for the single matrix and then expanded and cloned only if required by the backend library call that solves the linear system.

Here's a basic comparison:
```python
# BEFORE THE PR
In [1]: import torch
In [2]: a = torch.randn(256, 256)
In [3]: b = torch.randn(1024, 256, 2)
In [4]: %%timeit
   ...: torch.linalg.solve(a, b)
   ...:
   ...:
329 ms ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# WITH THIS PR
In [1]: import torch
In [2]: a = torch.randn(256, 256)
In [3]: b = torch.randn(1024, 256, 2)
In [4]: %%timeit
   ...: torch.linalg.solve(a, b)
   ...:
   ...:
21.4 ms ± 23 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69752

Reviewed By: albanD

Differential Revision: D33028236

Pulled By: mruberry

fbshipit-source-id: 7a0dd443cd0ece81777c68b29438750f6524ac24
2022-01-11 16:14:16 -08:00
786f946098 [Profiler] Add glue layer to reduce the use of #ifdef USE_KINETO in the profiler code. (#69798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69798

One of the major sources of complexity in `profiler_kineto.cpp` is that Kineto may or may not be available. The code (including the types) follows two related but often distinct codepaths, and large sections may or may not be `#ifdef`'d out.

Optimizing such code while preserving correctness is quite difficult; at one point I realized that I had broken the non-Kineto case, because moving work into the finalize step ran afoul of a very large `#ifdef` around the finalize logic.

In order to make optimization more tractable, I gathered all of the calls to Kineto APIs and isolated them in the `kineto_shim.h/.cpp` files: the header allows callers to pretend as though Kineto is always available (mostly), and the cpp file hides most of the horrible `#ifdef`s so they don't pollute the main profiler code.

Test Plan: Unit tests.

Reviewed By: aaronenyeshi

Differential Revision: D32690568

fbshipit-source-id: 9a276654ef0ff9d40817c2f88f95071683f150c5
2022-01-11 15:57:46 -08:00
a3b7dd7b78 Enable nested default hooks (#70932)
Summary:
When default hooks are set, they are pushed onto a stack. When context managers are nested, only the innermost hooks are applied.
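
A minimal sketch of the nesting behavior, assuming `torch.autograd.graph.saved_tensors_hooks` is the context manager that installs the default hooks:

```python
import torch
from torch.autograd.graph import saved_tensors_hooks

def tagged_hooks(name):
    def pack(t):
        print(f"packed by {name}")
        return t
    return saved_tensors_hooks(pack, lambda t: t)

x = torch.randn(3, requires_grad=True)
with tagged_hooks("outer"):
    with tagged_hooks("inner"):
        y = (x * x).sum()  # tensors saved here print "packed by inner"
y.backward()
```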

There is special care needed to update the TLS code. See also https://github.com/pytorch/pytorch/issues/70940 (i.e. do we need to be storing the enabled flag as well?)

Fixes https://github.com/pytorch/pytorch/issues/70134

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70932

Reviewed By: mruberry

Differential Revision: D33530370

Pulled By: albanD

fbshipit-source-id: 3197d585d77563f36c175d3949115a0776b309f4
2022-01-11 15:03:49 -08:00
433cf44b79 delete ecr_gc_docker job (#71178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71178

This should no longer be needed as we now set a lifecycle policy on ECR
and we also don't generate lots of temporary containers anymore.

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D33537851

Pulled By: suo

fbshipit-source-id: b97b7525be6f62ec8771dfb6a7ee13b22b78ac5a
2022-01-11 14:53:31 -08:00
e7634f83ce [jit][edge] Migrate base types to DynamicType on mobile. (#70233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70233

Make type parser to produce DynamicType for all base types which don't have type arguments, and return DynamicType pointer for IValue::type().
ghstack-source-id: 146818622

Test Plan: no behavior change.

Reviewed By: iseeyuan

Differential Revision: D33137219

fbshipit-source-id: 1612c924f5619261ebb21359936309b41b2754f5
2022-01-11 13:53:29 -08:00
ecb6defa36 Fixed docs for forward_ad.make_dual (#71159)
Summary:
Minor docs change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71159

Reviewed By: mruberry

Differential Revision: D33530031

Pulled By: albanD

fbshipit-source-id: e0bbe3a29a7de675fa4c9bf90976616f0e093f74
2022-01-11 13:47:09 -08:00
2c8cb8a964 Speed up quantized upsampling for channels last (#70903)
Summary:
Moving the calls to `q_zero_point()` outside the for loop considerably speeds up upsampling for the channels-last format.

This fix is very similar to https://github.com/pytorch/pytorch/pull/66525 but applies it to the channels-last format.

Fixes https://github.com/pytorch/pytorch/issues/70902

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70903

Reviewed By: mruberry

Differential Revision: D33531805

Pulled By: vkuzo

fbshipit-source-id: e723f1e3d53bdd66529c1326dccba889402a126c
2022-01-11 13:28:10 -08:00
edf15ebbc2 Adding python 3.10 binary workflows (#71132)
Summary:
Testing python 3.10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71132

Reviewed By: mruberry

Differential Revision: D33534609

Pulled By: atalman

fbshipit-source-id: 561412735fb6d1269fca3db0fac5afd437a0bde2
2022-01-11 13:18:18 -08:00
7d6535cab3 Make Kineto + distributed a warning rather than an error (#71120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71120

D33283314 (681e78bace) is causing jobs to fail when profiled, which is not ideal.

Test Plan:
pyper-online-cli launch 306587531 AI_INFRA ads_global_pyper_sla oncall_model_store --training_package_version training_platform:9344fe410969bdf614bc89cff0280281 --training_stage ONLINE --training_environment DEV --timeout 1728000

(Courtesy of yanjzhou)

Reviewed By: xw285cornell

Differential Revision: D33437773

fbshipit-source-id: 5c492f83146ff82557cfc1142aade3432cf73ca5
2022-01-11 12:50:17 -08:00
45b0bafb38 Drop more unused variables (#71123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71123

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D33511656

fbshipit-source-id: b53565b589720cce9fdfe3bc222853dba8645aff
2022-01-11 12:46:24 -08:00
6c03f8d9e5 Drop unused variables and add some const (#71106)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71106

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D33490855

fbshipit-source-id: 9fc4a4e4a7ad5e6c31f394ec6d8221b964fdf043
2022-01-11 12:38:59 -08:00
1c8b167327 Move implementation of empty_like for sparse COO (#71103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71103

Previously, the implementation of empty_like for sparse COO was a conditional path in the generic implementation.
This PR makes use of the dispatcher and moves the implementation into a separate function.
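
The user-visible behavior is unchanged; a minimal sketch of the call that is now routed through the dispatcher:

```python
import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([1.0, 2.0])
s = torch.sparse_coo_tensor(i, v, (2, 2))

e = torch.empty_like(s)  # now handled by a dedicated sparse COO kernel
print(e.layout)          # torch.sparse_coo
```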

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33511240

Pulled By: cpuhrsch

fbshipit-source-id: 9a84f82a27e3cf0ac819d867b86df6d10ddf7fa7
2022-01-11 12:30:39 -08:00
a8612cd72a Skip failing tests in test_nn if compiled without LAPACK (#70913)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70912

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70913

Reviewed By: mruberry

Differential Revision: D33534840

Pulled By: albanD

fbshipit-source-id: 0facf5682140ecd7a78edb34b9cd997f9319e084
2022-01-11 12:21:18 -08:00
14922a136f Revert D33480077: .github: Re-enable xla test config
Test Plan: revert-hammer

Differential Revision:
D33480077 (18e1e1d4d3)

Original commit changeset: a2e720c55d0e

Original Phabricator Diff: D33480077 (18e1e1d4d3)

fbshipit-source-id: e4e114a9a6d7940491ac0741e94f455a490f077a
2022-01-11 12:12:15 -08:00
940b89b03f Disable Python-3.6 binary builds (#71163)
Summary:
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71163

Reviewed By: anjali411

Differential Revision: D33532813

Pulled By: malfet

fbshipit-source-id: ab0833c2db187c452681a17907583599ff1cb481
2022-01-11 11:25:45 -08:00
4f35b9144c [jit][edge] Migrate ListType to DynamicType on mobile. (#70212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70212

Use DynamicType instead of ListType all over the place in Lite Interpreter. Namely we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146818619

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33176931

fbshipit-source-id: 9144787f5fc4778538e5c665946974eb6171a2e6
2022-01-11 10:57:53 -08:00
18e1e1d4d3 .github: Re-enable xla test config (#71008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71008

This reverts commit 6f83841582d8d818129dc4ce82a8478f221b32d7.

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D33480077

Pulled By: seemethere

fbshipit-source-id: a2e720c55d0e1995e2b6cf2da7c801f377d52b3f
2022-01-11 10:49:20 -08:00
85c6489cdc ci: unquote env variables (#71139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71139

These variables were being interpreted as quoted in the GITHUB_ENV file, meaning they didn't register correctly when running the actual binary_upload.sh, so binaries weren't actually getting uploaded.

This remedies that.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, b0noI

Differential Revision: D33519952

Pulled By: seemethere

fbshipit-source-id: 727f6d4e5dbdfd0a3e2c76058bee9430b2c717a9
2022-01-11 10:21:11 -08:00
cf61738097 Drop unused variables; make things const; use some auto (#71107)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71107

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D33490773

fbshipit-source-id: 0d259db9c58c9b33aecc560075f6dcfa78883467
2022-01-11 08:55:54 -08:00
3c2ae2b47c Revert D32994274: [ONNX] Link to the wiki (#68505)
Test Plan: revert-hammer

Differential Revision:
D32994274 (a606ea73d6)

Original commit changeset: 34d54f935799

Original Phabricator Diff: D32994274 (a606ea73d6)

fbshipit-source-id: 81fc96c2aff9d14efb5e092fffd0685e507837e6
2022-01-11 07:40:14 -08:00
1b496cf158 Fixes doc errors in Tensor.triu(), Tensor.tril(), Tensor.ravel(). (#71057)
Summary:
Hi, PyTorch Team!
I am very much interested in starting up my contribution to PyTorch. I made several contributions in NumPy and CuPy, but this is my first PR towards PyTorch. I aim to contribute more in the upcoming future.

The PR fixes https://github.com/pytorch/pytorch/issues/70972  https://github.com/pytorch/pytorch/issues/70975.

#### Aim of PR
Functions like `Tensor.ravel`, `Tensor.tril`, `Tensor.tril_`, `Tensor.triu`, and `Tensor.triu_` had a couple of typos in their docs. The PR aims to resolve that.

I'm looking forward to your viewpoints. Thanks!

cc: kshitij12345 vadimkantorov Lezcano TestSomething22

cc brianjo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71057

Reviewed By: preeti1205

Differential Revision: D33502911

Pulled By: mruberry

fbshipit-source-id: 8ce0b68a29658a5a0be79bc807dfa7d71653532d
2022-01-11 07:34:59 -08:00
ac0d131291 Deprecating routed decoder (#70990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70990

Releasing the `decode` API for domains to let them implement a custom `decode` DataPipe for now.

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D33477620

Pulled By: ejguan

fbshipit-source-id: d3c30ba55c327f4849d56f42d328a932a31777ed
2022-01-11 06:56:48 -08:00
d6b7d69d8b Python3.10 migration adding to binary linux tests (#71130)
Summary:
Python 3.10 migration: adding it to the binary Linux tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71130

Reviewed By: seemethere, janeyx99

Differential Revision: D33518787

Pulled By: atalman

fbshipit-source-id: 53c2c1b96e7a530a2af9ae7d5840bf8398b870e5
2022-01-11 05:54:07 -08:00
fb8a9732d9 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33524330

fbshipit-source-id: 112291a23e2efe2d573bee86ead8ce2fc3957e5b
2022-01-11 04:33:21 -08:00
fdda7b5e8a [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D33525225

fbshipit-source-id: 973eb9f9a5dfbd70bf0127f44089237969c2bb68
2022-01-11 04:20:46 -08:00
40b80aa490 [jit][edge] Migrate TupleType to DynamicType on mobile. (#70205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70205

Use DynamicType instead of TupleType all over the place in Lite Interpreter. Namely we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146818620

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33176925

fbshipit-source-id: 00f7a5db37ba772c912643c733db6c52dfdc695d
2022-01-11 01:01:48 -08:00
5cae40c169 [pytorch][aten][cuda] move CUDAGeneratorImpl.h to ATen/cuda (#70650)
Summary:
This patch moves a CUDA-specific file, `CUDAGeneratorImpl.h` to `ATen/cuda` as the following TODO comment in  `CUDAGeneratorImpl.h` suggests:
```
// TODO: this file should be in ATen/cuda, not top level
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70650

Reviewed By: jianyuh, xw285cornell

Differential Revision: D33414890

Pulled By: shintaro-iwasaki

fbshipit-source-id: 4ff839205f4e4ea4c8767f164d583eb7072f1b8b
2022-01-10 22:27:04 -08:00
33a5905cc6 [quant] fix reduce_range warning (#71027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71027

Fix issue #61054: remove the warning for reduce_range=True, which caused the message "UserWarning: Please use quant_min and quant_max to specify the range for observers".

Test Plan:
python test/test_quantization.py TestFakeQuantizeOps

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D33484341

fbshipit-source-id: 97c3d4658926183f88a0c4665451dd7f913d30e6
2022-01-10 20:05:36 -08:00
59e166feb2 [Quant][DBR] Add test for serialization (#70078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70078

This commit adds a serialization test for DBR.

Test Plan:
python test/test_quantization.py TestQuantizeDBR.test_serialization

Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33405192

fbshipit-source-id: 39c4cca49aff8b960f4dec6c272fbd0da267fa95
2022-01-10 17:50:05 -08:00
043e84b3d2 Per-overload torch.ops API (#67254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254

Fixes https://github.com/pytorch/pytorch/issues/65997

BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument.
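
A minimal sketch of the per-overload access this enables, using `aten.add` as an example (the overload names `Tensor` and `Scalar` come from its schema):

```python
import torch

t = torch.ones(3)
print(torch.ops.aten.add.Tensor(t, t))  # address the `Tensor` overload directly
print(torch.ops.aten.add.Scalar(t, 1))  # address the `Scalar` overload directly
print(torch.ops.aten.add(t, t))         # the bundle still dispatches across overloads
```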

Follow up work:
1. disallow `default` as an overload name for aten operators.
2. Add a method to obtain a list of all overloads (exclude the ones registered by JIT)
3. Add methods/properties to `OpOverload` to access more schema information (types of input and output args etc)

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D33469839

Pulled By: anjali411

fbshipit-source-id: c3fc43460f1c7c9651c64b4d46337be21c400621
2022-01-10 17:29:06 -08:00
b12ca69179 [jit][edge] Migrate DictType to DynamicType on mobile. (#70202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70202

Use DynamicType instead of DictType all over the place in Lite Interpreter. Namely we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146735648

Test Plan: no behavior change.

Reviewed By: iseeyuan

Differential Revision: D33137257

fbshipit-source-id: 971bf431658c422ea9353cc32cdab66e98876e9d
2022-01-10 15:55:29 -08:00
a606ea73d6 [ONNX] Link to the wiki (#68505) (#69544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69544

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32994274

Pulled By: msaroufim

fbshipit-source-id: 34d54f935799fa94516a541a241900ec205c7427

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2022-01-10 15:51:04 -08:00
7397683b57 Add forward AD formulas for mv, scatter_add, _s_where (#70468)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70468

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33405364

Pulled By: soulitzer

fbshipit-source-id: 7681c33fb264a7a3ec6436ebb7c5bb07cd5ffc3d
2022-01-10 13:54:10 -08:00
78994d13c0 Add forward AD formulas for {batch,layer,group}_norm (#70355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70355

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33405362

Pulled By: soulitzer

fbshipit-source-id: 55a92e88a04e7b15a0a223025d66c14f7db2a190
2022-01-10 13:52:16 -08:00
7a08030903 Fix fx2trt CI test trigger condition (#71014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71014

Replace test trigger with test_config matching.

Test Plan:
CI
https://github.com/pytorch/pytorch/runs/4746717568?check_suite_focus=true

Reviewed By: janeyx99

Differential Revision: D33480971

fbshipit-source-id: 9513e464753343a7ae47fcfaf48119f34bae94c5
2022-01-10 13:37:24 -08:00
80659b71a5 Hoisting common expressions out of If blocks [retry] (#65645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65645

This is a retry of PR: https://github.com/pytorch/pytorch/pull/59492

Latest Changes: Added more tests, added the getOrCreateDB pattern, updated logic to remove unnecessary checks
addressed all comments.

Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.

Test Plan: python test_jit.py TestIfHoisting

Reviewed By: eellison

Differential Revision: D33302065

Pulled By: Gamrix

fbshipit-source-id: a5a184a480cf07354359aaca344c6e27b687a3c2
2022-01-10 13:28:17 -08:00
569aeec1bc fix typo in debugging_hooks.py (#70956)
Summary:
I just fixed a small typo in the debugging_hooks documentation

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70956

Reviewed By: jbschlosser

Differential Revision: D33508898

Pulled By: dagitses

fbshipit-source-id: fc5935e5a2e2ddc45657a22d3b33a11aba378d9b
2022-01-10 12:59:42 -08:00
49ed097ebe Add documentation for lowering (#71116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71116

As title, add more inline documentation for code.

Test Plan:
no
pingpoke

Reviewed By: 842974287

Differential Revision: D33465611

fbshipit-source-id: 6b5529893098e5591470c2f41a0d8989e3cfccb9
2022-01-10 12:56:59 -08:00
3fbff80bea ci: Move MAX_JOBS to not set on Darwin (#71122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71122

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33515392

Pulled By: seemethere

fbshipit-source-id: 376608c9a6e2e685a07d5010ce443a3f02475ee5
2022-01-10 12:49:14 -08:00
cfc1117591 Update sparse.rst to warn about _values() (#71088)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71088

Reviewed By: jbschlosser

Differential Revision: D33511207

Pulled By: cpuhrsch

fbshipit-source-id: 9d0c5445842ed96999eb88445cbea7ae284b1a6f
2022-01-10 12:43:46 -08:00
30699cbfd5 Reland D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers. (#71048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71048

reland D33284352 (0a921ba0d0)
ghstack-source-id: 146735646

Test Plan: All Github CI: ciflow rerun -l ciflow/all

Reviewed By: gmagogsfm

Differential Revision: D33489731

fbshipit-source-id: 3e160209a1abb193ad3eed3018054aa7d331025e
2022-01-10 12:42:23 -08:00
fb66f561b1 Add copy out to the fallback path in SR invocation of composed op (#70871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70871

We had previously handled reusing memory in the optimized kernel execution path, but not yet handled it if we hit the unoptimized fallback.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33458652

Pulled By: eellison

fbshipit-source-id: 4eb62181ed02c95813a99638f5e2d0f9347b5c08
2022-01-10 12:16:38 -08:00
c8332256ee [JIT] Refactor SR invocation of fusion (#70508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70508

We can create the code object at compile time instead of at runtime to speed it up. This also makes the compilation cache unnecessary. TODO: figure out if there's a way to cache the InterpreterState object

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33458648

Pulled By: eellison

fbshipit-source-id: 710389741e7c6210528f2f96ab496fcd533d942a
2022-01-10 12:16:35 -08:00
0adc7cc546 Inline Fallback Functions For Debugging (#70463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70463

Fix for https://github.com/pytorch/pytorch/issues/52940

When we inline a fallback function, we insert the runtime-optimized version of its graph.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, davidberard98

Differential Revision: D33458651

Pulled By: eellison

fbshipit-source-id: fd7e5e2b5273a1677014ba1a766538c3ee9cad76
2022-01-10 12:15:11 -08:00
840459a269 [ONNX] Relax constant_fold gather with indices rank > 1 (#68140) (#68493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68493

Fixes #66786.

`index_select` only supports a 1-D `index` tensor, while `ONNX::Gather` allows `index` to have rank `q`. Abort constant folding of `ONNX::Gather` if the `index` rank is larger than 1.
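
A minimal sketch of the rank restriction that motivates bailing out:

```python
import torch

x = torch.arange(12.0).reshape(3, 4)

print(torch.index_select(x, 0, torch.tensor([2, 0])))  # 1-D index: supported

# A rank-2 index is valid for ONNX Gather but not for index_select:
# torch.index_select(x, 0, torch.tensor([[0, 1], [2, 0]]))  # raises RuntimeError
```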

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D32483826

Pulled By: msaroufim

fbshipit-source-id: a8e8389d85287a859d32abf8d8d98852290b0a03

Co-authored-by: BowenBao <bowbao@microsoft.com>
2022-01-10 11:55:02 -08:00
4b47047dae [ONNX] Add support for shrink ops (#66969) (#68492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68492

* Initial commit

* Fix flake issue

* Add test tags

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D32483827

Pulled By: msaroufim

fbshipit-source-id: 41c623712524465b877d0fe0e2f4001d475bf2ce
2022-01-10 11:38:31 -08:00
62441157e3 Have getFilesToLevels return a reference (#71047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71047

The copy induced by getFilesToLevels is currently consuming 3,457,470,000 cycles per day. A reference might fix that.

Reference:
```
["Inline torch::jit::JitLoggingConfig::getFilesToLevels[abi:cxx11] @ caffe2/torch/csrc/jit/jit_log.cpp:54"]
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D33479180

fbshipit-source-id: 05d306ad9ea23e2f30348a08d547ebe274eb0c10
2022-01-10 11:32:32 -08:00
87484d67e3 .github: Enable linux binary builds (#68388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68388

Updates the gpu architectures as well as adding a trigger for
on_pull_request for the binary build workflows so that we can iterate on
this later

TODO:
* Create follow up PR to enable nightly linux GHA builds / disable CircleCI nighlty linux builds

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D33462294

Pulled By: seemethere

fbshipit-source-id: 5fa30517550d36f504b491cf6c1e5c9da56d8191
2022-01-10 11:30:45 -08:00
e9a8bb59b4 Move the apply_tensor_props into its own function for more public use (#67786)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67786

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32175962

Pulled By: Gamrix

fbshipit-source-id: caefe1c849277632d976a6b5513f72b47595f2c0
2022-01-10 11:26:03 -08:00
3ef10da97d add support for pickle v4 (#70642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70642

Review history on https://github.com/pytorch/pytorch/pull/70014

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D33414364

Pulled By: PaliC

fbshipit-source-id: 7e7ed491c6f16d4fac3a03f7e403935823c03aa6
2022-01-10 11:13:41 -08:00
118bd82dde detect mocked module on saving pass (#70641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70641

Raises a NotImplementedError if we attempt to pickle an object which uses a mocked module. Now we no longer have to load the object to get this check; it instead happens right on the saving path.

Review History is on https://github.com/pytorch/pytorch/pull/69793 PR was moved to a different branch due to original branch getting corrupted.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D33414365

Pulled By: PaliC

fbshipit-source-id: 6d72ddb05c47a3d060e9622ec0b6e5cd6c6c71c8
2022-01-10 11:11:55 -08:00
c4400fc431 Retire repeat_test_for_types (#71033)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69865

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71033

Reviewed By: mruberry

Differential Revision: D33486370

Pulled By: janeyx99

fbshipit-source-id: 71f9383dbc1e00b572f26eb4f04d0a94c6759e35
2022-01-10 09:13:54 -08:00
e1b84e1b6b fix loading of older models that don't have maximize (#71023)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71023

Reviewed By: jbschlosser

Differential Revision: D33483687

Pulled By: albanD

fbshipit-source-id: 2f3c6e97a9579be9ba15eca0756fc1f2c466fbb6
2022-01-10 06:01:24 -08:00
b27dfa70c4 caffe2: disable TensorImpl static_assert (temporary)
Test Plan: buck2 build -c cxx.modules=false -c fbcode.platform=platform010 fbcode//caffe2/aten:ATen-cu

Reviewed By: singhsrb, meyering

Differential Revision: D33501636

fbshipit-source-id: a1a5bbb2b160eba8eb5abba4f6ae1929a58e11e9
2022-01-09 23:11:17 -08:00
fca8a0acaa Prevent import race condition that leaves torch.package.PackagePickler with unwanted dispatch table entries. (#71025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71025

TL;DR: In some cases:
1) user imports `dill`, which mutates `_Pickler.dispatch`,
2) user imports lib that imports `torch.package`
3) `PackagePickler.dispatch = _Pickler.dispatch.copy()` makes a copy of the mutated table
4) user calls `dill.extend(use_dill=False)` to reset `_Pickler.dispatch`, expecting everything to be okay
5) `PackagePickler` is used to pickle something like `ModuleDict`. `PackagePickler.dispatch` has stale entries to dill pickle functions like `save_module_dict`, which sometimes hard-code calls to `StockPickler.save_global`, which is unaware of torch.package module prefixes.
6) Exception is raised, e.g. `Got unhandled exception Can't pickle <class '<torch_package_2>.caffe2.mylib'>: it's not found as <class '<torch_package_2>.caffe2.mylib'>`
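
A simplified, dependency-free sketch of the hazard above (plain dicts stand in for the pickler dispatch tables; all names are illustrative):

```python
dispatch = {"int": "stock_save_int"}   # stands in for _Pickler.dispatch

dispatch["module_dict"] = "dill_save"  # (1) dill mutates the shared table
snapshot = dispatch.copy()             # (3) PackagePickler copies it on import
del dispatch["module_dict"]            # (4) dill.extend(use_dill=False) resets

assert "module_dict" in snapshot       # (5) the copy still holds the stale entry
```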

Differential Revision: D33483672

fbshipit-source-id: d7cd2a925bedf27c02524a6a4c3132a262f5c984
2022-01-09 15:13:39 -08:00
2bed616e0f [Dist tests] Make event_listener work for all dist tests (#70628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70628

The event_listener thread is used to log process tracebacks when a timed-out process sends it a request for its traceback. However, this thread is created in the `_run` function, which is overridden by some classes such as `TestDistBackend`, so those tests did not have this feature. Move the event_listener setup logic to `run_test`, which is called by all distributed test classes, enabling it for all distributed tests. Also modify the logger setup to ensure that logging.info calls are printed in the subprocess.
ghstack-source-id: 146714642

Test Plan: CI

Reviewed By: jaceyca, fduwjj

Differential Revision: D33410613

fbshipit-source-id: aa616d69d251bc9d04e45781c501d2244f011843
2022-01-09 14:54:09 -08:00
9267fd8d73 [WIP] [ATen] Add native_multi_attention_self_attention CPU + GPU implementation (#70649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70649

As described in https://fb.quip.com/oxpiA1uDBjgP

This implements the first parts of the RFC and is a rough draft showing the approach. The idea is that for the first cut we can maintain very close (identical, I believe, in this diff) numerical equivalence to the existing nn.MHA implementation, which is what this diff attempts to do. In subsequent iterations, once we have a working and adopted native self-attention implementation, we could then explore alternatives, etc.

The current implementation is similar to existing dedicated implementations such as LightSeq/FasterTransformer/DeepSpeed, and for MHA on both CPUs and GPUs is between 1.2x and 2x faster depending on the setting. It makes some approximations/restrictions (doesn't handle masking in masked softmax, etc), but these shouldn't materially impact performance.
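
For reference, a minimal sketch of the existing module whose numerics the native kernels aim to match (shapes here are illustrative):

```python
import torch

mha = torch.nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(4, 16, 8)  # (batch, seq, embed)
out, _ = mha(x, x, x)      # the self-attention case targeted here
print(out.shape)           # torch.Size([4, 16, 8])
```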

This does the first few items:

* add native_multi_head_attention(...) , native_multi_head_attention_backward(..) to native_functions.yaml
* Implement native_multi_head_attention(..) on GPU, extracting bits and pieces out of LS/DS/FT as appropriate
* Implement native_multi_head_attention(..) on CPU

The backward implementation is still WIP, but the idea would be to:

* Hook these up in derivatives.yaml
Implement native_multi_head_attention_backward(..) on GPU, extracting out bits and pieces out of LS/DS (not FT since it’s inference only)
* Implement native_multi_head_attention_backward(..) on CPU
* In torch.nn.functional.multi_head_attention_forward 23321ba7a3/torch/nn/functional.py (L4953), add some conditionals to check if we are being called in a BERT/ViT-style encoder fashion, and invoke the native function directly.

Test Plan: TODO

Reviewed By: mikekgfb

Differential Revision: D31829981

fbshipit-source-id: c430344d91ba7a5fbee3138e50b3e62efbb33d96
2022-01-08 21:50:41 -08:00
785b6905de reduce plan generation log spam (#70880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70880

Change the log level to `debug` in caffe2's `optimizer.py` when logging the rowwise Adagrad engine.

Test Plan: CI + sandcastle

Reviewed By: boryiingsu

Differential Revision: D33439337

fbshipit-source-id: b158249b8df771c0ec8b642210ede39972929b00
2022-01-08 10:07:06 -08:00
49a07c8922 Suppress some unused variable warnings in Sorting.cu and TensorTopK.cu (#70999)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70999

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D33470240

fbshipit-source-id: 906932cb5f497c77465b70ec9bc6fcb0705719de
2022-01-08 00:41:58 -08:00
d1e049c306 Fix some unused variable warnings and make some stuff const in ReplicationPadding.cu (#70998)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70998

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D33460035

fbshipit-source-id: bdf70fd04cce40a2a8d60c2c405f8d6cee9127e5
2022-01-08 00:40:51 -08:00
11aa1961c1 Use (void)error_unused to avoid unused warning (#71000)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71000

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D33470600

fbshipit-source-id: 868a6ee33a04846bd1efbe06ab306fbaad3bf9db
2022-01-07 23:39:30 -08:00
704af23ee4 Use a reference in GetSingleArgument (#71007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71007

A string copy at Line 417 is currently consuming 125,749,287,000 cycles/day. I suspect the issue is with a copy-on-return, but we can experiment with introducing a reference in the middle to see if that produces a good savings without changing the interface.

Reference
```
["Inline caffe2::ArgumentHelper::GetSingleArgument @ caffe2/caffe2/utils/proto_utils.cc:417"]
```

Test Plan: Sandcastle

Reviewed By: xw285cornell

Differential Revision: D33478883

fbshipit-source-id: e863e359c0c718fcd0d52fd4b3c7858067de0670
2022-01-07 20:18:56 -08:00
9762aa0fdc Revert D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers.
Test Plan: revert-hammer

Differential Revision:
D33284352 (0a921ba0d0)

Original commit changeset: 997c4f110b36

Original Phabricator Diff: D33284352 (0a921ba0d0)

fbshipit-source-id: af316727442a64f1ae40d53d7a9d26ec550d634e
2022-01-07 19:58:03 -08:00
f626bef598 Fix docstring for nn.Hardshrink (#71012)
Summary:
Fixes nn.Hardshrkink's docstring problem reported at https://github.com/pytorch/pytorch/issues/70498.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71012

Reviewed By: dagitses

Differential Revision: D33482333

Pulled By: jbschlosser

fbshipit-source-id: 00eea76299676fc97c5cc31421af9c73665bfcf4
2022-01-07 18:56:47 -08:00
0a921ba0d0 [jit][edge] Do not reuse mobile type parser for all unpicklers. (#70338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70338

Today, Unpickler is used by both server and mobile for deserializing models, and it always falls back to the mobile parser when no type resolver is provided by the user. However, this is not intended, as the server and mobile type parsers support different things. In this diff we provide a default fallback using the script parser and opt out of it for all mobile cases.
ghstack-source-id: 146727330

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33284352

fbshipit-source-id: 997c4f110b36eee6596e8f23f6a87bf91a4197ed
2022-01-07 18:35:32 -08:00
3f3eae6737 [jit] Split Tensor type implementations to separate file. (#70121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70121

Code move all TensorType dependencies into a separate `tensor_type.cpp`, so that we don't link with it in the min runtime accidentally.
ghstack-source-id: 146727331

(Note: this ignores all push blocking failures!)

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D33102286

fbshipit-source-id: e9fe176201bd2696cb8c65c670fcf225e81e8908
2022-01-07 18:35:29 -08:00
53b9c0f12d [jit] Polymorphic IValue::type() for DynamicType. (#70120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70120

Before the change:
```
c10::Type t = ivalue.type();
```
After the change:
```
c10::Type t = ivalue.type();
c10::DynamicType d = ivalue.type<c10::DynamicType>(); // new path
```
The new path will be adopted in the PyTorch Lite Interpreter to support lightweight type reflection. Note that type getters are selected at compile time, so there is no performance overhead. The benefits of having a DynamicType will be elaborated in a separate document, but in short, DynamicType provides an isolated type system for controlling binary size bloat and shrinks ~20 supported Type symbols down into one, so that the size taken by specializations and function name symbols is greatly reduced.

Lite Interpreter should only use the `<DynamicType>` variant of the interfaces from aten, to reduce binary size.
ghstack-source-id: 146727334

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: gmagogsfm

Differential Revision: D33102276

fbshipit-source-id: c5354e7d88f9de260c9b02636214b40fe15f8a10
2022-01-07 18:35:26 -08:00
62909facb3 [jit] Decouple ivalue.h from jit_type.h (#70119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70119

JIT type and IValue have a mutual dependency today for various reasons. It is made worse by `jit_type.h` and `ivalue.h` mutually including each other, causing non-deterministic name resolution in different translation units and preventing us from safely using symbols from `jit_type.h` in `ivalue.h`. This diff doesn't address the mutual dependency between JIT type and IValue at the linking level, but at the header level.

We choose to remove the include of `ivalue.h` from `jit_type.h` because it's much harder to make a type-free header for IValue. We achieve this by removing EnumType (the only type depending on IValue in JIT types) from `jit_type.h` and letting downstream users include an explicit `enum_type.h` as needed. We also move some IValue inline member function definitions back to `ivalue_inl.h` so that `jit_type.h` doesn't need the IValue definition to be present.
We also remove a seemingly accidental include of `jit_type.h` from `ATen/core/List_inl.h` so that `ivalue.h` can include `jit_type.h` directly; otherwise, due to another mutual inclusion between `ivalue.h` and `List_inl.h`, we can still get nondeterministic behavior.
ghstack-source-id: 146727333

(Note: this ignores all push blocking failures!)

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D33155792

fbshipit-source-id: d39d24688004c2ec16c50dbfdeedb7b55f71cd36
2022-01-07 18:34:17 -08:00
0eb2fc608c [fx_acc] ensure all acc ops args to be keyword arguments (#70952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70952

ATT

Test Plan: test_plan_no

Reviewed By: wushirong

Differential Revision: D33456343

fbshipit-source-id: 26b0c1042de6072ff8741617dd3523edc4a9b5fd
2022-01-07 17:53:36 -08:00
0cd474b2ce fix op not scriptable
Summary: Fix torch.sort, min/max, and torch.numel not being scriptable after quantization

Test Plan: python3 test/test_quantization.py TestQuantizeFxOps.test_general_shape_ops

Reviewed By: jerryzh168

Differential Revision: D33467184

Pulled By: terrychenism

fbshipit-source-id: 13775ab36d4007978df48c9af71d83398fce5161
2022-01-07 16:55:28 -08:00
d26e5ced72 Add missing docstrings for ONNX converter API. Fixes #67393 (#67640) (#68489)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68489

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D32483783

Pulled By: msaroufim

fbshipit-source-id: 512e4495040a6a9833d501de2301f1709b0352b9
2022-01-07 16:43:09 -08:00
c59c86706e [quant] Add back README.md for backend_config (#70964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70964

Accidentally deleted before; adding this back. We'll make this more complete after
the structure is finalized.

Test Plan:
no test needed

Imported from OSS

Reviewed By: dagitses

Differential Revision: D33470738

fbshipit-source-id: 00459a4b00514d3d0346de68788fab4cad8a5d12
2022-01-07 15:44:51 -08:00
00e5610914 FX quant: allow duplicate named_modules during fbgemm lowering (#70927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70927

Earlier, when replacing deq-ref-quant modules, we only got non-duplicate named modules. When the model contains duplicate names, the lowering fails the second time.
This PR allows duplicates when getting the named modules.
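
For reference, a small sketch of the relevant API knob (assuming the standard `nn.Module.named_modules` signature with its `remove_duplicate` flag):

```
import torch.nn as nn

shared = nn.Linear(2, 2)
model = nn.Sequential(shared, nn.ReLU(), shared)  # same module appears twice

# Default: duplicates are filtered, so the shared Linear is visited once.
print(len(list(model.named_modules())))                        # 3
# With remove_duplicate=False every occurrence is visited, which is what
# the lowering pass needs when a model reuses modules under several names.
print(len(list(model.named_modules(remove_duplicate=False))))  # 4
```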

Test Plan: buck test //caffe2/torch/fb/model_transform/quantization/tests:fx_quant_api_test

Reviewed By: jerryzh168

Differential Revision: D33440028

fbshipit-source-id: f2fabd49a293beb90d7b4bf471610cde6279fd66
2022-01-07 15:43:31 -08:00
ad88354e25 torch.futures doc formatting (#70630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70630

The params section is incorrectly formatted [here](https://pytorch.org/docs/master/futures.html?highlight=future#:~:text=way%20as%20then().-,Parameters,-callback%20(Future)%20%E2%80%93%20a):

![image](https://user-images.githubusercontent.com/14858254/148119877-6c719851-4edd-4126-8ef7-e6c1920304cf.png)

Updated docs:

https://docs-preview.pytorch.org/70630/futures.html?highlight=future#:~:text=way%20as%20then().-,Parameters,-callback%20(Future)%20%E2%80%93%20a

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: dagitses, mrshenli

Differential Revision: D33478214

Pulled By: H-Huang

fbshipit-source-id: 8cd7022ae79a8e6fe8b5fa8b767c55903c9ac368
2022-01-07 15:22:22 -08:00
d583eca8c3 Add workflow to sync fbsync->master (#71013)
Summary:
The main logic of the workflow is implemented in the `syncbranches.py` script,
which computes patch-ids of the divergent history (as determined by `git
merge-base`) and treats all patches present in the sync branch with
non-matching patch-ids as ones missing from the target branch.
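
For reference, a sketch of computing a patch-id in Python via `git patch-id --stable` (the helper name below is ours, not necessarily the script's):

```
import subprocess

def patch_id(rev: str) -> str:
    # `git patch-id --stable` reads a diff on stdin and prints
    # "<patch-id> <commit-id>"; the patch-id is identical for patches whose
    # diff content matches, even if their commit hashes differ.
    diff = subprocess.check_output(["git", "show", rev])
    out = subprocess.check_output(["git", "patch-id", "--stable"], input=diff)
    return out.split()[0].decode()
```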

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71013

Reviewed By: bigfootjon

Differential Revision: D33480885

Pulled By: malfet

fbshipit-source-id: bd72c061720d0cba49c6754ec4e94437d8a5c262
2022-01-07 15:09:23 -08:00
d7db5fb462 ctc loss no batch dim support (#70092)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70092

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33280068

Pulled By: george-qi

fbshipit-source-id: 3278fb2d745a396fe27c00fb5f40df0e7f584f81
2022-01-07 14:33:22 -08:00
9032d73f3b Disable cpp tests in multigpu job (#71015)
Summary:
See if this fixes the timeouts described in https://github.com/pytorch/pytorch/issues/70015

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71015

Reviewed By: dagitses

Differential Revision: D33483762

Pulled By: suo

fbshipit-source-id: 09bf93e73669a1211b200b4b272bfaa0d78a21d2
2022-01-07 14:32:01 -08:00
0721fc6474 Decouple MapDataPipe from Dataset (#70991)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70991

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D33477680

Pulled By: ejguan

fbshipit-source-id: d3e89492e921a96791319f35052a229684ddf7cf
2022-01-07 14:28:41 -08:00
3febe0d986 Remove backward op for 3d depthwise convolution (#70462)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70462

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33340495

Pulled By: jbschlosser

fbshipit-source-id: a180951680aef8fb123463af098582ef6cf9bbdb
2022-01-07 14:24:34 -08:00
704fbc29ae Remove backward op for 2d depthwise convolution (#70461)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70461

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33340494

Pulled By: jbschlosser

fbshipit-source-id: f2d8b2fcf9ad0f42b644b1dba51a694d83975566
2022-01-07 14:23:15 -08:00
a70297e7cb NNAPI: quant logistic fix (#70847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70847

NNAPI needs a fixed zero point and scale for sigmoid (logistic)
ghstack-source-id: 146555935

Test Plan: LIBNEURALNETWORKS_PATH="/path/to/libneuralnetworks.so" pytest test/test_nnapi.py

Reviewed By: dreiss

Differential Revision: D33237918

fbshipit-source-id: 05ef3a81bf1589ad44b599a19bce4066531c432b
2022-01-07 13:36:33 -08:00
ed50a35cf8 [Model Averaging] Update the documentation of PeriodicModelAverager (#70974)
Summary:
Here 20 is a bad example, since the warmup step is set to 100; 200 iterations makes much more sense.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70974

Reviewed By: dagitses

Differential Revision: D33474576

Pulled By: rohan-varma

fbshipit-source-id: 4c7043108897848bde9503d77999971ad5567aa6
2022-01-07 13:20:42 -08:00
c8b897333c [rnn/gru] no batch dim (#70977)
Summary:
Reference https://github.com/pytorch/pytorch/issues/60585

Reland: https://github.com/pytorch/pytorch/pull/70442

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70977

Reviewed By: dagitses, george-qi

Differential Revision: D33477256

Pulled By: jbschlosser

fbshipit-source-id: 2035c2d00b2f627c7046fd9b13c71b9360cd6fad
2022-01-07 13:14:41 -08:00
338eb1b2b3 [LTC] Export torch::lazy::GetBackendDevice() (#70963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70963

This commit exports torch::lazy::GetBackendDevice().

Test Plan: CI in the lazy_tensor_staging branch.

Reviewed By: wconstab

Differential Revision: D33468938

Pulled By: alanwaketan

fbshipit-source-id: f65599c9238bf6b4f4ffbd5194befdc267272831
2022-01-07 13:13:18 -08:00
0a002f879e Actually clean on clean workspace, including hidden files (#71018)
Summary:
The workspace should be totally empty before checking out PyTorch; this
is especially important with non-ephemeral runners.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71018

Reviewed By: robieta

Differential Revision: D33482985

Pulled By: suo

fbshipit-source-id: cafa123d2b893bfbdad62295586b5b79f1542b3a
2022-01-07 13:04:54 -08:00
bc026c0577 [jit] Split Union type and Optional type to separate impl file. (#69483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69483

To avoid accidental linking to Union type and Optional type in Edge runtimes, we can separate these types into different files, so that we don't accidentally link with them in type.cpp.
ghstack-source-id: 146670525

Test Plan: just code move.

Reviewed By: ejguan

Differential Revision: D32264607

fbshipit-source-id: c60b6246f21f3eb0a67f827a9782f70ce5200da7
2022-01-07 11:23:15 -08:00
1011ac188f [jit][edge] Create DynamicType for OptionalType in mobile. (#68137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68137

A small step toward replacing existing OptionalType usage with DynamicType in the Edge runtime.
ghstack-source-id: 146670520

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D32264617

fbshipit-source-id: 62d3ffad40901842deac19ca2098ea5ca132e718
2022-01-07 11:23:12 -08:00
0517e719ac [jit] Add conformance test for DynamicType with server JIT types. (#69482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69482

Add a test to enumerate a number of JIT type combinations and see if their subtyping behavior is preserved in the new DynamicType system.
ghstack-source-id: 146670526

Test Plan: buck test mode/opt //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.DynamicType'

Reviewed By: gmagogsfm

Differential Revision: D32891263

fbshipit-source-id: 728211b39778e93db011b69b0a4047df78a8fc5b
2022-01-07 11:23:09 -08:00
649dda9fee [jit] Implement DynamicType for TorchScript runtime. (#68136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68136

DynamicType is an extension to existing server JIT types. Today using normal server types on Edge is a bit problematic because in embedded environments we don't need the full spectrum of types but we still build with these unneeded dependencies.

Is it possible to just get rid of unneeded JIT types from Edge builds? It's not easy to do so at the moment. For example, on Edge we don't support Union type, but we have to pull in the dependency on Union type because Optional type, which inherits from Union type, is supported, so Union type has to be included in the build. Although we could split Union type and Optional type, it could be argued that the root cause is that every time we use anything inheriting from `c10::Type`, we don't have direct evidence of how much dependency we pull in, because we make virtual calls and we don't know what exactly we're calling with server JIT types. If we don't know, it's highly likely that the linker doesn't know either, so it cannot effectively strip unused methods.

To address this problem, one option is to implement a separate `DynamicType` which has simpler behavior and doesn't store different types as different symbols in the binary but rather as raw data (or a "tag"). This could increase the binary size by several KBs, so I included several binary size reductions in the same stack, hoping at least not to regress the binary size.

Currently `DynamicType` inherits from `c10::Type` because I want to reduce the migration cost of `DynamicType` by making it interface with existing server JIT types. In the future `DynamicType` should be implemented as a separate class, without relying on `c10::Type`, to make things both simpler and leaner.
ghstack-source-id: 146670522

Test Plan: in the next diff.

Reviewed By: VitalyFedyunin

Differential Revision: D32264615

fbshipit-source-id: 180eb0998a14eacc1d8b28db39870d84fcc17d5b
2022-01-07 11:23:07 -08:00
0408449244 [jit] Reclaim some binary size. (#68038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68038

Replace const std::function& with c10::function_ref, because the former uses type erasure, adds 5-10 KB of size overhead, and adds another level of indirection to calls of the underlying functions. In contrast, a non-owning c10::function_ref will compile down to just a raw function pointer, which should be much smaller.
ghstack-source-id: 146670523

Test Plan: eyes

Reviewed By: iseeyuan, mrshenli

Differential Revision: D32264619

fbshipit-source-id: 558538fd882b8e1f4e72c4fd5e9d36d05f301e1e
2022-01-07 11:21:46 -08:00
dd1121435b SequentialLR update _last_lr on step (#70558)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68956.
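
For reference, a minimal sketch of the affected behavior (the scheduler choices and milestones below are illustrative, not taken from the issue):

```
import torch
from torch.optim.lr_scheduler import ConstantLR, ExponentialLR, SequentialLR

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = SequentialLR(
    opt,
    schedulers=[ConstantLR(opt, factor=0.5, total_iters=2),
                ExponentialLR(opt, gamma=0.9)],
    milestones=[2],
)

for _ in range(4):
    opt.step()
    sched.step()
    # With this fix, get_last_lr() reflects the most recent step()
    # instead of returning a stale value.
    print(sched.get_last_lr())
```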

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70558

Reviewed By: dagitses

Differential Revision: D33430213

Pulled By: albanD

fbshipit-source-id: 446f182610de32db224d55b244d76c3076e8080f
2022-01-07 10:36:35 -08:00
195181d4df Revert "add very dumb retry to ecr gc"
This reverts commit 22f528043342ea06d00835616e8447e2b8c94adb.
2022-01-07 10:29:13 -08:00
c6e727d05b Fix adamw formula doc (#68587)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68482

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68587

Reviewed By: dagitses, jbschlosser

Differential Revision: D33478646

Pulled By: albanD

fbshipit-source-id: 4e6419829c3faa7449c041e7d467a6dab30fe917
2022-01-07 10:15:16 -08:00
08074c8f2d Update gradcheck.py (#70950)
Summary:
Following https://github.com/pytorch/pytorch/pull/64837#discussion_r779870974

Changed torch.equal to torch.allclose, as exact comparison could be flaky.
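
For intuition, a tiny standalone example of the difference (values are illustrative):

```
import torch

a = torch.tensor([1.0, 2.0])
b = a + 1e-6  # tiny numerical noise, e.g. from a different reduction order

print(torch.equal(a, b))     # False: requires exact bitwise equality
print(torch.allclose(a, b))  # True: equality within rtol/atol tolerances
```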

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70950

Reviewed By: albanD

Differential Revision: D33462426

Pulled By: anjali411

fbshipit-source-id: aeaba9d2a98d1d0af04fa2cab8c495c23ec0a9cc
2022-01-07 09:29:10 -08:00
8dfff8b2e2 Fix scatter for empty indexes (#70662)
Summary:
This PR fixes an issue with `scatter` where the output is garbage for zero-sized indexes.

```py
import torch

null_index = torch.zeros((0, 4), dtype=torch.int64)
null_arr = torch.zeros((0, 4))
zeros_arr = torch.zeros((1, 4))

result = zeros_arr.scatter(0, null_index, null_arr)

print(null_index)
print(null_arr)
print(zeros_arr)
print(result)
```

```
tensor([], size=(0, 4), dtype=torch.int64)
tensor([], size=(0, 4))
tensor([[0., 0., 0., 0.]])
tensor([[1.7036e+19, 2.9965e+32, 3.9133e-14, 1.3585e-19]])
```

The out array was never filled when the `index` arg has 0 elements; with the fix, `result` simply equals a copy of `zeros_arr`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70662

Reviewed By: dagitses

Differential Revision: D33476807

Pulled By: albanD

fbshipit-source-id: 97dbdd9c0133899e58828c43ecba81838807b8af
2022-01-07 09:20:43 -08:00
4e7e8f2826 [PyTorch] Outline destructor of CppFunction (#63688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63688

CppFunction is used for function registration, so it's not performance-sensitive. Outlining the destructor should reduce code size.
ghstack-source-id: 146648927

Test Plan: Mobile buildsizebot

Reviewed By: dhruvbird

Differential Revision: D30462640

fbshipit-source-id: de410f933bf936c16769a10a52092469007c8487
2022-01-07 09:16:23 -08:00
40c512f52c split cuda for all 11.X (#70899)
Summary:
The code didn't support 11.5 or above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70899

Reviewed By: ngimel

Differential Revision: D33469544

Pulled By: janeyx99

fbshipit-source-id: ea38de36b025051f76322fe840e3851408195160
2022-01-07 09:11:16 -08:00
2378421340 Implement torch.allclose for sharded tensor. (#70331)
Summary:
Implement torch.allclose op for sharded tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70331

Test Plan:
Automated test added.
pritamdamania87
Fixes https://github.com/pytorch/pytorch/issues/67112

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Reviewed By: pritamdamania87

Differential Revision: D33339137

Pulled By: kumpera

fbshipit-source-id: 4263e468eaa117317b190f69877bf3f8bbac5658
2022-01-07 08:37:04 -08:00
997fa8671d Fix docstring for nn.Hardsigmoid (#70987)
Summary:
Fixes nn.Hardsigmoid's docstring problem reported at https://github.com/pytorch/pytorch/issues/70498.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70987

Reviewed By: dagitses

Differential Revision: D33476974

Pulled By: albanD

fbshipit-source-id: bf3a1c485dd2c369c56981f9afbfe45aa9cee2cc
2022-01-07 08:13:53 -08:00
f135438d3b Dispatch to at::convolution intead of at::_convolution in _convolution_double_backward (#70661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70661

Dispatching to at::convolution can make Lazy Tensor trace the right convolution op.

Test Plan: pytest test/test_nn.py -k test_conv_double_backward_strided_with_3D_input_and_weight

Reviewed By: wconstab, jbschlosser

Differential Revision: D33428780

Pulled By: desertfire

fbshipit-source-id: 899e4135588ea33fff23d16103c25d9bcd3f902c
2022-01-07 07:53:46 -08:00
9ad21091dd [SR] Give VarStackNodeWrapper an iterator (#69922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69922

D32596934 (65f54bc000) made the serial stack implementation a bit brittle. It introduced a new container type: `VarStackNodeWrapper`. This type was used as a template parameter in the serial stack implementation.

The other type used in the serial stack implementation is `at::ArrayRef<at::Tensor>`. Ideally, the interface of `VarStackNodeWrapper` should be as close as possible to this other type. However, because the new container type did not have an iterator, expressions like this would fail to compile:
```
for (const auto& tensor : tensors) {
  // do something
}
```
Introducing this iterator will make the code easier to maintain going forward.

Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Stack`

I consider this a `VarStack` implementation detail, so I'd prefer not to test it directly. We can test it implicitly by adding some code to the serial stack implementation that uses the iterator.

Reviewed By: swolchok

Differential Revision: D33101489

fbshipit-source-id: 7cf44c072d230c41bd9113cf2393bc6a6645a5b5
2022-01-07 07:24:47 -08:00
6e16c9bb1d Add support for deleteKey for FileStore (#69953)
Summary:
torch_ucc uses `deleteKey`, and trying to run PyTorch tests with torch_ucc leads to a failure saying `deleteKey not implemented for FileStore`.
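
The Python binding is `delete_key`; a small sketch of the now-supported call:

```
import tempfile
import torch.distributed as dist

with tempfile.TemporaryDirectory() as d:
    store = dist.FileStore(d + "/store", 1)
    store.set("key", "value")
    store.delete_key("key")  # previously raised: not implemented for FileStore
```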

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69953

Reviewed By: ngimel

Differential Revision: D33458457

Pulled By: H-Huang

fbshipit-source-id: f46afd59f950722ae594d9aafb8843f14019e930
2022-01-07 06:20:59 -08:00
d697bb4220 Adapt llvm_codegen.cpp to LLVM TOT (#70810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70810

Adapt to LLVM top-of-tree APIs.

For context: LLVM is moving towards opaque pointers for IR values: https://llvm.org/docs/OpaquePointers.html

I also changed some `value->getScalarType()->getPointerElementType()`  expressions to directly reference relevant types. This is simpler and more in line with the intentions of the opaque IR pointers. (In fact I would expect those expressions to break in the future). I did not fix places where the relevant type wasn't obvious to me though.

Test Plan:
-
```
$ cd fbsource/fbcode
$ tp2_update_fbcode llvm-fb --branch=staging
# symlinks point to d9c037cf2b4f0268cb1897b99c8c87c5d0232616 TP2 revision
$ buck build mode/opt-clang-thinlto unicorn:index_server -c unicorn.hfsort="1" -c cxx.profile="fbcode//fdo/autofdo/unicorn/index_server:autofdo" -c cxx.modules=False -c cxx.extra_cxxflags="-Wforce-no-error"
```
- Check sandcastle jobs

Reviewed By: modiking

Differential Revision: D33431503

fbshipit-source-id: 33f39d0a0c0f4b805ab877a811ea0a670f834abf
2022-01-07 05:07:25 -08:00
87139d8532 [LTC] Sync LazyGraphExecutor and LazyTensor with the staging branch (#70867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70867

This commit syncs LazyGraphExecutor and LazyTensor with the staging branch's
latest changes.

Test Plan: CI in the lazy_tensor_staging branch.

Reviewed By: wconstab, desertfire

Differential Revision: D33440005

Pulled By: alanwaketan

fbshipit-source-id: 0dd72643dbf81a87fc4b05019b6564fcb28f1979
2022-01-07 01:51:53 -08:00
1cdc643714 [TensorExpr] Add a pass for trimming JIT graphs. (#66847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66847

Trimming means that we try to remove a small portion of the graph while
keeping it valid, and we repeat this step N times. This is
useful for debugging when we try to find a minimal example reproducing
the issue at hand.

Differential Revision: D31751397

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 07d8ba1435af8fd2d7b8cf00db6685543fe97a85
2022-01-07 01:03:59 -08:00
8223ef1cd8 [TensorExpr] Clean-up logic for copying input tensors and remove some dead code. (#70535)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70535

This also fixes handling of inputs that happen to be outputs (they
require copy).

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D33399116

Pulled By: ZolotukhinM

fbshipit-source-id: 9845838eb653b82ae47b527631b51893990d5319
2022-01-07 01:03:56 -08:00
5d7cc8f22a [TensorExpr] Add some graph-rewrite passes to prepare models for AOT compilation. (#66515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66515

These passes should not be used generally as they change API of the
model's forward method, but they help experimenting with the model and
ironing out all the kinks before it can be compiled properly. In the
long run ideally we should provide a better way to enable such
experiments.

Differential Revision: D31590862

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 74ded34c6c871d4cafa29f43dc27c7e71daff8fc
2022-01-07 01:03:53 -08:00
cdbf83b0c3 [TensorExpr] Add helper passes for AOT pipeline. (#66514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66514

These passes will
1) help us analyze the graph before trying to compile it
and report errors upfront if it's not possible,
2) fill in missing strides/dtype/device info in JIT IR. Ideally, this
should be done by a dedicated JIT pass, but until it's available, we'll
be using a hack-around defined here.

Differential Revision: D31590860

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Pulled By: ZolotukhinM

fbshipit-source-id: fe8fdefbeacae8079958dd0b4b27809cc0acb34b
2022-01-07 01:02:31 -08:00
a311cfa800 Revert D33460427: [pytorch][PR] [rnn/gru] : no batch dim
Test Plan: revert-hammer

Differential Revision:
D33460427 (6eba936082)

Original commit changeset: c64d9624c305

Original Phabricator Diff: D33460427 (6eba936082)

fbshipit-source-id: 9a5000e202c5f383b03dd6caad9399e46e4ce80e
2022-01-06 23:37:28 -08:00
1622546050 use irange for loops (#70248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70248

Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, and with a number of reversions and unused-variable suppressions added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32813863

fbshipit-source-id: 527244b4a2b220fdfe7f17dee3599603f492a2ca
2022-01-06 23:14:29 -08:00
36d9e03ab7 Reserve vector in gather_ranges_to_dense_op.h (#70478)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70478

Test Plan: Sandcastle

Reviewed By: xw285cornell

Differential Revision: D33339890

fbshipit-source-id: 50330e18e344f872d03f146cea0ed11eef4f506e
2022-01-06 23:10:28 -08:00
df6eb9bbab Fixed to_folder not saving dtype (#69983)
Summary:
As above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69983

Reviewed By: pbelevich, ngimel

Differential Revision: D33466529

Pulled By: Chillee

fbshipit-source-id: 2d2f0ad5b8e2492aba4c19fa034c8b6c0848a568
2022-01-06 22:15:56 -08:00
23f902f7e4 Fix incorrect variable in autograd docs (#70884)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68362.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70884

Reviewed By: mruberry

Differential Revision: D33463331

Pulled By: ngimel

fbshipit-source-id: 834ba9c450972710e0424cc92af222551f0b4a4a
2022-01-06 20:53:10 -08:00
22f5280433 add very dumb retry to ecr gc 2022-01-06 20:29:39 -08:00
c18e6b790e Adding elu,selu,softsign support for fx2trt (#70811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70811

Add support of the above ops in fx2trt.

Reviewed By: 842974287

Differential Revision: D33407911

fbshipit-source-id: 8c635ddbd1cae6b0a0a04d345b0e0347111a6619
2022-01-06 19:42:24 -08:00
70b18b9511 Fix comment indentation issue (#70227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70227

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33251107

Pulled By: tugsbayasgalan

fbshipit-source-id: 293ffe5dde38480ea13963a2d7e1eb99dc594d22
2022-01-06 19:14:39 -08:00
32bf5e0ef9 Add native impl of gelu for QuantizedCPU (#69968)
Summary:
Add native implementation of gelu for quantized CPU.

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69968

Reviewed By: ejguan

Differential Revision: D33187095

Pulled By: vkuzo

fbshipit-source-id: 4c4bf0eb47d2d9c2b8827174f2ccdea41986148a
2022-01-06 19:01:26 -08:00
6eba936082 [rnn/gru] no batch dim (#70442)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60585

TODO:
* [x] Doc updates
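
For reference, a minimal sketch of the unbatched call this enables:

```
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20)

# Unbatched input: (seq_len, input_size) instead of (seq_len, batch, input_size)
x = torch.randn(5, 10)
out, h = gru(x)
print(out.shape)  # torch.Size([5, 20])
print(h.shape)    # torch.Size([1, 20])
```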

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70442

Reviewed By: zou3519

Differential Revision: D33460427

Pulled By: jbschlosser

fbshipit-source-id: c64d9624c305d90570c79d11a28557f9ec667b27
2022-01-06 18:39:09 -08:00
880a5b9ea6 [PyTorch] Move prim string ops to JIT op registry (#70501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70501

This PR migrates prim string ops to be registered into the JIT op registry instead of the dispatcher. Since the implementations of these ops are backend-agnostic, there's no need to go through the dispatcher. Relying on `test_jit_string.py` to verify the correctness of these ops. I'm also adding tests to make sure all the operators are covered.

Test Plan: Rely on `test_jit_string.py`.

Reviewed By: iseeyuan

Differential Revision: D33351638

fbshipit-source-id: ecc8359da935a32d3a31add2c395a149a0d8892f
2022-01-06 18:26:28 -08:00
ddea6980fe [PyTorch][JIT] Don't refcount Type singletons (#69579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69579

This should help us avoid reference counting overhead on singleton Type subclasses without a major rewrite of the Type subsystem.
ghstack-source-id: 146643993

Test Plan:
Ran //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark with arguments `--op empty -niter 40 --stressTestRecordFunction --captureRecordFunctionInputs` on devbig with turbo off.

Before:
```
I1206 13:47:15.037441 1201670 bench.cpp:144] Mean 0.737675
I1206 13:47:15.037463 1201670 bench.cpp:145] Median 0.736725
I1206 13:47:15.037468 1201670 bench.cpp:146] Min 0.722897
I1206 13:47:15.037473 1201670 bench.cpp:147] stddev 0.00508187
I1206 13:47:15.037482 1201670 bench.cpp:148] stddev / mean 0.00688903
```

After:
```
I1206 13:48:16.830123 1205612 bench.cpp:144] Mean 0.66988
I1206 13:48:16.830150 1205612 bench.cpp:145] Median 0.663956
I1206 13:48:16.830157 1205612 bench.cpp:146] Min 0.65986
I1206 13:48:16.830164 1205612 bench.cpp:147] stddev 0.0335928
I1206 13:48:16.830171 1205612 bench.cpp:148] stddev / mean 0.0501475
```

Static runtime startup is also improved; for CMF local_ro, time to initialize a predictor went from 10.01s to 9.59s.

(Note: I wish I had a production workload to demonstrate the advantage of this on. I tried ctr_mobile_feed local_ro net but it was neutral. Anything that manipulates types or List/Dict a lot might be promising.)

Reviewed By: suo

Differential Revision: D32923880

fbshipit-source-id: c82ed6689b3598e61047fbcb2149982173127ff0
2022-01-06 17:39:16 -08:00
e6befbe85c Add flag to optionally average output attention weights across heads (#70055)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47583
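
A sketch of the new knob, assuming the flag is named `average_attn_weights` and defaults to the old averaging behavior:

```
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
x = torch.randn(2, 5, 16)

_, avg_w = mha(x, x, x, average_attn_weights=True)    # (N, L, S), old behavior
_, head_w = mha(x, x, x, average_attn_weights=False)  # (N, num_heads, L, S)
print(avg_w.shape, head_w.shape)
```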

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70055

Reviewed By: bhosmer

Differential Revision: D33457866

Pulled By: jbschlosser

fbshipit-source-id: 17746b3668b0148c1e1ed8333227b7c42f1e3bf5
2022-01-06 17:32:37 -08:00
cc7382dd92 Enable upgraders in TS server (#70539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70539

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70538

ghstack-source-id: 146384458

Test Plan: python test/test_jit.py TestUpgraders

Reviewed By: gmagogsfm

Differential Revision: D33375195

fbshipit-source-id: 170960b409175bb987cf9dbb65ffed3283e5f6f9
2022-01-06 17:10:30 -08:00
7b8f73dd32 No-batch-dim support for ConvNd (#70506)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70506

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33355034

Pulled By: jbschlosser

fbshipit-source-id: 5a42645299b1d82cee7d461826acca1c5b35a71c
2022-01-06 16:53:50 -08:00
6896b2d734 [NNC Testing] Randomized loop nest infrastructure (#70410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70410

Trying again after #70174 was reverted. Earlier the env
variable was read into a static var in C++, causing state to be retained
and causing test failures. The static qualifier is removed in this PR.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33321435

fbshipit-source-id: 6d108eb00cac9150a142ccc3c9a65a1867dd7de4
2022-01-06 16:21:42 -08:00
b7742b437a Allow RNN hidden_size to be 0 (#70556)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56767.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70556

Reviewed By: ngimel

Differential Revision: D33455156

Pulled By: jbschlosser

fbshipit-source-id: 5dc57b09d7beb6ae81dfabc318e87c109bb4e6ae
2022-01-06 14:18:36 -08:00
e7602a1e30 Fix multiplication of 0-D sparse tensors (#70749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70749

Fixes https://github.com/pytorch/pytorch/issues/65396 and a clang-tidy error.

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33439136

Pulled By: cpuhrsch

fbshipit-source-id: 45ec58de7c18db183f891431d4a26e98fd0e924a
2022-01-06 13:36:46 -08:00
4fa70a2483 [pytorch] fix hipify_python (#70619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70619

This Diff improves `hipify_python`, which is needed for AMD GPUs.

Change 1:
```
if (c == "," or ind == len(kernel_string) - 1) and closure == 0:
```
This is needed to deal with the following case (ex: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/test/cuda_vectorized_test.cu#L111)
```
kernel<<<val, func()>>>(...)
// In this case, kernel_string is "val, func()"
// so closure gets 0 when ind == len(kernel_string) - 1.
```

Change 2:
```
mask_comments()
```
This is needed to deal with a case where "<<<" is included in a comment or a string literal (ex: https://github.com/pytorch/pytorch/blob/master/torch/csrc/deploy/interpreter/builtin_registry.cpp#L71)
```
abc = "<<<XYZ>>>"
// Though this <<<XYZ>>> is irrelevant to CUDA kernels,
// the current script attempts to hipify this and fails.
```

Test Plan:
This patch fixes errors I encountered by running
```
python3 tools/amd_build/build_amd.py
```

I confirmed, with Linux `diff`, that this patch does not change HIP code that was generated successfully with the original script.

Reviewed By: hyuen

Differential Revision: D33407743

fbshipit-source-id: bec822e040a154be4cda1c294536792ca8d596ae
2022-01-06 13:27:43 -08:00
9c455d7086 dbr quant: add limited support for torch.nn.ModuleList (#70372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70372

Enables basic support for `torch.nn.ModuleList` in DBR quant
by stopping it from being a leaf.  For now, we
require the user to check for `AutoQuantizationState` if they are
looping over the contents without any bounds checking.

In future PRs, we can explore how to solve this without requiring
user code changes.
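
A hypothetical sketch of the user-side guard described above (the class-name check stands in for however `AutoQuantizationState` is imported):

```
import torch.nn as nn

class SketchModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.module_list = nn.ModuleList([nn.Linear(4, 4), nn.ReLU()])

    def forward(self, x):
        # After DBR prepare, module_list may also hold the attached
        # AutoQuantizationState bookkeeping module, so guard the loop.
        for module in self.module_list:
            if type(module).__name__ == "AutoQuantizationState":
                continue  # skip DBR bookkeeping, not one of our layers
            x = module(x)
        return x
```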

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_module_list
```

Reviewed By: VitalyFedyunin

Differential Revision: D33302329

Pulled By: vkuzo

fbshipit-source-id: 1604748d4b6c2b9d14b50df46268246da807d539
2022-01-06 13:25:13 -08:00
c3f0c77b64 dbr quant support for custom leaf modules, part 3/x (#70349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70349

Makes sure that child modules of non traceable leaf modules
do not participate in quantization swaps.  This should feature complete
the `non_traceable_module_class` feature.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_prepare_custom_config_dict_non_traceable_module_class
```

Reviewed By: VitalyFedyunin

Differential Revision: D33296246

Pulled By: vkuzo

fbshipit-source-id: 08287429c89ee6aa42d13ca3060a74679a478181
2022-01-06 13:25:10 -08:00
423d8aabbd dbr quant: support for custom leaf modules, part 2/x (#70335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70335

Adds test case that functions are not quantized inside custom leaf modules.
No logic change needed as it already works correctly.

Note: FX scripting rewriter does not go into modules without auto-quant,
which is why we are using torch.jit.trace to look at the graph.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_prepare_custom_config_dict_non_traceable_module_class
```

Reviewed By: jerryzh168

Differential Revision: D33286370

Pulled By: vkuzo

fbshipit-source-id: 26c81c9e1ce7c4d38ddc1e318730cf1eaa25ff69
2022-01-06 13:25:07 -08:00
b12852eb41 dbr quant: support for custom leaf modules, part 1/x (#70330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70330

Starts adding support for custom leaf modules, part 1/x.
In this PR, we ensure that leaf modules and all of their children
do not get `AutoQuantizationState` objects attached to them.
The API matches prepare_fx, using the `prepare_custom_config_dict`
argument and the `non_traceable_module_class` key within that dict.

The next couple of PRs will ensure that modules and functions in
leaves do not get quantized; keeping it separate makes the PRs smaller.
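
A sketch of the config shape, mirroring the prepare_fx convention (`MyNonTraceableModule` is a stand-in for a user-defined class; the DBR prepare entry point itself is elided):

```
import torch

class MyNonTraceableModule(torch.nn.Module):  # stand-in user module
    def forward(self, x):
        return x

prepare_custom_config_dict = {
    # Modules of these classes (and all of their children) are treated as
    # leaves: no AutoQuantizationState is attached to them or their children.
    "non_traceable_module_class": [MyNonTraceableModule],
}
```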

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_prepare_custom_config_dict_non_traceable_module_class
```

Reviewed By: jerryzh168

Differential Revision: D33285310

Pulled By: vkuzo

fbshipit-source-id: 532025fda5532b420fad0a4a0847074d1ac4ad93
2022-01-06 13:25:04 -08:00
a8929c3278 dbr quant: unbreak case when child module not returning any outputs (#70329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70329

Fixes a crash in DBR when a child module does not return any tensors.
This happens sometimes in user models.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_child_module_does_not_return_tensor
```

Reviewed By: VitalyFedyunin

Differential Revision: D33285309

Pulled By: vkuzo

fbshipit-source-id: 42b8cffb5ee02ce171a3e6c64d140bb5f217225a
2022-01-06 13:25:01 -08:00
f742853838 dbr quant: support functional linear without bias (#70328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70328

Currently, functional linear without bias crashes the DBR convert step; this PR fixes it.
This unbreaks testing DBR on some customer models.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBRIndividualOps.test_linear_functional_nobias
```

Reviewed By: jerryzh168

Differential Revision: D33285311

Pulled By: vkuzo

fbshipit-source-id: 757c7270be9e3ff9cdf2609b1e426e9fd34e50ff
2022-01-06 13:24:58 -08:00
c21a540866 dbr quant: support dynamic linear (#70257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70257

Makes dynamic quantization for linear module work in DBR quant.

Coverage for more ops and functionals will be in future PRs.

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: jerryzh168

Differential Revision: D33262300

Pulled By: vkuzo

fbshipit-source-id: c1cb0f9dd3f42216ad6ba19f4222b171ff170174
2022-01-06 13:24:55 -08:00
dfb807d65e dbr quant: do not attach auto_quant_state to observers (#70256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70256

Somewhere in previous PRs we started attaching AutoQuantState
to observers. This PR removes that, as it has no purpose
and makes model debugging more complicated.

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: jerryzh168

Differential Revision: D33262299

Pulled By: vkuzo

fbshipit-source-id: a3543b44c517325d57f5ed03b961a8955049e682
2022-01-06 13:23:43 -08:00
524bbb1442 [LTC] Sync gen_lazy_tensor.py from the staging branch (#70385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70385

This commit syncs gen_lazy_tensor.py from the lazy_tensor_staging branch
to master.

Test Plan: CI in the lazy_tensor_staging branch.

Reviewed By: wconstab

Differential Revision: D33306232

Pulled By: alanwaketan

fbshipit-source-id: a15c72b22418637f851a6cd4901a9f5c4be75449
2022-01-06 13:12:37 -08:00
81b52c290f Adding leaky_relu support for fx2trt (#70799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70799

Add op support in fx2trt for leaky_relu
1. add support in acc_ops and corresponding unit test
2. add support in acc_ops_converters and corresponding unit test

Reviewed By: 842974287

Differential Revision: D33399095

fbshipit-source-id: 978340e64b35ffefabdc48273ddfa86b5ee1816e
2022-01-06 12:40:14 -08:00
19f04da21e GHA: Make WORKFLOW_ID not a concatenation of run_id and run_num (#70938)
Summary:
![image](https://user-images.githubusercontent.com/31798555/148432431-f990a26b-55d4-414e-9abd-8cdb4b4e9844.png)

Since both GITHUB_RUN_ID and GITHUB_RUN_NUM are unchanged in rerun attempts, there's little reason to track both. It ends up just being confusing and also hard to use in joins in queries.

Currently, the only places the concatenated WORKFLOW_ID are used are for our test stats jsons in S3 and in our binary size stats in Scuba, code posted respectively:
https://github.com/pytorch/pytorch/blob/master/tools/stats/print_test_stats.py#L824
https://github.com/pytorch/pytorch/blob/master/tools/stats/upload_binary_size_to_scuba.py#L58
And I don't think we use the WORKFLOW_IDs in either stats in any queries yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70938

Reviewed By: seemethere, ngimel

Differential Revision: D33458655

Pulled By: janeyx99

fbshipit-source-id: 885b125a978fa0cc51553b08b8c63d5fdcf354d0
2022-01-06 12:34:10 -08:00
10b55648f5 CI: remove unused yaml and make upload_binary_size_to_scuba script work with GHA (#70643)
Summary:
Removes unused pytorch-job-specs.yml

It looks like the recent Android GHA jobs use upload_binary_size_to_scuba.py, but a portion of the script was still using CircleCI-only variables.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70643

Reviewed By: ngimel

Differential Revision: D33455659

Pulled By: janeyx99

fbshipit-source-id: cfe79a674641ed3327c7650d2107ace2a5050983
2022-01-06 10:05:27 -08:00
578fe11673 [pytorch][aten][cuda] fix LpNormFunctor (#70601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70601

`&` has lower precedence than `==`, so `==` will be evaluated first. This behavior is surely unintended; this patch fixes it.

Test Plan: 🧐  Carefully check the change.

Reviewed By: hyuen

Differential Revision: D33397964

fbshipit-source-id: e3ac5b04e4688dfbf9d8ac3e5c4aa72282bf6ee9
2022-01-06 09:50:34 -08:00
c00d33033c Remove repeat test for types in test nn (#70872)
Summary:
Helps fix a part of https://github.com/pytorch/pytorch/issues/69865

The first commit just migrates everything as is.

The second commit uses the "device" variable instead of passing "cuda" everywhere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70872

Reviewed By: jbschlosser

Differential Revision: D33455941

Pulled By: janeyx99

fbshipit-source-id: 9d9ec8c95f1714c40d55800e652ccd69b0c314dc
2022-01-06 09:20:02 -08:00
bc514cb425 Skip distributed tests if built with USE_DISTRIBUTED=0 (#70677)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70676
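
A sketch of the kind of guard involved (the exact decorator used in the test suite may differ):

```
import unittest
import torch.distributed as dist

@unittest.skipIf(not dist.is_available(),
                 "torch.distributed unavailable (built with USE_DISTRIBUTED=0)")
class SketchDistributedTest(unittest.TestCase):
    def test_something(self):
        self.assertTrue(dist.is_available())
```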

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70677

Reviewed By: albanD

Differential Revision: D33439808

Pulled By: janeyx99

fbshipit-source-id: 7f9971eb564dbbb6625fe5f78328c3abe3808719
2022-01-06 08:55:05 -08:00
ff408fca7f Forward AD formulas for activation backwards (#70460)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70460

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33405363

Pulled By: soulitzer

fbshipit-source-id: f68b59857a609ff593e9e399b9287d58dacef9e2
2022-01-06 08:41:17 -08:00
3051aabd0e Add forward AD formulas for convolution and some others (#69956)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69956

Test Plan: Imported from OSS

Reviewed By: albanD, bdhirsh

Differential Revision: D33235974

Pulled By: soulitzer

fbshipit-source-id: ea60d687edc5d62d92f3fd3cb6640421d32c908c
2022-01-06 08:39:51 -08:00
4916a21f10 quantization: fix scale+zp serialization of quantized BatchNorm{2|3}d (#70432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70432

Scale and zero_point need to be buffers for serialization to work
on them properly.  This PR moves them to buffers.  This is BC breaking,
but the "before" state was completely broken (scale + zp were not
serialized at all) so there is no value in trying to handle it.
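
A minimal sketch of the buffer pattern this change relies on (attribute names assumed to mirror the module's):

```
import torch

class SketchQuantizedBN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers, unlike plain attributes, are included in state_dict(),
        # so scale and zero_point now survive save/load round trips.
        self.register_buffer("scale", torch.tensor(1.0))
        self.register_buffer("zero_point", torch.tensor(0, dtype=torch.long))

print(SketchQuantizedBN().state_dict().keys())
# odict_keys(['scale', 'zero_point'])
```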

Test Plan:
```
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm2d_serialization
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm3d_serialization
```

Imported from OSS

Differential Revision: D33330022

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 673c61f1a9f8f949fd9e6d09a4dbd9e5c9d5fd04
2022-01-06 08:26:20 -08:00
6773589a06 Drop some unused variables (#70879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70879

Sandcastle from layer_norm_kernel.cu

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D33439040

fbshipit-source-id: e7d0e37ab25d62c63f675da3b6eff670fd93b26a
2022-01-06 08:11:25 -08:00
748790588c Upgrading the loop to use irange (#70326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70326

See D24145988 for context: it allows loops such as for(int i=0;i<10;i++) to be expressed as for(const auto i : c10::irange(10)). This is nice because it auto-types the loops and adds const-safety to the iteration variable.

Test Plan: buck run //caffe2/torch/fb/sparsenn:test

Reviewed By: r-barnes

Differential Revision: D33243400

fbshipit-source-id: b1f1b4163f4bf662031baea9e5268459b40c69a3
2022-01-06 07:06:53 -08:00
b0fdca8855 Bump version number to 7 and compile old operators with old schema (#68358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33433730

Pulled By: tugsbayasgalan

fbshipit-source-id: 202c58365bae13195d3545cefcb0da9162b02151
2022-01-05 23:57:22 -08:00
8bdbe94344 Add forward compatability tests in CI (#64139)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64139

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30626912

Pulled By: tugsbayasgalan

fbshipit-source-id: 781a88386701b42e2e86daaca0a779d1fc1c4df3
2022-01-05 23:40:06 -08:00
402f2934bf Revert D33262228: Per-overload torch.ops API
Test Plan: revert-hammer

Differential Revision:
D33262228 (8e6d1738a4)

Original commit changeset: 600dbf511514

Original Phabricator Diff: D33262228 (8e6d1738a4)

fbshipit-source-id: 238fa88ea9c4f26c7511334765c07452fbca9655
2022-01-05 22:10:11 -08:00
884aa2baad ci: Make linux.*xlarge non-ephemeral (#70869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70869

Makes Linux runners non-ephemeral to reduce the number of
CreateInstance calls we make towards AWS, as well as the number of
GitHub API calls we make to create new instances.

Should help alleviate some of the queuing issues we may observe due to
AWS / GitHub rate limits.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D33436874

Pulled By: seemethere

fbshipit-source-id: b2736fb4c9d175b1b0e2efb5017dcb4a8d4c05f4
2022-01-05 22:04:21 -08:00
2367face24 Prefer maybe_multiply when multiplying by a constant (#68185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68185

As per title

We also fix the first input to `handle_r_to_c` for `rsub`, as it was
flipped for the two inputs.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32684855

Pulled By: mruberry

fbshipit-source-id: ffeab8d561e657105b314a883260f00d0ae59bbf
2022-01-05 20:33:43 -08:00
1a061c7fe1 Merge index_{add,fill,copy,select} sampling (#68184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68184

This was in the TODO list, as these operations are very similar.
I did this because one of them was failing in the noncontig tests and I wanted
to make sure that all of them were tested properly, as they all appear
in each other's derivative formulas.

After this PR, these operations do pass the noncontiguous tests.

cc mruberry

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32684854

Pulled By: mruberry

fbshipit-source-id: 5db58be8d1e1fce434eab9cdf410cbf1024bbdf9
2022-01-05 20:33:40 -08:00
baeca11a21 Remove random_fullrank_matrix_distinc_singular_value (#68183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68183

We do so in favour of
`make_fullrank_matrices_with_distinct_singular_values`, as this latter
one not only has an even longer name, but also generates inputs
correctly so that they work with the PR later in this stack that
tests noncontiguous inputs.

We also heavily simplified the generation of samples for the SVD, as it was
fairly convoluted and was not generating the inputs correctly for
the noncontiguous test.

To do the transition, we also needed to fix the following issue, as it was popping
up in the tests:

Fixes https://github.com/pytorch/pytorch/issues/66856

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32684853

Pulled By: mruberry

fbshipit-source-id: e88189c8b67dbf592eccdabaf2aa6d2e2f7b95a4
2022-01-05 20:33:37 -08:00
08ef4ae0bc Remove unnecessary sync in linalg.det (#67014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67014

LAPACK functions return negative infos when there was an unexpected
input. This happens (for example) when the user does not specify
matrices of the correct size. We already check all these things on the
PyTorch end, so this check, which induces a synchronization, is
unnecessary.

I also took this chance to avoid some code repetition in the computation
of the determinant of `P`, and replaced the use of `ExclusivelyOwned<Tensor>`
with regular `Tensor`s + moving into the tuple, which should be as efficient or more.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32684851

Pulled By: mruberry

fbshipit-source-id: dc046d1cce4c07071d16c4e2eda36412bd734e0f
2022-01-05 20:33:34 -08:00
4d4e81d869 Make linalg.lu_factor structured (#66934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66934

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32684856

Pulled By: mruberry

fbshipit-source-id: 1675448da9a8677c8420005ce753972234e7accc
2022-01-05 20:33:31 -08:00
012c38e04d Add contiguous_strides as a correct replacement of defaultStride (#67789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67789

`at::defaultStride` was added in https://github.com/pytorch/pytorch/pull/18779.
As it was noted in that PR, it differs from the actual computation of
the default strides when one or more of the dimensions of the tensor are
zero. See https://github.com/pytorch/pytorch/pull/18779#discussion_r272296140

We add two functions, `contiguous_strides` and `contiguous_strides_vec`
which correct this issue and we replace the previous (wrong) uses of
`defaultStride`.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32684852

Pulled By: mruberry

fbshipit-source-id: 62997a5a97a4241a12e73e2be2e192b80b491cb1
2022-01-05 20:33:28 -08:00
a35b4b49d2 Add linalg.lu_factor (#66933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933

This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.

This PR also adds support for matrices with zero elements both in
the size of the matrix and the batch. Note that this function simply
returns empty tensors of the correct size in this case.

We add a test and an OpInfo for the new function.

This PR also adds documentation for this new function in line of
the documentation in the rest of `torch.linalg`.

Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014
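
For reference, a small usage sketch of the new function:

```
import torch

A = torch.randn(3, 3)
b = torch.randn(3, 1)

LU, pivots = torch.linalg.lu_factor(A)

# The (LU, pivots) pair can be reused to solve multiple systems cheaply.
x = torch.lu_solve(b, LU, pivots)
print(torch.allclose(A @ x, b, atol=1e-5))  # True, up to numerical tolerance
```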

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D32834069

Pulled By: mruberry

fbshipit-source-id: 51ef12535fa91d292f419acf83b800b86ee9c7eb
2022-01-05 20:32:12 -08:00
3f53365086 define get_dot_graph (#70541)
Summary:
In the [docstring](https://github.com/pytorch/pytorch/blob/master/torch/fx/passes/graph_drawer.py#L54-L60) we mention `get_dot_graph` but it is not defined, so I defined it here.
Not sure if this is preferred, or whether we should update the docstring to use `get_main_dot_graph`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70541

Test Plan:
```
            g = FxGraphDrawer(symbolic_traced, "resnet18")
            with open("a.svg", "w") as f:
                f.write(g.get_dot_graph().create_svg())
```

Reviewed By: khabinov

Differential Revision: D33378080

Pulled By: mostafaelhoushi

fbshipit-source-id: 7feea2425a12d5628ddca15beff0fe5110f4a111
2022-01-05 20:00:20 -08:00
917d56a7e4 Copy: Fix conj bit being ignored on type mismatch (#68963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68963

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33064492

Pulled By: anjali411

fbshipit-source-id: 043f927d6bfff46bf5f8ea6fce9409f250bf8ff8
2022-01-05 17:59:32 -08:00
cfc5519661 Support Sparse CSR transpose. Fix clang-tidy warnings. (#70582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70582

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33414446

Pulled By: cpuhrsch

fbshipit-source-id: dd0888d9dd3885579e853643a60d13373b5d6b15
2022-01-05 17:41:51 -08:00
3a21f38a2e Integrate multi_tensor zero_grad into Optimizer base class (#69936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69936

Currently, the optimizers in `torch/optim/_multi_tensor/` all override the base Optimizer class' implementation of `zero_grad` with the same foreach zero_grad implementation (e.g. [here](https://github.com/pytorch/pytorch/blob/master/torch/optim/_multi_tensor/adadelta.py#L93-L114)). There is a TODO that indicates that this should be refactored to the base class once the foreach ops are in good shape. This PR is intended to address that TODO.
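
A minimal sketch of the foreach-based pattern being consolidated (assuming the `torch._foreach_zero_` op; the real method also handles details such as `set_to_none`):

```
import torch

def sketch_zero_grad(params):
    # Collect the existing grads and zero them with one fused foreach op
    # instead of a separate .zero_() call per parameter.
    grads = [p.grad for p in params if p.grad is not None]
    if grads:
        torch._foreach_zero_(grads)
```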

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D33346748

Pulled By: mikaylagawarecki

fbshipit-source-id: 6573f4776aeac757b6a778894681868191a1b4c7
2022-01-05 15:46:23 -08:00
8e6d1738a4 Per-overload torch.ops API (#67254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254

Fixes https://github.com/pytorch/pytorch/issues/65997

TODO: disallow `default` as an overload name for aten operators.

BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument.
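
For reference, a small sketch of the resulting API surface (using `aten::add`, which has a `Tensor` overload):

```
import torch

x = torch.randn(3)

y1 = torch.ops.aten.add(x, x)         # the overload packet, as before
y2 = torch.ops.aten.add.Tensor(x, x)  # new: address one overload directly
print(torch.equal(y1, y2))  # True
```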

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33262228

Pulled By: anjali411

fbshipit-source-id: 600dbf511514ea9b41aea3e6b1bc1102dab08909
2022-01-05 15:17:41 -08:00
f9e1a1c97f Increase tolerance for test_adadelta (#69919)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69698

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69919

Reviewed By: cpuhrsch

Differential Revision: D33286427

Pulled By: jbschlosser

fbshipit-source-id: a2ca90683c14b6669f9b1804881ac675ba925fc5
2022-01-05 15:02:10 -08:00
ce409d8f50 docs: clarify smooth l1 == l1 when beta == 0 (#70673)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68558.
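
A quick numerical check of the clarified claim:

```
import torch
import torch.nn.functional as F

x, y = torch.randn(4), torch.randn(4)

# With beta == 0, smooth L1 loss degenerates to plain L1 loss.
print(torch.allclose(F.smooth_l1_loss(x, y, beta=0.0), F.l1_loss(x, y)))  # True
```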

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70673

Reviewed By: albanD

Differential Revision: D33430267

Pulled By: jbschlosser

fbshipit-source-id: db92187ff4f2799b19a6c4a5a6b653e9211c3aca
2022-01-05 14:35:35 -08:00
2431218ee4 Jiterates more ops (#70663)
Summary:
This PR jiterates:

- lcm
- i0e
- i1e
- ndtri
- erfcx
- digamma
- trigamma
- lgamma

It also adds TODOs to jiterate `kaiser_window`, `igamma`, `igammac` and `polygamma`, but jiterating those ops requires more features.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70663

Reviewed By: ngimel

Differential Revision: D33420854

Pulled By: mruberry

fbshipit-source-id: 6f32ac3cf24eda051bf19b6d20e94cdf81f50761
2022-01-05 13:57:25 -08:00
a5bc44422a [PyTorch] Remove the List/Dict move operations (#69370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69370

These operations are likely slower than copying because they perform a heap allocation and a reference count bump, whereas copying is just a reference count bump. This diff is up to see 1) if anything breaks and 2) if we can measure any improvements.
ghstack-source-id: 146468907

Test Plan:
Ran //sigrid/lib/features/tests:pytorch_feature_conversion_benchmark before/after

```
swolchok@devbig032 ~/f/fbcode> for x in (seq 5); sudo scripts/bertrand/noise/denoise.sh /tmp/pytorch_feature_conversion_benchmark.Dec7Stable ; end
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.43us  410.68K
PyTorchFeatureConversionIdListBenchmark                      3.74us  267.65K
PyTorchFeatureConversionIdScoreListBenchmark                 4.98us  200.81K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.43us  410.75K
PyTorchFeatureConversionIdListBenchmark                      3.75us  266.92K
PyTorchFeatureConversionIdScoreListBenchmark                 4.98us  200.97K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.44us  410.43K
PyTorchFeatureConversionIdListBenchmark                      3.75us  266.75K
PyTorchFeatureConversionIdScoreListBenchmark                 5.04us  198.23K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.43us  411.17K
PyTorchFeatureConversionIdListBenchmark                      3.74us  267.60K
PyTorchFeatureConversionIdScoreListBenchmark                 5.00us  199.84K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.44us  410.19K
PyTorchFeatureConversionIdListBenchmark                      3.73us  267.89K
PyTorchFeatureConversionIdScoreListBenchmark                 4.96us  201.46K
============================================================================
swolchok@devbig032 ~/f/fbcode> for x in (seq 5); sudo scripts/bertrand/noise/denoise.sh /tmp/pytorch_feature_conversion_benchmark.Dec8RemoveListAndDictMove ; end
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.47us  405.12K
PyTorchFeatureConversionIdListBenchmark                      3.60us  278.07K
PyTorchFeatureConversionIdScoreListBenchmark                 4.87us  205.44K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.45us  407.39K
PyTorchFeatureConversionIdListBenchmark                      3.63us  275.56K
PyTorchFeatureConversionIdScoreListBenchmark                 4.95us  202.17K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.47us  405.49K
PyTorchFeatureConversionIdListBenchmark                      3.63us  275.58K
PyTorchFeatureConversionIdScoreListBenchmark                 4.88us  205.05K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.52us  396.13K
PyTorchFeatureConversionIdListBenchmark                      3.59us  278.29K
PyTorchFeatureConversionIdScoreListBenchmark                 4.88us  204.94K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.46us  406.77K
PyTorchFeatureConversionIdListBenchmark                      3.62us  276.17K
PyTorchFeatureConversionIdScoreListBenchmark                 4.92us  203.07K
============================================================================
```

Reviewed By: suo, hlu1

Differential Revision: D32836701

fbshipit-source-id: 6e1c3d81f1b4ee13156320263dac17f5256c1462
2022-01-05 13:49:22 -08:00
b283b1de39 Cleaning code in fbcode/caffe2/c10/core/TensorImpl.h (#70588)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70588

Test Plan: Sandcastle

Reviewed By: meyering

Differential Revision: D33399751

fbshipit-source-id: 3e507973f7a8f58635f3446650e85d0f959254c0
2022-01-05 13:40:59 -08:00
395f853770 Parallelize docker dependency builds (#70866)
Summary:
Those scripts are run on 8 vCPU instances, so passing at least `-j6` makes sense

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70866

Reviewed By: atalman

Differential Revision: D33435083

Pulled By: malfet

fbshipit-source-id: c879ed928da0b77346a92976d2fe9ad92ba01b5e
2022-01-05 13:34:27 -08:00
be298212a6 reduce igamma instantiations (#70666)
Summary:
Don't compile scalar versions of the kernel (there is no scalar overload), and combine the igamma and igammac kernels.
Igamma cubin size drops from 10 MB to 2 MB on V100.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70666

Reviewed By: malfet

Differential Revision: D33431359

Pulled By: ngimel

fbshipit-source-id: 440998f751251be274f40dd035efba08b8969192
2022-01-05 13:06:24 -08:00
6c4437118b Deprecating Python 3.6 (#70493)
Summary:
Deprecates Python 3.6 in the documentation and in CMake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70493

Reviewed By: suo

Differential Revision: D33433118

Pulled By: atalman

fbshipit-source-id: c3adc7b75714efdb5b6acda5d4cddc068fb4a145
2022-01-05 11:46:32 -08:00
025cd69a86 [AMD] Fix some legacy hipify script (#70594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70594

Pull Request resolved: https://github.com/facebookincubator/gloo/pull/315

Fix some outdated hipify scripts:
* python -> python3 (fb internal)
* rocblas return code
* gloo makefile for hip clang

Test Plan: Sandcastle + OSS build

Reviewed By: malfet, shintaro-iwasaki

Differential Revision: D33402839

fbshipit-source-id: 5893039451bcf77bbbb1b88d2e46ae3e39caa154
2022-01-05 11:34:25 -08:00
34c49d3d3b Document torch.quantile interpolation kwarg (#70637)
Summary:
clone of https://github.com/pytorch/pytorch/pull/59397

This PR documents the `interpolation` kwarg added in https://github.com/pytorch/pytorch/issues/49267. Now that the forward compatibility period is over, we can expose this parameter.
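
A small usage sketch of the now-documented kwarg (the list of modes is assumed from the NumPy-style choices):

```python
import torch

x = torch.arange(5, dtype=torch.float)  # 0., 1., 2., 3., 4.

# interpolation controls how a quantile that falls between two data
# points is computed.
for mode in ("linear", "lower", "higher", "nearest", "midpoint"):
    print(mode, torch.quantile(x, 0.4, interpolation=mode).item())
```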

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70637

Reviewed By: jbschlosser

Differential Revision: D33411707

Pulled By: anjali411

fbshipit-source-id: f5f2d0a6739b3a855bbdf58fc671ac2f0342ce69
2022-01-05 11:02:13 -08:00
616afcf981 [jit] [shape analysis] Move constant tensors out of fused subgraphs during generalization (#70320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70320

ghstack-source-id: 146514368

Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/jit:jit`

Reviewed By: eellison

Differential Revision: D33280508

fbshipit-source-id: fe4291d7c49f0a498b330de96b698e99f6f6a505
2022-01-05 10:19:14 -08:00
b60b1b100f Set cuDNN deterministic flag for test_conv_double_backward_cuda (#69941)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69941

Reviewed By: george-qi

Differential Revision: D33430727

Pulled By: jbschlosser

fbshipit-source-id: 4a250bd0e5460ee631730afe0ab68ba72f37d292
2022-01-05 10:05:56 -08:00
93c7504438 [PyTorch] Improve StorageImpl::set_data_ptr (#65432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65432

There is no reason to do an extra write to the input DataPtr (via `std::swap`) before returning a new DataPtr.
ghstack-source-id: 146471376

Test Plan:
Inspected assembly for this function to verify that we are
really getting fewer instructions generated. I don't have a specific
application for this at the moment, but it's clearly better IMO.

Reviewed By: mikeiovine

Differential Revision: D31097807

fbshipit-source-id: 06ff6f5fc675df0f38b0315b4147ed959243b6d0
2022-01-05 09:46:35 -08:00
70d3b2700f [LTC] Fix stride accessors in LTCTensorImpl (#70623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70623

Strides on lazy tensor should only be read after calling setup_size_properties. This fixes a failure in hf_Longformer.

Test Plan: CI on the lazy_tensor_staging branch

Reviewed By: wconstab, alanwaketan

Differential Revision: D33410142

Pulled By: desertfire

fbshipit-source-id: ccb2ba8d258bdb88f6b51be6196563f9c4c06cbf
2022-01-05 09:31:41 -08:00
6f473c80a5 Enable fx2trt CI test (#70658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70658

The '--exclude-distributed-test' config was intended to disable the fx2trt test in the normal Docker test suite, but the test is auto-disabled now. Remove the config.

Test Plan:
CI
https://github.com/pytorch/pytorch/actions/runs/1656375648

Reviewed By: houseroad

Differential Revision: D33417803

fbshipit-source-id: 9dfb4cbd6fa9ad18a4be989ee86d1f8a298347f9
2022-01-05 09:28:58 -08:00
4cbe140ec5 Add CI config to test USE_PER_OPERATOR_HEADERS=0 (#69907)
Summary:
The CMake build defaults to `USE_PER_OPERATOR_HEADERS = 1` which
generates extra headers in the `ATen/ops` folder that don't exist
otherwise. In particular, fb-internal builds using buck don't support
these headers and so all includes must be guarded with
`#ifdef AT_PER_OPERATOR_HEADERS`.

This adds a CI run which builds with `USE_PER_OPERATOR_HEADERS = 0` so
open source contributions don't have to wait for their PR to be
imported to find out it doesn't work in fb-internal. This flag
shouldn't affect runtime behavior though, so I don't run any tests.

cc seemethere malfet pytorch/pytorch-dev-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69907

Reviewed By: malfet, atalman

Differential Revision: D33411864

Pulled By: seemethere

fbshipit-source-id: 18b34d7a83dc81cf8a6c396ba8369e1789f936e9
2022-01-05 09:18:06 -08:00
e1e43c4e71 Prevent sum overflow in broadcast_object_list (#70605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70605

broadcast_object_list cast the sum of all object lengths from long to int, causing overflows.
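
A standalone sketch of the failure mode (illustrative only, not the actual collective code):

```python
import torch

# Two object lengths whose sum exceeds INT32_MAX.
lengths = torch.tensor([2**31 - 100, 200], dtype=torch.long)

total_as_int32 = lengths.sum().to(torch.int32)  # wraps to a negative value
total_as_int64 = lengths.sum()                  # stays correct

print(total_as_int32.item())  # negative, like the -2147482417 in the error below
print(total_as_int64.item())  # 2147483748
```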

Test Plan:
Add a Tensor with a >2GB storage requirement (in distributed_test.py) to the object broadcast.

This Tensor is only added if the tests are running at Meta, as the GitHub runners will OOM.

Without fix the length will overflow and the program will request a negative sized Tensor:
```
RuntimeError: Trying to create tensor with negative dimension -2147482417: [-2147482417]
```
With fix it will pass the test.

Test used on server with GPUs:

buck test  mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn --local -- broadcast_object
buck test  mode/dev-nosan //caffe2/test/distributed:distributed_gloo_spawn --local -- broadcast_object

Reviewed By: r-barnes

Differential Revision: D33405741

fbshipit-source-id: 972165f8297b3f5d475636e6127ed4a49adacab1
2022-01-05 09:07:39 -08:00
8ba27c576c Upgrade CI to ROCm4.5.2 (#69886)
Summary:
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69886

Reviewed By: albanD, seemethere

Differential Revision: D33429299

Pulled By: malfet

fbshipit-source-id: c3d6d9e45e30d0149b04e59ea255d88bc0e933f2
2022-01-05 08:48:46 -08:00
20489ebdc9 Increase tensor size for mem check tests (#70603)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70226

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70603

Reviewed By: mruberry

Differential Revision: D33410439

Pulled By: janeyx99

fbshipit-source-id: e94615ece6d0fdf230de5297118678b70f34a18c
2022-01-05 08:27:48 -08:00
1aa98c7540 [docs] multi_head_attention_forward no-batch dim support (#70590)
Summary:
No-batch-dim support was added in https://github.com/pytorch/pytorch/issues/67176.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70590

Reviewed By: VitalyFedyunin

Differential Revision: D33405283

Pulled By: jbschlosser

fbshipit-source-id: 86217d7d540184fd12f3a9096605d2b1e9aa313e
2022-01-05 08:26:25 -08:00
e228b71dae remove unnecessary skips in rsub OpInfo (#69973)
Summary:
Skips are unnecessary as https://github.com/pytorch/pytorch/issues/53797 was fixed

Thanks Lezcano for finding the same.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69973

Reviewed By: mruberry

Differential Revision: D33161663

Pulled By: anjali411

fbshipit-source-id: 06b75bc5fc0cf90239f17835c07b86b2282ec846
2022-01-05 08:22:38 -08:00
216ae7bc91 [docs] Transformer: no batch dim support doc update (#70597)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70597

Reviewed By: VitalyFedyunin

Differential Revision: D33405284

Pulled By: jbschlosser

fbshipit-source-id: 04f37e8b9798ded7fcedac48629645843a0e3a28
2022-01-05 08:20:51 -08:00
5543b7ce16 Fix docstring for nn.Softplus (#70576)
Summary:
Fixes nn.Softplus' docstring problem reported at https://github.com/pytorch/pytorch/issues/70498.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70576

Reviewed By: VitalyFedyunin

Differential Revision: D33407444

Pulled By: albanD

fbshipit-source-id: 7f1f438afb1a1079d30e0c4741aa609c5204329f
2022-01-05 08:12:15 -08:00
657a7e74ed Fix docstring for nn.Tanh (#70577)
Summary:
Fixes nn.Tanh's docstring problem reported at https://github.com/pytorch/pytorch/issues/70498.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70577

Reviewed By: VitalyFedyunin

Differential Revision: D33408564

Pulled By: albanD

fbshipit-source-id: 2008cb55ef72b4b057d8d68e4505956aaf6cc3fa
2022-01-05 07:56:57 -08:00
adceb13da1 Copy: Avoid extra dispatch in type-mismatch case (#68950)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68950

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33064447

Pulled By: anjali411

fbshipit-source-id: 82bf4e144c1e629e30226eedc9d26ca63cfb4431
2022-01-05 07:32:47 -08:00
e1aa5db108 Bazel: Only run ATen codegen once (#70147)
Summary:
Due to a merge conflict, the new bazel cuda build does something
rather obnoxious. It runs ATen codegen with `--per-operator-headers`
enabled and extracts a subset of the generated files; then calls it
again without the flag to extract the CUDA files.

This PR instead calls the codegen once but keeps track of what is
CPU and what is CUDA in separate lists.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70147

Reviewed By: VitalyFedyunin

Differential Revision: D33413020

Pulled By: malfet

fbshipit-source-id: 4b502c38a209d1aa63d715e2336df6fc5aac2212
2022-01-05 06:56:52 -08:00
1681323ddc DOC: Merge extraheader block from theme instead of override (#70187)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70185

The extraheader block in docs/source/_templates/layout.html overrides the one from the pytorch theme. The theme block adds Google Analytics, so analytics were missing from the `master` documentation. This came up in PR pytorch/pytorch.github.io#899.

brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70187

Reviewed By: bdhirsh

Differential Revision: D33248466

Pulled By: malfet

fbshipit-source-id: b314916a3f0789b6617cf9ba6bd938bf5ca27242
2022-01-05 06:42:38 -08:00
aea3d3ced7 dbr quant: stop calling eager quant convert (#70247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70247

Stops calling the eager mode quantization `convert` function
from DBR quant convert, and instead implements the module swaps
manually.  This will make it easier to support quantization types
other than static int8 in future PRs.

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: jerryzh168

Differential Revision: D33255924

Pulled By: vkuzo

fbshipit-source-id: afdfd61d71833d987bb38aa4d8c3d214f900c03e
2022-01-05 06:36:44 -08:00
4e90fa6a8c dbr quant: break up test class into multiple classes (#70246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70246

Breaks up the large `TestQuantizeDBR` test case into
1. `TestQuantizeDBRIndividualOps` for testing functionality of ops
2. `TestQuantizeDBRMultipleOps` for testing non-fusion interactions between ops
3. `TestQuantizeDBR` for everything else

We may need to refactor this more in the future, but this should
unblock things for the near future.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
python test/test_quantization.py TestQuantizeDBRIndividualOps
python test/test_quantization.py TestQuantizeDBRMultipleOps
```

Reviewed By: jerryzh168

Differential Revision: D33255925

Pulled By: vkuzo

fbshipit-source-id: 82db1a644867e9303453cfedffed2d81d083c9cd
2022-01-05 06:36:41 -08:00
5b20052857 dbr quant: start recording ops which are not quantizeable (#70200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70200

Adds the logic to record not just the subgraphs which are quantizeable,
but also the set of ops (outside of subgraphs) which are not quantizeable. This changes
the information recorded during tracing as follows (an example):

```
// before
1. subgraph of conv1 -> conv2
2. no other information about other ops

// after
1. subgraph of conv1 -> conv2
2. set of types of ops which were not quantizeable but were encountered during tracing
```

This has two uses:
1. easier development of DBR quant to cover more ops, as now the ops which are not being quantized are easier to inspect
2. easier understanding for the user of what DBR quant is doing or not doing for a model

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_unsupported_ops_recorded
```

Reviewed By: VitalyFedyunin

Differential Revision: D33240997

Pulled By: vkuzo

fbshipit-source-id: 3168eae286387e6cb01df3ae60dc13620fb784d5
2022-01-05 06:36:38 -08:00
80e685e2c0 dbr quant: start reusing static quant module mappings (#70196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70196

Deletes the custom DBR static quant module mapping, and reuses
the global ones.

Test coverage for all the ops will be in future PRs.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33240998

Pulled By: vkuzo

fbshipit-source-id: da248b28d7b681794fa0494ff31fd065680f6fef
2022-01-05 06:35:11 -08:00
45f5a3ceb8 Fix generating files for Vulkan on Windows (#69696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69696

Using `find` is not portable as it won't be there on Windows for example. We can use `glob` with the recursive option added in Python 3.5 instead.
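
For illustration, a recursive `glob` along these lines can replace a shell `find` invocation (the pattern and path here are assumptions, not the actual build paths):

```python
import glob

# '**' with recursive=True (Python 3.5+) walks subdirectories portably,
# including on Windows where `find` is unavailable.
shader_files = glob.glob("aten/src/ATen/native/vulkan/glsl/**/*.glsl", recursive=True)
print(shader_files)
```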

Test Plan: CircleCI

Reviewed By: xta0

Differential Revision: D32994229

fbshipit-source-id: 4a755c4313300142c051f533d0b3876dc9035da0
2022-01-05 05:32:13 -08:00
c468e35d83 [caffe2] don't use __FUNCSIG__ when building for Windows with clang (#70561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70561

When building with strict(er) compiler warnings on Windows, clang complains that `__FUNCSIG__` is a proprietary language extension. When using clang, it seems we can use `__PRETTY_FUNCTION__` instead, like we do on other platforms. This is also in line with the logic on L100:127.

Test Plan: CI

Reviewed By: kalman5

Differential Revision: D33386400

fbshipit-source-id: d45afa92448042ddcd1f68adc7a9ef4643276b31
2022-01-04 23:44:56 -08:00
12653be434 [PyTorch] Optimize no input NVTX collection (#70133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70133

We were creating a `stringstream` and doing string concatenations via `getNvtxStr` even when there were no inputs, wasting precious time. This diff avoids the `stringstream` when there is no input, to squeeze out performance: a 60% reduction in overhead.

Test Plan:
Before
```
I1214 22:48:07.964118 2971180 bench.cpp:154] Mean 0.970494
I1214 22:48:07.964139 2971180 bench.cpp:155] Median 0.969054
I1214 22:48:07.964144 2971180 bench.cpp:156] Min 0.962247
I1214 22:48:07.964148 2971180 bench.cpp:157] stddev 0.00774841
I1214 22:48:07.964154 2971180 bench.cpp:158] stddev / mean 0.00798398
```

After
```
I1214 22:59:00.039872 3437853 bench.cpp:154] Mean 0.384333
I1214 22:59:00.039896 3437853 bench.cpp:155] Median 0.384886
I1214 22:59:00.039899 3437853 bench.cpp:156] Min 0.370235
I1214 22:59:00.039902 3437853 bench.cpp:157] stddev 0.00435907
I1214 22:59:00.039907 3437853 bench.cpp:158] stddev / mean 0.0113419
```

Reviewed By: aaronenyeshi, robieta

Differential Revision: D33137501

fbshipit-source-id: ce0e8cf9aef7ea22fd8aed927e76be4ca375efc3
2022-01-04 23:40:22 -08:00
44283c2766 NNAPI: Add qint16 support via int16 (#70621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70621

PyTorch doesn't have support for qint16 yet. Add an option to handle qint16 via the int16 & qint32 data types.

* For qint16 tensors in NNAPI, the user sends a qint32 tensor. We convert the qint32 to int16 for the converter and set the zero point and scale for NNAPI
    * inputs to the model have to have a fixed scale and zero point and are only supported for testing
* Added a flag use_int16_for_qint16 which will be used to maintain backwards compatibility in the converter when true qint16 is supported in PyTorch
ghstack-source-id: 146507483

Test Plan: pytest test/test_nnapi.py

Reviewed By: dreiss

Differential Revision: D33285124

fbshipit-source-id: b6376fa1bb18a0b9f6a18c545f600222b650cb66
2022-01-04 23:12:38 -08:00
10b40acbdb [PyTorch][Static Runtime] Fast aliasing in select_tensor by manual borrowing (#68122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68122

See code comments for details; in brief, we repurpose support
for borrowing `Tensor`s in `MaybeOwned` to make the `select_tensor`
output a borrowed IValue that we have to clean up manually.

If we have any other ops that always create a new reference to an
existing Tensor, we can easily apply this same optimization.
ghstack-source-id: 146482212

Test Plan:
See perf measurements on ctr_mobile_feed local_ro net for this stack: P467203421
(local is neutral: P467267554)

--do_profile output for local_ro (updated Dec 10):

```
swolchok@devbig032 /d/u/s/f/fbcode> tail Stable.profile.txt
First iter time: 0.989023 ms
Number of operators: 2037
Total number of managed tensors: 1597
Total number of managed output tensors: 0
Total number of unmanaged values: 2568
Number of unmanaged values requiring cleanup: 2568
Number of unmanaged values not requiring cleanup: 0
Total memory managed: 50368 bytes
Total number of reused tensors: 1010
Total number of 'out' variant nodes/total number of nodes: 2001/2037 (98.2327%)
swolchok@devbig032 /d/u/s/f/fbcode> ttail TMCC^C
swolchok@devbig032 /d/u/s/f/fbcode> tail TMCOFastAliasing.profile.txt
First iter time: 0.994703 ms
Number of operators: 2551
Total number of managed tensors: 1146
Total number of managed output tensors: 0
Total number of unmanaged values: 4047
Number of unmanaged values requiring cleanup: 3533
Number of unmanaged values not requiring cleanup: 514
Total memory managed: 50048 bytes
Total number of reused tensors: 559
Total number of 'out' variant nodes/total number of nodes: 2001/2551 (78.4398%)
```

for local: (also Dec 10):

```
==> Stable.local.profile.txt <==
First iter time: 9.0909 ms
Number of operators: 1766
Total number of managed tensors: 1894
Total number of managed output tensors: 0
Total number of unmanaged values: 2014
Number of unmanaged values requiring cleanup: 2014
Number of unmanaged values not requiring cleanup: 0
Total memory managed: 4541440 bytes
Total number of reused tensors: 847
Total number of 'out' variant nodes/total number of nodes: 1744/1766 (98.7542%)

==> TMCOFastAliasing.local.profile.txt <==
First iter time: 7.5512 ms
Number of operators: 2378
Total number of managed tensors: 1629
Total number of managed output tensors: 0
Total number of unmanaged values: 3503
Number of unmanaged values requiring cleanup: 2891
Number of unmanaged values not requiring cleanup: 612
Total memory managed: 3949312 bytes
Total number of reused tensors: 586
Total number of 'out' variant nodes/total number of nodes: 1744/2378 (73.3389%)
```

Reviewed By: hlu1

Differential Revision: D32318674

fbshipit-source-id: a2d781105936fda2a3436d32ea22a196f82dc783
2022-01-04 22:36:13 -08:00
4d8fc8693c [PyTorch][Static Runtime] Support memory planning for torch.to() w/o requiring copying (#67223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67223

ghstack-source-id: 146482215

Test Plan:
See perf measurements on ctr_mobile_feed local_ro net for this stack: P467203421
(local is neutral: P467267554)

Reviewed By: hlu1

Differential Revision: D31776259

fbshipit-source-id: f84fcaa05029577213f3bf2ae9d4b987b68480b3
2022-01-04 22:36:10 -08:00
1507ce90b2 [PyTorch][Static Runtime] Avoid managed output tensor DCHECK (#67221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67221

Update memory leak checks to not require that output tensors are cleaned up.
ghstack-source-id: 146464297

Test Plan: Tests should still pass;  reviewers to confirm that this is OK in principle

Reviewed By: d1jang

Differential Revision: D31847567

fbshipit-source-id: bb7ff2f2ed701e2d7de07d8032a1281fccabd6a9
2022-01-04 22:36:07 -08:00
99a10c371f [PyTorch][Static Runtime] Fix dtype changing between iterations for to() (#67394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67394

ghstack-source-id: 146464294

Test Plan:
Added new test, which failed but now passes.

Checked perf on ctr_mobile_feed local net (still not on recordio inputs yet), looks neutral

```
Stable, local
========================================

I1027 13:40:23.411118 2156917 PyTorchPredictorBenchLib.cpp:131] PyTorch predictor: number of prediction threads 1
I1027 13:40:48.708222 2156917 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.16975. Iters per second: 162.081
I1027 13:41:13.915948 2156917 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.1487. Iters per second: 162.636
I1027 13:41:38.984462 2156917 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.11408. Iters per second: 163.557
I1027 13:42:04.138948 2156917 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.13566. Iters per second: 162.982
I1027 13:42:29.342630 2156917 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.14269. Iters per second: 162.795
I1027 13:42:29.342669 2156917 PyTorchPredictorBenchLib.cpp:264] Mean milliseconds per iter: 6.14218, standard deviation: 0.0202164
0

FixToDtypeChanges, local
========================================
I1027 13:44:59.632668 2176333 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.11023. Iters per second: 163.66
I1027 13:45:24.894635 2176333 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.16308. Iters per second: 162.257
I1027 13:45:50.275280 2176333 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.17868. Iters per second: 161.847
I1027 13:46:15.637431 2176333 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.18688. Iters per second: 161.632
I1027 13:46:40.670816 2176333 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.10549. Iters per second: 163.787
I1027 13:46:40.670863 2176333 PyTorchPredictorBenchLib.cpp:264] Mean milliseconds per iter: 6.14887, standard deviation: 0.03843706
```

Reviewed By: hlu1

Differential Revision: D31972722

fbshipit-source-id: 7a445b325a29020b31dd2bd61e4171ecc2793b15
2022-01-04 22:34:49 -08:00
ab7d0df449 Support cloning CSR tensors (#70581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70581

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33413992

Pulled By: cpuhrsch

fbshipit-source-id: 3a576d2c2f26d1edcc8f6932b2dbe2c7c11e9593
2022-01-04 21:41:18 -08:00
d1dbcb1780 Change to use current LLVM APIs (#70625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70625

In llvm-13, deprecated APIs were removed. These APIs were just wrappers around APIs present in llvm-9+. Changed to use the underlying APIs.

Test Plan: buck build mode/opt-clang-thinlto -j 70 unicorn/topaggr:top_aggregator_server -c unicorn.hfsort="1" -c cxx.extra_cxxflags="-Wforce-no-error -fbracket-depth=300" -c cxx.profile="fbcode//fdo/autofdo/unicorn/topaggr/top_aggregator_server:autofdo" -c cxx.modules=False

Reviewed By: WenleiHe

Differential Revision: D33169593

fbshipit-source-id: c8923991b351a893ef8f6c0d01858149b63c0d33
2022-01-04 20:25:58 -08:00
f8eaebc978 Avoid adding torch::deploy interpreter library to the data section (#70208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70208

Create a custom section ".embedded_interpreter" to store the interpreter instead of putting it in .data, which increases the amount of memory that can be used by the other sections of the executable (such as .text/.data/.bss) by 33% (1.5GB -> 2.0GB). This also removes memory limitations on the interpreter and some tech debt.

Test Plan:
buck test mode/opt //caffe2/torch/csrc/deploy:test_deploy
readelf -S ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/test_deploy
check the size of the .data section
Apply the fix and check the size of the .data section again. It should be reduced by the size of the interpreter.so

The output of `readelf -S ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/test_deploy` is as follows. The .data section is now 0.0015415GB and the .torch_deploy_payXXX section is 0.605125GB

```
(pytorch) [sahanp@devvm4333.vll0 ~/local/fbsource/fbcode] readelf -S buck-out/gen/caffe2/torch/csrc/deploy/test_deploy
There are 55 section headers, starting at offset 0x24bac82b0:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000200350  00000350
       0000000000000028  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000200378  00000378
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000200398  00000398
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .dynsym           DYNSYM           00000000002003c0  000003c0
       0000000000d07a48  0000000000000018   A       9     1     8
  [ 5] .gnu.version      VERSYM           0000000000f07e08  00d07e08
       0000000000115f86  0000000000000002   A       4     0     2
  [ 6] .gnu.version_r    VERNEED          000000000101dd90  00e1dd90
       0000000000000510  0000000000000000   A       9    15     4
  [ 7] .gnu.hash         GNU_HASH         000000000101e2a0  00e1e2a0
       00000000003b4fb0  0000000000000000   A       4     0     8
  [ 8] .hash             HASH             00000000013d3250  011d3250
       0000000000457e20  0000000000000004   A       4     0     4
  [ 9] .dynstr           STRTAB           000000000182b070  0162b070
       0000000004ef205a  0000000000000000   A       0     0     1
  [10] .rela.dyn         RELA             000000000671d0d0  0651d0d0
       0000000000110b80  0000000000000018   A       4     0     8
  [11] .rela.plt         RELA             000000000682dc50  0662dc50
       00000000000093f0  0000000000000018   A       4    35     8
  [12] .rodata           PROGBITS         0000000006837040  06637040
       00000000034067a8  0000000000000000 AMS       0     0     64
  [13] fb_build_info     PROGBITS         0000000009c3d7f0  09a3d7f0
       00000000000002ee  0000000000000000   A       0     0     16
  [14] .gcc_except_table PROGBITS         0000000009c3dae0  09a3dae0
       00000000014a9340  0000000000000000   A       0     0     4
  [15] .eh_frame_hdr     PROGBITS         000000000b0e6e20  0aee6e20
       00000000004abf54  0000000000000000   A       0     0     4
  [16] .eh_frame         PROGBITS         000000000b592d78  0b392d78
       000000000200e344  0000000000000000   A       0     0     8
  [17] .text             PROGBITS         000000000d5a2000  0d3a2000
       000000001e55944e  0000000000000000  AX       0     0     256
  [18] .init             PROGBITS         000000002bafb450  2b8fb450
       0000000000000017  0000000000000000  AX       0     0     4
  [19] .fini             PROGBITS         000000002bafb468  2b8fb468
       0000000000000009  0000000000000000  AX       0     0     4
  [20] .never_hugify     PROGBITS         000000002bafb480  2b8fb480
       0000000000000db3  0000000000000000  AX       0     0     16
  [21] text_env          PROGBITS         000000002bafc240  2b8fc240
       0000000000002e28  0000000000000000  AX       0     0     16
  [22] .plt              PROGBITS         000000002baff070  2b8ff070
       00000000000062b0  0000000000000000  AX       0     0     16
  [23] .tdata            PROGBITS         000000002bb06000  2b906000
       0000000000000b20  0000000000000000 WAT       0     0     8
  [24] .tbss             NOBITS           000000002bb06b40  2b906b20
       0000000000007cb8  0000000000000000 WAT       0     0     64
  [25] .fini_array       FINI_ARRAY       000000002bb06b20  2b906b20
       0000000000000028  0000000000000000  WA       0     0     8
  [26] .init_array       INIT_ARRAY       000000002bb06b48  2b906b48
       0000000000008878  0000000000000000  WA       0     0     8
  [27] .data.rel.ro      PROGBITS         000000002bb0f3c0  2b90f3c0
       0000000000029ce0  0000000000000000  WA       0     0     64
  [28] .ctors            PROGBITS         000000002bb390a0  2b9390a0
       0000000000000010  0000000000000000  WA       0     0     8
  [29] .dynamic          DYNAMIC          000000002bb390b0  2b9390b0
       0000000000000340  0000000000000010  WA       9     0     8
  [30] .got              PROGBITS         000000002bb393f0  2b9393f0
       000000000001f040  0000000000000000  WA       0     0     8
  [31] .bss.rel.ro       NOBITS           000000002bb58440  2b958430
       0000000000000c40  0000000000000000  WA       0     0     32
  [32] .data             PROGBITS         000000002bb5a000  2b959000
       0000000000194188  0000000000000000  WA       0     0     4096
  [33] .tm_clone_table   PROGBITS         000000002bcee188  2baed188
       0000000000000000  0000000000000000  WA       0     0     8
  [34] .probes           PROGBITS         000000002bcee188  2baed188
       0000000000000002  0000000000000000  WA       0     0     2
  [35] .got.plt          PROGBITS         000000002bcee190  2baed190
       0000000000003168  0000000000000000  WA       0     0     8
  [36] .bss              NOBITS           000000002bcf1300  2baf02f8
       00000000005214f0  0000000000000000  WA       0     0     128
  [37] .nvFatBinSegment  PROGBITS         000000002c213000  2baf1000
       0000000000002850  0000000000000000   A       0     0     8
  [38] .nv_fatbin        PROGBITS         000000002c216000  2baf4000
       0000000052baed38  0000000000000000  WA       0     0     8
  [39] .comment          PROGBITS         0000000000000000  7e6a2d38
       00000000000001dc  0000000000000000  MS       0     0     1
  [40] .debug_aranges    PROGBITS         0000000000000000  7e6a2f20
       0000000001266c00  0000000000000000           0     0     16
  [41] .debug_info       PROGBITS         0000000000000000  7f909b20
       000000007b21de49  0000000000000000           0     0     1
  [42] .debug_abbrev     PROGBITS         0000000000000000  fab27969
       000000000179f365  0000000000000000           0     0     1
  [43] .debug_line       PROGBITS         0000000000000000  fc2c6cce
       00000000176954ac  0000000000000000           0     0     1
  [44] .debug_str        PROGBITS         0000000000000000  11395c17a
       0000000039dc32b0  0000000000000001  MS       0     0     1
  [45] .debug_ranges     PROGBITS         0000000000000000  14d71f430
       0000000026a2d930  0000000000000000           0     0     16
  [46] .debug_types      PROGBITS         0000000000000000  17414cd60
       000000000b211ff5  0000000000000000           0     0     1
  [47] .debug_loc        PROGBITS         0000000000000000  17f35ed55
       000000009ca80c7e  0000000000000000           0     0     1
  [48] .debug_macinfo    PROGBITS         0000000000000000  21bddf9d3
       000000000000151c  0000000000000000           0     0     1
  [49] .note.stapsdt     NOTE             0000000000000000  21bde0ef0
       0000000000001b3c  0000000000000000           0     0     4
  [50] .debug_macro      PROGBITS         0000000000000000  21bde2a2c
       0000000000040e6a  0000000000000000           0     0     1
  [51] .torch_deploy_pay PROGBITS         0000000000000000  21be23896
       0000000026ba5d28  0000000000000000           0     0     1
  [52] .symtab           SYMTAB           0000000000000000  2429c95c0
       00000000020ce0c8  0000000000000018          54   863985     8
  [53] .shstrtab         STRTAB           0000000000000000  244a97688
       000000000000025c  0000000000000000           0     0     1
  [54] .strtab           STRTAB           0000000000000000  244a978e4
       00000000070309c6  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
```

Reviewed By: shunting314

Differential Revision: D33243703

fbshipit-source-id: 09a798113766c716297458cea7a74f074268dc82
2022-01-04 19:57:06 -08:00
2292520bdc Fix genSparseCSRTensor: generate non-trivial values for uint8 dtype. (#70580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70580

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33413597

Pulled By: cpuhrsch

fbshipit-source-id: 313b08e1bd96ffb8d5c7a0fda9384502325e5d08
2022-01-04 18:02:36 -08:00
29ff596dca [CUDA graphs] Changes batchnorm to increment num_batches_tracked in place for improved graph safety (#70444)
Summary:
This PR was not my worst debugging annoyance, nor my smallest in lines changed, but it has the highest `debugging annoyance/lines changed` ratio.

The current pattern
```
self.num_batches_tracked = self.num_batches_tracked + 1
```
, if captured, deletes an eagerly-allocated tensor and overwrites it with a captured tensor. Replays read from the (deallocated) original tensor's address.
This can cause
1. an IMA on graph replay
2. failure to actually increment `num_batches_tracked` during graph replay, because every replay reads from the old location without adding to it
3. numerical corruption if the allocator reassigns the original tensor's memory to some unrelated tensor
4. combinations of 1, 2, and 3, depending on global allocation patterns and if/when the BN module is called eagerly sometimes between replays

(ask me how I know).
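
A standalone sketch of the two patterns (the names are illustrative; the real code lives in the BatchNorm module):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
num_batches_tracked = torch.zeros(1, dtype=torch.long, device=device)

# Out-of-place: rebinds the name to a freshly allocated tensor. Under CUDA
# graph capture, replays keep reading the stale original allocation.
num_batches_tracked = num_batches_tracked + 1

# In-place: mutates the existing storage, so a captured graph replays
# correctly against the same address.
num_batches_tracked += 1  # equivalently: num_batches_tracked.add_(1)
```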

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70444

Reviewed By: albanD

Differential Revision: D33342203

Pulled By: ngimel

fbshipit-source-id: 5f201cc25030517e75af010bbaa88c452155df21
2022-01-04 17:06:46 -08:00
14457bb8cb Remove backward op for slow 3d transposed convolution (#69933)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69933

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33131343

Pulled By: jbschlosser

fbshipit-source-id: 4300c66f0f4811c949f82c62d17c7b5200cd15a3
2022-01-04 16:55:43 -08:00
1adb70c6f0 Revert D33409880: [pytorch][PR] Deprecating Python 3.6
Test Plan: revert-hammer

Differential Revision:
D33409880 (d95be99561)

Original commit changeset: 4f9123398960

Original Phabricator Diff: D33409880 (d95be99561)

fbshipit-source-id: 32dc1c3c07ef99a04fab7d0fb742cf4e6c4b718a
2022-01-04 16:37:09 -08:00
8369a46417 [maskrcnn] use stable sort in mask rcnn caffe2 ops (#70510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70510

Pull Request resolved: https://github.com/facebookresearch/detectron2/pull/3838

Pull Request resolved: https://github.com/facebookresearch/mobile-vision/pull/58

Pull Request resolved: https://github.com/fairinternal/detectron2/pull/567

D32694315 changes the implementation of sorting in NMS to a stable sort, while the C2 operators are still using a non-stable sort. This causes test failures such as:
- mobile-vision/d2go/tests:fb_test_meta_arch_rcnn - test_export_caffe2 (d2go.tests.fb.test_meta_arch_rcnn.TestFBNetV2MaskRCNNFP32) (architecture: x86_64, buildmode: dev-nosan, buildsystem: buck, compiler: clang, sanitizer: none) https://www.internalfb.com/intern/testinfra/diagnostics/7318349463675961.562949999530318.1640814509/
- mobile-vision/d2go/tests:fb_test_meta_arch_rcnn - test_export_torchscript_mobile_c2_ops (d2go.tests.fb.test_meta_arch_rcnn.TestFBNetV2MaskRCNNFP32) (architecture: x86_64, buildmode: dev-nosan, buildsystem: buck, compiler: clang, sanitizer: none) https://www.internalfb.com/intern/testinfra/diagnostics/7318349463675961.844424980844724.1640814504/

To illustrate, in the failed test_export_caffe2 test, the inputs of BoxWithNMSLimit are:
```
(Pdb) ws.FetchBlob("246")
array([[0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568, 0.01234568, 0.01234568, 0.01234568, 0.01234568,
        0.01234568]], dtype=float32)
(Pdb) ws.FetchBlob("248")
array([[ 0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,
         0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0.,
        10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10.,
        20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,
         0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,
         0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0.,
        10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10.,
        20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,
         0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,
         0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0.,
        10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10.,
        20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,
         0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,
         0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0.,
        10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10.,
        20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,
         0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,
         0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0.,
        10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10.,
        20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,
         0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,
         0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0.,
        10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10.,
        20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,  0.,  0., 10., 20.,
         0.,  0., 10., 20.,  0.,  0., 10., 20.]], dtype=float32)
(Pdb) ws.FetchBlob("249")
array([1.], dtype=float32)
```
This contains 81 boxes (representing 81 classes) with equal scores; a stable sort returns class id 0, while the non-stable sort returns class id 50.

This diff changes the sorting in the BoxWithNMSLimit op to a stable sort.
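
A minimal illustration of the tie-breaking difference, using `torch.sort`'s `stable` flag as a stand-in for the NMS-internal sort:

```python
import torch

# 81 identical scores, as in the BoxWithNMSLimit inputs above.
scores = torch.full((81,), 0.01234568)

# A stable descending sort preserves the original order among ties,
# so the top index is deterministically class id 0.
_, idx = torch.sort(scores, descending=True, stable=True)
print(idx[0].item())  # 0
```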

Test Plan:
The D2Go (401a6b682b) tests can pass after this change.
```
buck test mode/dev-nosan //mobile-vision/d2go/tests:fb_test_meta_arch_rcnn -- --run-disabled
```
https://www.internalfb.com/intern/testinfra/testrun/4785074687594820

Reviewed By: newstzpz

Differential Revision: D33355251

fbshipit-source-id: 9f3fc230b852a5e43f0e3cb8fa9093cbaf53e8b6
2022-01-04 16:33:10 -08:00
b16b444828 don't unsqueeze every stack arg if possible (#70288)
Summary:
Fixes T98738497
Use `cat` and `view` if possible, instead of unsqueezing every arg. Helps perf when there are a lot of small arguments to `stack`.
Benchmark:
```
import torch
from torch.utils.benchmark import Timer

inputs = [torch.randn([1, 128]) for _ in range(500)]

def stack_cat(inputs):
    # concatenate along dim=1, then view the result back to the stacked shape
    cat_result = torch.concat(inputs, dim=1)
    return cat_result.view([1, 500, 128])

timer_stack = Timer(stmt="torch.stack(inputs, dim=1)", globals=globals())
timer_cat = Timer(stmt="stack_cat(inputs)", globals=globals())
print("stack ", timer_stack.blocked_autorange().median)
print("cat ", timer_cat.blocked_autorange().median)
```
Before:
```
stack  0.00023390522226691247
cat  7.437262553721667e-05
```
After
```
stack  7.397504318505526e-05
cat  7.37407322973013e-05
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70288

Reviewed By: robieta, mruberry

Differential Revision: D33289789

Pulled By: ngimel

fbshipit-source-id: b57dcb8ec66e767f552c08deeba330f31ae6c3d0
2022-01-04 16:07:30 -08:00
f8f96d4858 Copy: Re-use existing neg and conj kernel implementations (#68949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68949

This reuses the existing `neg_kernel` and `conj_kernel`
implementations for copy, saving some binary size and compile time.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33064390

Pulled By: anjali411

fbshipit-source-id: eb0ee94ed3db44ae828ea078ba616365f97a7ff5
2022-01-04 15:30:31 -08:00
95a1952633 add SparseXPU to dispatch key set autogradother_backends (#70443)
Summary:
According to the dispatch table computation logic, if no kernel is
registered to a certain dispatch key, the CompositeExplicitAutograd
backend kernel will be used, so we need to add the SparseXPU key to the alias key pool.

Signed-off-by: Ma, Jing1 <jing1.ma@intel.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70443

Reviewed By: jbschlosser

Differential Revision: D33406004

Pulled By: bdhirsh

fbshipit-source-id: 009037739c818676901b10465632d3fef5ba14f2
2022-01-04 15:16:46 -08:00
a60adc7f8a fractional_max_pool2d_backward: port to structured kernel (#68245)
Summary:
Ports fractional_max_pool2d_backward to a structured kernel.

Ref https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68245

Reviewed By: jbschlosser

Differential Revision: D33405521

Pulled By: bdhirsh

fbshipit-source-id: 4930e870d4025485317208df751bc3721ecdb7eb
2022-01-04 15:15:29 -08:00
7e58b1dd7b Sets device guard in _cudnn_impl functions (#70406)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70404

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70406

Reviewed By: mruberry

Differential Revision: D33407972

Pulled By: ngimel

fbshipit-source-id: 6bf97602ea13f8eaaff95d9f412a2eeaa0e6ba10
2022-01-04 15:11:17 -08:00
6089a0f14a Extend checkout for pytorch/builder (#70644)
Summary:
https://www.torch-ci.com/minihud shows 2 recent failures due to timing out. Increasing the timeout to 30m to see if this alleviates the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70644

Reviewed By: suo, malfet, seemethere, atalman

Differential Revision: D33413604

Pulled By: janeyx99

fbshipit-source-id: 756a7ad94c589e39b8567acbfc3e769dc0b9113f
2022-01-04 14:55:47 -08:00
7b8c43cd7c Revert "Revert D32498570: make codegen'd device guards not cuda-specific. Allow them to be used in external codegen" (#69951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69951

This reverts commit 0ef523633fddf2d63e97d5028b00af10ff344561.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33113543

Pulled By: bdhirsh

fbshipit-source-id: b28073ee0870b413ea9f617f27671ae5c6f3c696
2022-01-04 14:53:21 -08:00
bb5b4cceb6 Revert "Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels" (#69950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69950

This reverts commit f6cad53443704dfe5a20cc62bee14d91e3bffcaa.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33113545

Pulled By: bdhirsh

fbshipit-source-id: d6590294662588d36c09662dea65919ad4e1e288
2022-01-04 14:52:00 -08:00
d95be99561 Deprecating Python 3.6 (#70493)
Summary:
Deprecates Python 3.6 in the documentation and in CMake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70493

Reviewed By: malfet

Differential Revision: D33409880

Pulled By: atalman

fbshipit-source-id: 4f912339896096be95b344724a4d9ae88cdf1a8f
2022-01-04 14:41:27 -08:00
4d08db0cb2 Flaky tests reporting: use GITHUB_RUN_ID instead of concatenated value (#70604)
Summary:
I did not realize the WORKFLOW_ID variable in our GHA scripts concatenated RUN_ID and RUN_NUMBER.

For flaky test collection, we should only be using RUN_ID, which makes it easier for us to write queries on the data.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70604

Reviewed By: suo

Differential Revision: D33409503

Pulled By: janeyx99

fbshipit-source-id: 932405989dc1a406dfe9da9a7f513ca127c8d436
2022-01-04 14:36:13 -08:00
0ece9a49d7 Revert D33198155: Bump version number to 7 and compile old operators with old schema
Test Plan: revert-hammer

Differential Revision:
D33198155 (d35fc409ad)

Original commit changeset: 38a1185f9ecb

Original Phabricator Diff: D33198155 (d35fc409ad)

fbshipit-source-id: 411aaeb4e047aad9202db50d4d0f2ff35bc51f9d
2022-01-04 13:44:59 -08:00
61b562206b Fix docstring for nn.ELU (#70574)
Summary:
Fixes nn.ELU's docstring problem reported at https://github.com/pytorch/pytorch/issues/70498.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70574

Reviewed By: VitalyFedyunin

Differential Revision: D33404696

Pulled By: albanD

fbshipit-source-id: 1ffcba3fdeadf88a4433e9168c42ddb252e833e9
2022-01-04 13:27:59 -08:00
9cf0de509f DispatchStub: Improve type mismatch errors (#67880)
Summary:
Currently, when you register a kernel implementation to a dispatch stub,
it takes the function signature from the function pointer you pass in.
That means if you get the signature wrong, the build fails at link time
with a link error instead of failing during compilation. This also means
that when registering nullptr you need to manually specify the type.

Instead, taking the type from `DispatchStub::FnPtr` means a quicker
failure signal and better error messages. The only downside is that
you need to actually include the DispatchStub declaration, which for
some CPU kernels was missing, so I've had to add them here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67880

Reviewed By: mrshenli

Differential Revision: D33400922

Pulled By: ngimel

fbshipit-source-id: 2da22f053ef82da5db512986e5b968d97a681617
2022-01-04 11:00:47 -08:00
f64906f470 ibm z14/15 SIMD support (#66407)
Summary:
https://github.com/pytorch/pytorch/issues/66406
Implements z14/z15 vector SIMD additions. So far, all types besides bfloat have their SIMD implementation.

It has 99% coverage and is currently passing the local tests. It is concise, and the main SIMD file is only one header file. It mostly uses template metaprogramming, but a few macros are left, with the intention of not modifying PyTorch much. Sleef supports z15.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66407

Reviewed By: mrshenli

Differential Revision: D33370163

Pulled By: malfet

fbshipit-source-id: 0e5a57f31b22a718cd2a9ac59753fb468cdda140
2022-01-04 09:40:18 -08:00
8dcfdf39e7 [DataPipe] Renaming FileLoader to FileOpener with deprecation warning for FileLoader (#70367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70367

This PR renames the `FileLoaderIterDataPipe` to `FileOpenerIterDataPipe`. For the sake of not breaking many CI tests immediately, it still preserves `FileLoader` as an alias. This will allow downstream libraries/users to migrate their use cases before we fully remove all references to `FileLoader` from PyTorch.
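
A migration sketch (the exact import path, `mode` argument, and local `example.txt` file are assumptions based on the datapipes API):

```python
from torch.utils.data.datapipes.iter import IterableWrapper, FileOpener

# New name; FileLoader remains as a deprecated alias for now.
dp = FileOpener(IterableWrapper(["example.txt"]), mode="rb")
for path, stream in dp:
    print(path, stream.read())
```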

Fixes https://github.com/pytorch/data/issues/103. More detailed discussion about this decision is also in the linked issue.

cc VitalyFedyunin ejguan NivekT pmeier Nayef211

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33301648

Pulled By: NivekT

fbshipit-source-id: 59278dcd44e372df0ba2001a4eecbf9792580d0b
2022-01-04 09:14:50 -08:00
7c7eb351c3 Populate __name__ for torch.nn.modules.utils.{_single,_pair,...} (#70459)
Summary:
This helps with debug printouts and Python-level graph analysis.
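
For example (assuming the populated name simply matches the helper's alias):

```python
from torch.nn.modules.utils import _pair

print(_pair(3))        # (3, 3)
print(_pair.__name__)  # "_pair", useful in debug printouts and graph tools
```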

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70459

Reviewed By: wconstab

Differential Revision: D33340032

Pulled By: jansel

fbshipit-source-id: 24d3fdf31e9e5e92bb47f0db30339cf373a1d4d4
2022-01-04 08:37:12 -08:00
1150046d29 NNAPI: Add runtime flexible shapes & return shapes (#70334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70334

* Use 0 for load-time flexible shapes
* Use -1 for runtime flexible shapes
* NNAPI needs return shapes for flexible outputs

Test Plan: Tested via upcoming ops

Reviewed By: dreiss

Differential Revision: D33237922

fbshipit-source-id: 50afdd8e3c6401dfb79b4bc09513c9882a09e5d5
2022-01-04 08:37:09 -08:00
a825351c13 GHA Windows: Propagate exit code from .bat to calling bash script (#70011)
Summary:
The Windows 1st shard was silently failing to run (more details here: https://github.com/pytorch/pytorch/issues/70010) because the code to run the tests was never reached. It went unnoticed because our CI still returned green for those workflow jobs: the exit code from the batch script DID NOT PROPAGATE to the calling bash script.

The key here is that even though we have
```
if ERRORLEVEL 1 exit /b 1
```

The exit code 1 was NOT propagating back to the bash script: the `exit /b 1` was within an `if` statement, and the batch script was actually run in a cmd shell, so the bash script win-test.sh continued without erroring. Moving the `exit /b 1` to be standalone fixes it.

More details for this can be found in this stack overflow https://stackoverflow.com/a/55290133

Evidence that now a failure in the .bat would fail the whole job:
https://github.com/pytorch/pytorch/runs/4621483334?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70011

Reviewed By: seemethere, samdow

Differential Revision: D33303020

Pulled By: janeyx99

fbshipit-source-id: 8920a43fc6c4b67fecf90f3fca3908c314522cd6
2022-01-04 08:35:49 -08:00
d35fc409ad Bump version number to 7 and compile old operators with old schema (#68358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198155

Pulled By: tugsbayasgalan

fbshipit-source-id: 38a1185f9ecb34a33f737ad0b060b3490956300c
2022-01-04 01:31:25 -08:00
d9106116aa nnapi: Add int32 type torchscript expressions (#70197)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70197

Test Plan:
* `pytest test/test_nnapi.py`
* Testing via ops following this commit

Reviewed By: anshuljain1, dreiss

Differential Revision: D33237917

fbshipit-source-id: f0493620f28a62ad9fe0b97b67d1e25059d50c24
2022-01-03 19:00:38 -08:00
1b66915f39 Have type_parser return const reference (#70477)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70477

Test Plan: Sandcastle

Reviewed By: cccclai

Differential Revision: D33340030

fbshipit-source-id: b2a295b7c1c01e86971f6b9bbdd7d3718a2d3f0c
2022-01-03 16:18:28 -08:00
bc3246453b Added explicit build command for Windows and clarification on obtaining (#70190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70190

C++ build tools to readme.md

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33245438

Pulled By: ikriv

fbshipit-source-id: ef863d68926bd7416d0e10d24197d19392c124de
2022-01-03 14:33:59 -08:00
1e67570f3a Drop omp simd from batch_permutation_op.cc (#70579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70579

Fixes
```
     36 stderr: caffe2/caffe2/operators/batch_permutation_op.cc:25:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
      3 caffe2/caffe2/operators/batch_permutation_op.cc:25:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
```

Test Plan: Sandcastle

Reviewed By: meyering

Differential Revision: D33378925

fbshipit-source-id: 5ae3bfb8fadfa91a13ff0dcf5fae2ce7864ea90e
2022-01-03 08:45:50 -08:00
ab49d41bb5 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33393329

fbshipit-source-id: 728d47e62e8d81c5243c62917d88e54c4b4a1db2
2022-01-02 17:30:39 -08:00
fa09099ba3 Codegen: TraceType only includes operators being registered (#68691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68691

TraceType is a sharded file, so by only including specific operator
headers, we ensure that changing one (non-method) operator only needs
one shard to be re-compiled.

This also changes all the included autograd and jit headers from
including `ATen/ATen.h` to just including `ATen/core/Tensor.h`.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D33336948

Pulled By: albanD

fbshipit-source-id: 4e40371592b9a5a7e7fcd1d8cecae11ffb873113
2022-01-02 13:09:19 -08:00
779f41a78a [quant] Add a e2e test for standalone module + custom backend_config_dict (#70152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70152

This demonstrates that our backend_config_dict works for one of
our internal use cases.

Test Plan:
python test/fx2trt/test_quant_trt.py

Imported from OSS

Reviewed By: vkuzo, raghuramank100

Differential Revision: D33205161

fbshipit-source-id: dca8570816baaf85a79f2be75378d46c3af0e454
2022-01-02 11:20:50 -08:00
ce86881afa [quant][graphmode][fx] Add qat module mapping support in backend_config_dict (#70287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70287

This PR adds support for configuring QAT modules for fused/non-fused modules.
TODO: there are some redundant configs, especially for fused op patterns; we can clean them up later.

Test Plan:
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33274057

fbshipit-source-id: b2e6a078211320d97c41ffadd3ecedfab57e3b77
2021-12-30 23:30:34 -08:00
65faf1a7eb [fx2trt] Add version check for ProfilingVerbosity bulider config (#70286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70286

att

Test Plan:
python test/fx2trt/test_quant_trt.py

Imported from OSS

Reviewed By: soulitzer

Differential Revision: D33274058

fbshipit-source-id: c7657f9ba8b578d40d6fc1793b8b363898700eee
2021-12-30 19:59:25 -08:00
6bc06ec3c2 [PyTorch Edge][QNNPack] Tighten Step Height for Indirection Buffers (#70530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70530

```kernel_size + (output_width * step_width - 1) * kernel_height``` is more space than needed, and ```kernel_size + (output_width - 1) * step_width * kernel_height``` is just enough.
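
A quick numeric check of the two formulas above, with illustrative (assumed) dimensions:

```
kernel_height, kernel_width = 3, 9          # assumed example dims
kernel_size = kernel_height * kernel_width  # 27
output_width, step_width = 8, 3             # assumed example values

old = kernel_size + (output_width * step_width - 1) * kernel_height  # 96
new = kernel_size + (output_width - 1) * step_width * kernel_height  # 90
print(old - new)  # 6 == (step_width - 1) * kernel_height saved per buffer
```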

Test Plan: Phabricator Tests

Reviewed By: kimishpatel

Differential Revision: D32553599

fbshipit-source-id: 30f6d191705bcb25dc9bb7a91c6d7b99c3a348e5
2021-12-30 14:57:33 -08:00
7bfaa230be [nn] adaptive_avg_pool{1/2/3}d : Error on negative output_size (#70488)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70232
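
A rough sketch of the newly rejected call (exact error message may differ):

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
try:
    F.adaptive_avg_pool2d(x, output_size=(-1, 4))  # negative size now raises
except RuntimeError as err:
    print("rejected:", err)
```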

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70488

Reviewed By: H-Huang

Differential Revision: D33367289

Pulled By: jbschlosser

fbshipit-source-id: 6b7b89d72c4e1e049ad6a0addb22a261c28ddb4c
2021-12-30 14:42:11 -08:00
e6c3aa3880 Remove backward ops for mkldnn convolution (#70467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70467

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33342476

Pulled By: jbschlosser

fbshipit-source-id: 9811d02b16adea0dd1dd2500261f4b3b294d2dee
2021-12-30 14:29:22 -08:00
cfc71f56e4 [quant][fx][graphmode] Support standalone module in _convert_do_not_use (#70151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70151

This supports converting an observed standalone module to a quantized standalone module
in the new convert flow (converting observers to quant-dequant operators).

Test Plan:
```
python test/test_quant_trt.py TestConvertFxDoNotUse
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33205163

fbshipit-source-id: 01ea44fb2a8ffe30bec1dd5678e7a72797bafafc
2021-12-30 12:31:03 -08:00
401a6b682b add BFloat16 support for AdaptiveAvgPool2d on CPU (#56902)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56902

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D28836789

Pulled By: VitalyFedyunin

fbshipit-source-id: caac5e5b15190b8010bbfbc6920aa44032208ee7
2021-12-30 11:58:37 -08:00
bc40fb5639 [Reinstate] Wishart distribution (#70377)
Summary:
Implement https://github.com/pytorch/pytorch/issues/68050
Reopened merged and reverted PR https://github.com/pytorch/pytorch/issues/68588 worked with neerajprad
cc neerajprad

Sorry for the confusion.

TODO:

- [x] Unit Test
- [x] Documentation
- [x] Change constraint of matrix variables with 'torch.distributions.constraints.symmetric' if it is reviewed and merged. Debug positive definite constraints https://github.com/pytorch/pytorch/issues/68720

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70377

Reviewed By: mikaylagawarecki

Differential Revision: D33355132

Pulled By: neerajprad

fbshipit-source-id: e968c0d9a3061fb2855564b96074235e46a57b6c
2021-12-30 11:41:46 -08:00
14d3d29b16 make ProcessException pickleable (#70118)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70116

Happy to add tests if you let me know the best place to put them.

cc VitalyFedyunin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70118

Reviewed By: malfet

Differential Revision: D33255899

Pulled By: ejguan

fbshipit-source-id: 41d495374182eb28bb8bb421e890eca3bddc077b
2021-12-30 09:09:55 -08:00
9c742bea59 [PyTorch Edge][QNNPack] Enable Depthwise Specific Conv3d Kernel for Kernel Size 3x3x3 (#69315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69315

Uses kernels and setup modifications from earlier diffs in this stack
ghstack-source-id: 146346780

Test Plan:
**Correctness**
- Test using QNNPack Operator-Level Test:
-- Neon Kernel: As in test plan of D32217846, all tests pass
-- SSE2 Kernel: ```buck test xplat/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack:pytorch_qnnpack_test```, all tests pass
- Test by Printing Results of Model-Level Test: D32122020

**Performance**

*Operator Level tests from convolution.cc in D32217846*
||Before (V23 of D32217846, without newly added kernel)|After (V48 of D31966574, with newly added kernel)|
|--|--|--|
|depthwise 3x3x3 static|184 ms|134 ms|
|depthwise 3x3x3 runtime|181 ms|134 ms|
|depthwise 3x3x3s2 static|30 ms|22 ms|
|depthwise 3x3x3s2 runtime|30 ms|23 ms|
|depthwise 3x3x3s1x2 static|97 ms|70 ms|
|depthwise 3x3x3s1x2 runtime|96 ms|70 ms|
|depthwise 3x3x3s2x1 static|53 ms|38 ms|
|depthwise 3x3x3s2x1 runtime|53 ms|38 ms|
|depthwise 3x3x3d2 static|104 ms|74 ms|
|depthwise 3x3x3d2 runtime|103 ms|75 ms|
|depthwise 3x3x3d1x2 static|158 ms|116 ms|
|depthwise 3x3x3d1x2 runtime|157 ms|115 ms|
|depthwise 3x3x3d2x1 static|120 ms|86 ms|
|depthwise 3x3x3d2x1 runtime|120 ms|87 ms|
|depthwise 3x3x3 per channel static|182 ms|134 ms|
|depthwise 3x3x3 per channel runtime|184 ms|134 ms|
|depthwise 3x3x3s2 per channel static|30 ms|22 ms|
|depthwise 3x3x3s2 per channel runtime|31 ms|23 ms|
|depthwise 3x3x3s1x2 per channel static|95 ms|70 ms|
|depthwise 3x3x3s1x2 per channel runtime|95 ms|71 ms|
|depthwise 3x3x3s2x1 per channel static|53 ms|39 ms|
|depthwise 3x3x3s2x1 per channel runtime|55 ms|39 ms|
|depthwise 3x3x3d2 per channel static|105 ms|75 ms|
|depthwise 3x3x3d2 per channel runtime|103 ms|75 ms|
|depthwise 3x3x3d1x2 per channel static|158 ms|116 ms|
|depthwise 3x3x3d1x2 per channel runtime|158 ms|116 ms|
|depthwise 3x3x3d2x1 per channel static|118 ms|87 ms|
|depthwise 3x3x3d2x1 per channel runtime|119 ms|87 ms|

Average Change: -36.96%

(Generated with https://www.internalfb.com/intern/anp/view/?id=1371846&revision_id=291376782898627)

*Model Level Test on Synthesized Conv3d Model*

Model Details:
- 21 channels, input size: 9 x 12 x 7, kernel size: 3x3x3
- Config added in D31928710
- Model generated with https://www.internalfb.com/intern/anp/view/?id=1313660&revision_id=248658657303993

```buck run aibench:run_bench -- -b dw_conv_3d_3x3x3_big_2b.json --platform android/arm64 --framework pytorch --remote --devices Pixel-4a-11-30```

- Before (V23 of D32217846): [0.0935 ms](https://our.intern.facebook.com/intern/aibench/details/768298420366437)
- After (V48 of D31966574): [0.0665 ms](https://our.intern.facebook.com/intern/aibench/details/67271954298132)
(29% faster)

*Model Level Test on Video Model-like Inputs (provided by liyilui)*
- D33000199
- 87.5% faster

Reviewed By: kimishpatel

Differential Revision: D31966574

fbshipit-source-id: 6554a878401c1120054f6b02241456e8fb44b152
2021-12-30 08:12:10 -08:00
3d4590d16f [PyTorch Edge][QNNPack] Depthwise Conv3d mp8x27 (per-channel) Sse2 Kernel (#69314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69314

Implementation based off of [convolution-operator-tester.h](https://www.internalfb.com/code/fbsource/[679135d62c0a64e3d0fa0c830aa062ac28f292b8]/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/test/convolution-operator-tester.h)

Generated files (caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/wrappers/q8dwconv/*) made with
- cd caffe2/aten/src/ATen/native/quantized/cpu/qnnpack
- python3 generate-wrapper.py

The math used to compute the ```w_zyxc_ptr``` is explained here:

{F681213069}
ghstack-source-id: 146346784

Test Plan: Test when used in depthwise conv3d later in this diff stack (D31966574)

Reviewed By: kimishpatel

Differential Revision: D32261231

fbshipit-source-id: 8e793696f7c3b0e7cceda88df8099f64f3c69ac4
2021-12-30 08:12:07 -08:00
821c085c9b [PyTorch Edge][QNNPack] Depthwise Conv3d mp8x27 (per channel) Neon Kernel (#69313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69313

Allows for depthwise conv3d with 3x3x3 kernel

Implementation based heavily off of [mp8x25-neon-per-channel.c](https://www.internalfb.com/code/fbsource/[679135d62c0a64e3d0fa0c830aa062ac28f292b8]/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8dwconv/mp8x25-neon-per-channel.c) (depthwise conv2d with 5x5 kernel)

This supports per-channel convolution, but it works for non per-channel too

Generated files (caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/wrappers/q8dwconv/*) made with
- cd caffe2/aten/src/ATen/native/quantized/cpu/qnnpack
- python3 generate-wrapper.py
ghstack-source-id: 146346785

Test Plan: Test when used in depthwise conv3d later in this diff stack (D31966574)

Reviewed By: kimishpatel

Differential Revision: D32074096

fbshipit-source-id: 8111926df6ecb89d88ca810deeab87b1c072f55a
2021-12-30 08:12:04 -08:00
15d443326c [PyTorch Edge][QNNPack] Depthwise Conv3d Weight Packing (#69312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69312

Enable packing weights to be compatible with depthwise specific conv3d kernels
ghstack-source-id: 146346778

Test Plan:
- Existing 2d weight packing uses do not break (phabricator tests)
- Test 3d weight packing when used in depthwise conv3d later in this diff stack (D31966574)

Reviewed By: kimishpatel

Differential Revision: D32045036

fbshipit-source-id: a2323f74f7d30d92d4ed91315f59539ecad729ec
2021-12-30 08:12:00 -08:00
db37fd3865 [PyTorch Edge][QNNPack] Depthwise Conv3d Indirection Buffer Setup (#69311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69311

Enable setting up indirection buffer to be compatible with depthwise specific conv3d kernels
ghstack-source-id: 146346788

Test Plan:
- Existing 2d indirection buffer uses do not break (phabricator tests)
- Test 3d indirection buffer when used in depthwise conv3d later in this diff stack (D31966574)

Reviewed By: kimishpatel

Differential Revision: D31999533

fbshipit-source-id: a403d8dcad6e50641b9235e0b574129b2dfb5412
2021-12-30 08:11:57 -08:00
9863cd5741 [PyTorch Edge][QNNPack] Refactor Computing Step Dimensions (#69310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69310

Extract the computation of step height and step width into a helper function and store them in the operator struct, since the same calculation was used in many places before this diff.
ghstack-source-id: 146346783

Test Plan: Phabricator tests

Reviewed By: kimishpatel

Differential Revision: D32553327

fbshipit-source-id: e5bf07416f4c1ccde9975f835767392ad7a851c1
2021-12-30 08:11:54 -08:00
cea3eba617 [PyTorch Edge][QNNPack] Operator-Level Conv3d Tests (#69309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69309

Test correctness of QNNPack Conv3d

- Add Depth dimension to ConvolutionOperatorTester
- Add tests which use it

Includes John's changes in D32388572
ghstack-source-id: 146346786

Test Plan:
Build the Test
- ```cd caffe2/aten/src/ATen/native/quantized/cpu/qnnpack```
- ```./scripts/build-android-arm64.sh```
- Test binary is outputted to ```build/android/arm64-v8a```

Run the Test
- ```test_name=convolution-test```
- ```chmod +x build/android/arm64-v8a/$test_name```
- Send the binary to android device and execute it, ex. connect to one world and ```adb push build/android/arm64-v8a/$test_name /data/local/tmp/$test_name``` then ```adb shell /data/local/tmp/$test_name```

Reviewed By: kimishpatel

Differential Revision: D32217846

fbshipit-source-id: eba200c136894461bf76b2a5416540fe8781d588
2021-12-30 08:10:34 -08:00
35251a5528 [PyTorch] Add Enum to IValue Deepcopy (#69937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69937

This enables ```export_torch_mobile_model``` compatibility with Enum IValues

Test Plan: ModuleAPITest.DeepCopyEnum

Reviewed By: gmagogsfm

Differential Revision: D33104681

fbshipit-source-id: ca2a6d259c312487fe38dd1bed33ab6b7910bc2a
2021-12-30 07:52:22 -08:00
36db501736 softplus_backward: remove output arg (#70296)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69042

Tested with OpInfo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70296

Reviewed By: mikaylagawarecki

Differential Revision: D33349227

Pulled By: albanD

fbshipit-source-id: edeb35cb19ab4434d39df93d4536cb07679218b5
2021-12-30 02:16:36 -08:00
18dd5cdba5 [Operator Versioning][Test] Use hypothesis for better test input data and broader coverage (#70263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70263

Leverage the hypothesis library, as it's a more systematic way of testing. To write a test, you need two parts:

1. A function that looks like a normal test in your test framework of choice but with some additional arguments
2. A given decorator that specifies how to provide those arguments.
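
An illustrative property test in that shape (not the actual upgrader test body; the property checked here is an assumption for demonstration):

```
import torch
from hypothesis import given, strategies as st

# Part 1: a normal-looking test with an extra argument;
# Part 2: @given supplies that argument from a strategy.
@given(st.lists(st.floats(-1e6, 1e6, allow_nan=False), min_size=1))
def test_sum_matches_python(values):
    t = torch.tensor(values, dtype=torch.float64)
    assert torch.isclose(t.sum(), torch.tensor(sum(values)))
```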
ghstack-source-id: 146344955

Test Plan:
```

buck test mode/opt //caffe2/test:jit
python test/test_jit.py TestSaveLoadForOpVersion

```

Reviewed By: iseeyuan

Differential Revision: D33244389

fbshipit-source-id: c93d23f3d9575ebcb4e927a8caee42f4c3a6939d
2021-12-29 20:43:32 -08:00
c627211651 [quant][fx][graphmode][be] Change the type for output of convert to be torch.nn.Module (#69959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69959

GraphModule is an implementation detail; we don't want to expose it in quantization APIs.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_quantized_model_type

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33119103

fbshipit-source-id: d8736ff08b42ee009d6cfd74dcb3f6150f71f3d2
2021-12-29 20:33:32 -08:00
fb78a31916 Add testing across mem_formats to ModuleInfos (#69317)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69317

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33285780

Pulled By: mikaylagawarecki

fbshipit-source-id: 1d19293e640e5581351a9c74892dcac4bcdd3f1d
2021-12-29 14:53:27 -08:00
14f4b91f6e Add Nondeterministic Tol to gradient test in test_modules (#69402)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69402

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D33285781

Pulled By: mikaylagawarecki

fbshipit-source-id: f1ab43173d4f558adc943a8acefc13c34cfa5cfa
2021-12-29 14:51:56 -08:00
d2abf3f981 Added antialias flag to interpolate (CPU only, bicubic) (#68819)
Summary:
Description:
- Added antialias flag to interpolate (CPU only)
  - forward and backward for bicubic mode
  - added tests

Previous PR for bilinear, https://github.com/pytorch/pytorch/pull/65142
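
Minimal usage sketch of the new flag (sizes borrowed from the benchmark below):

```
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 906, 438)
y = F.interpolate(x, size=(320, 196), mode="bicubic", antialias=True)
print(y.shape)  # torch.Size([1, 3, 320, 196])
```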

### Benchmarks

<details>
<summary>
Forward pass, CPU. PTH interpolation vs PIL
</summary>

Cases:
- PTH RGB 3 Channels, float32 vs PIL RGB uint8 (apples vs pears)
- PTH 1 Channel, float32 vs PIL 1 Channel Float

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112

```
Torch config: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_61,code=sm_61
  - CuDNN 8.0.5
  - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF,

Num threads: 1
[------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (320, 196) -------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitb0bdf58
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                4.5                |          5.2
      channels_last non-contiguous torch.float32  |                4.5                |          5.3

Times are in milliseconds (ms).

[------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (460, 220) -------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitb0bdf58
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                5.7                |          6.4
      channels_last non-contiguous torch.float32  |                5.7                |          6.4

Times are in milliseconds (ms).

[------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 96) --------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitb0bdf58
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                3.0                |          4.0
      channels_last non-contiguous torch.float32  |                2.9                |          4.1

Times are in milliseconds (ms).

[------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (1200, 196) -------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitb0bdf58
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                14.7               |          17.1
      channels_last non-contiguous torch.float32  |                14.8               |          17.2

Times are in milliseconds (ms).

[------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 1200) -------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitb0bdf58
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                3.5                |          3.9
      channels_last non-contiguous torch.float32  |                3.5                |          3.9

Times are in milliseconds (ms).

[---------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (320, 196) ---------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitb0bdf58
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               2.4               |          1.8

Times are in milliseconds (ms).

[---------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (460, 220) ---------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitb0bdf58
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               3.1               |          2.2

Times are in milliseconds (ms).

[---------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 96) ----------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitb0bdf58
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.6               |          1.4

Times are in milliseconds (ms).

[--------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (1200, 196) ---------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitb0bdf58
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               7.9               |          5.7

Times are in milliseconds (ms).

[--------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 1200) ---------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitb0bdf58
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.7               |          1.3

Times are in milliseconds (ms).

```

</details>

Code is moved from torchvision: https://github.com/pytorch/vision/pull/3810 and https://github.com/pytorch/vision/pull/4208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68819

Reviewed By: mikaylagawarecki

Differential Revision: D33339117

Pulled By: jbschlosser

fbshipit-source-id: 6a0443bbba5439f52c7dbc1be819b75634cf67c4
2021-12-29 14:04:43 -08:00
2b00dbbbbc fix typos in torch/csrc/deploy/README.md (#70494)
Summary:
Fixes typo in torch/csrc/deploy/README.md

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70494

Reviewed By: mikaylagawarecki

Differential Revision: D33354431

Pulled By: H-Huang

fbshipit-source-id: b05757a795d2700eea21d7b881d87a7b239a8b52
2021-12-29 13:52:06 -08:00
8af39b7668 AdaptiveLogSoftmaxWithLoss no_batch_dim support (#69054)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69054

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33200166

Pulled By: george-qi

fbshipit-source-id: 9d953744351a25f372418d2a64e8402356d1e9b7
2021-12-29 10:25:26 -08:00
0460324b9b Fix docs rendering for nn.Module.named_modules() (#70491)
Summary:
The documentation rendering for nn.Module.named_modules() is a bit broken, see the description of the last argument [here](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.named_modules).

This PR fixes that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70491

Reviewed By: mikaylagawarecki

Differential Revision: D33349882

Pulled By: albanD

fbshipit-source-id: a46327c12e8114f7ef2055a8518c4ca9d186e669
2021-12-29 10:08:53 -08:00
fb736c77a4 Remove backward op for slow dilated 3d convolution (#70068)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70068

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D33172550

Pulled By: jbschlosser

fbshipit-source-id: 72109577c020b33e4b9807064f53f1989475d1c2
2021-12-29 09:46:19 -08:00
2c67621a19 [rnn,gru,lstm]cell : no batch dim (#70236)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585
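
A minimal sketch of the unbatched usage this enables:

```
import torch

cell = torch.nn.LSTMCell(input_size=4, hidden_size=8)
x = torch.randn(4)                        # no leading batch dimension
h0, c0 = torch.randn(8), torch.randn(8)
h1, c1 = cell(x, (h0, c0))
print(h1.shape, c1.shape)                 # torch.Size([8]) torch.Size([8])
```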

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70236

Reviewed By: mikaylagawarecki

Differential Revision: D33338774

Pulled By: jbschlosser

fbshipit-source-id: 7d8d00272e543b3e67060136b5d98a4baefbedd5
2021-12-29 09:27:32 -08:00
9266b2af73 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33347489

fbshipit-source-id: d43ce53c93724f44b587bfe892534f8d13eadaca
2021-12-29 04:06:52 -08:00
103fc5f9a5 Remove unused variable (#70261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70261

ghstack-source-id: 146310591

Test Plan:
```
buck test  fbsource//xplat/caffe2:for_each_prod_ptl_model_test
```

{gif:p014gzft}

Reviewed By: iseeyuan

Differential Revision: D33265656

fbshipit-source-id: 6e303ee304064a61383ba2ae34f2e21077ec9db3
2021-12-28 22:21:29 -08:00
066c9ff08f Deprecating python 3.6 (#70325)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70457

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70325

Reviewed By: seemethere

Differential Revision: D33339496

Pulled By: atalman

fbshipit-source-id: 7509cab4f7469dae234bcf3f79e0aabb54577b8a
2021-12-28 18:44:59 -08:00
a0c99a8d3b [Operator Verioning][Edge] Update upgrader codegen with latest change (#70293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70293

```
python /Users/chenlai/pytorch/tools/codegen/operator_versions/gen_mobile_upgraders.py

```
https://github.com/pytorch/pytorch/pull/70161 has landed to resolve a thread safety issue. Accordingly, the upgrader codegen needs to be updated.
ghstack-source-id: 146296324

Test Plan:
```
buck test mode/opt //caffe2/test:upgrader_codegen
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen
python /Users/chenlai/pytorch/tools/codegen/operator_versions/gen_mobile_upgraders.py

```

Reviewed By: iseeyuan

Differential Revision: D33274831

fbshipit-source-id: 0e1d2a81edc9b6111f3c6127dbd5b97e16c93dca
2021-12-28 18:34:31 -08:00
a6eadf9b50 Remove backward op for slow 3d convolution (#69978)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69978

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D33131003

Pulled By: jbschlosser

fbshipit-source-id: 097440b2eb501c1eeeb8a666d4bc3508fc5d0cfa
2021-12-28 16:19:23 -08:00
5e113eb24d .github: Add linux.4xlarge executor (#70474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70474

Needed to compile linux wheels for CUDA 11.x since we were OOM'ing with
16GB of RAM

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: atalman

Differential Revision: D33343322

Pulled By: seemethere

fbshipit-source-id: 9f62e07ce2ca229fa25285429c01dc074d63b388
2021-12-28 15:40:28 -08:00
0fb73035f7 [Bootcamp Task] Replace string concatenation by fmt::format (#70366)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69979

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70366

Reviewed By: H-Huang

Differential Revision: D33339291

Pulled By: LynneD

fbshipit-source-id: e4e0535cd2db8e9fa8b0875d17a900be58384367
2021-12-28 14:15:21 -08:00
e96dda15e5 Remove backward op for slow 2d transposed convolution (#70333)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70333

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33301402

Pulled By: jbschlosser

fbshipit-source-id: 3cfb3165589fe1620f22479b05139676d20dc493
2021-12-28 12:38:59 -08:00
c732a26e59 Add macro to register CPU kernel for all arch types (#70332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70332

Idea to avoid recompilations: what if we introduce a new macro REGISTER_ALL_CPU_DISPATCH that registers the same kernel across all CPU arch types? We'd call this from native/Convolution*.cpp and wouldn't need to move any logic underneath the native/cpu dir. That would simplify these PRs quite a bit and would also avoid the recompilation. What do you think about this approach?

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33301403

Pulled By: jbschlosser

fbshipit-source-id: d7cc163d4fe23c35c93e512d1f0a8af8c9897933
2021-12-28 12:37:36 -08:00
244730eeea .github: Add needs build for generate-test-matrix (#70456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70456

This job was still running on workflows despite ciflow not being enabled

This makes it so that test matrix generation only occurs before tests
are actually run.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: atalman

Differential Revision: D33338946

Pulled By: seemethere

fbshipit-source-id: 4b83d5fe6572771807708764609a72c4f1c5745d
2021-12-28 10:11:34 -08:00
4ed02748be fix typo in the docs of multiprocessing (#70448)
Summary:
Fix typo in the docs of multiprocessing.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70448

Reviewed By: gchanan

Differential Revision: D33336962

Pulled By: H-Huang

fbshipit-source-id: 1235703b8ddc26c33dcbc34bd25ac36b11a18923
2021-12-28 09:58:47 -08:00
73b5b6792f Adds reduction args to signature of F.multilabel_soft_margin_loss docs (#70420)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70301

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70420

Reviewed By: gchanan

Differential Revision: D33336924

Pulled By: jbschlosser

fbshipit-source-id: 18189611b3fc1738900312efe521884bced42666
2021-12-28 09:48:05 -08:00
6f83841582 .github: Temporarily disable xla test config (#70453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70453

Removes the current xla config; downstream `pytorch/xla` is broken for
clang compilation, so we temporarily remove this config until the xla team
can fix the upstream CI.

Context: https://github.com/pytorch/xla/pull/3255/files#r775980035

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zengk95

Differential Revision: D33338463

Pulled By: seemethere

fbshipit-source-id: 1ef332c685d5e2cc7e2eb038e93bd656847fd099
2021-12-28 08:49:01 -08:00
15f14ce0dc fix typo in adam docs (#70387)
Summary:
Fix the typo in [adam docs in master branch](https://pytorch.org/docs/master/generated/torch.optim.Adam.html#torch.optim.Adam)

![image](https://user-images.githubusercontent.com/41060790/147345284-37e180d1-fd06-4a62-9c79-2d17b8aa5cd3.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70387

Reviewed By: H-Huang

Differential Revision: D33309283

Pulled By: albanD

fbshipit-source-id: d20c5d8f2498ac64013f71e202a6b50dcc069f2b
2021-12-28 07:35:39 -08:00
574dbb584d quant tests: fix log spew for HistogramObserver (#70107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70107

Histogram observer used floor division on tensors, which is a deprecated
behavior.  There was a warning printed:

```
/Users/vasiliy/pytorch/torch/ao/quantization/observer.py:905: UserWarning: __floordiv__ is deprecated, and i
ts behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' funct
ion NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use
torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='flo
or').
```

This PR fixes the warning.
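
For reference, the replacements the warning suggests (illustrative values):

```
import torch

a = torch.tensor([-7.0, 7.0])
# a // 2 used to warn and round toward zero; the explicit forms are:
print(torch.div(a, 2, rounding_mode='trunc'))  # tensor([-3., 3.])
print(torch.div(a, 2, rounding_mode='floor'))  # tensor([-4., 3.])
```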

Test Plan:
```
python test/test_quantization.py TestHistogramObserver
```

Reviewed By: ejguan

Differential Revision: D33187926

Pulled By: vkuzo

fbshipit-source-id: 9c37de4c6d6193bee9047b6a28ff37ee1b019753
2021-12-28 06:27:51 -08:00
00df885d4e quant tests: clean up logs about incorrect tensor copy (#70106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70106

Some of quantization tests had log spew like

```
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
```

This PR cleans up the root cause from the utils. Some other
tests may still hit this warning from other places
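
The recommended copy construction, per the warning (minimal sketch):

```
import torch

src = torch.randn(3, requires_grad=True)
# torch.tensor(src) triggers the warning above; recommended forms:
plain_copy = src.clone().detach()
grad_copy = src.clone().detach().requires_grad_(True)
```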

Test Plan:
```
python test/test_quantization.py TestFakeQuantizeOps
```

this particular warning no longer appears

Reviewed By: soulitzer

Differential Revision: D33187925

Pulled By: vkuzo

fbshipit-source-id: bd1acd77fd72a10dad0c254f9f9f32e513c8a89a
2021-12-28 06:26:40 -08:00
b7b32b56f1 Revert D33281300: Prevent sum overflow in broadcast_object_list
Test Plan: revert-hammer

Differential Revision:
D33281300 (807f9a828c)

Original commit changeset: 1bc83e8624ed

Original Phabricator Diff: D33281300 (807f9a828c)

fbshipit-source-id: beb81a9cbfba405a61b11dfaa8e39c9601f45643
2021-12-27 19:01:53 -08:00
807f9a828c Prevent sum overflow in broadcast_object_list (#70336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70336

broadcast_object_list cast the sum of all object lengths from long to int, causing overflows.
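
A minimal sketch of the wraparound, with an assumed payload size just over 2 GiB:

```
import ctypes

total_len = 2**31 + 1000                 # assumed combined object bytes
print(ctypes.c_int32(total_len).value)   # -2147482648: a negative "size" is requested
```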

Test Plan:
Increased the size of the Tensor used in object transfers to have a >2GB storage requirement (in distributed_test.py)

Without fix the length will overflow and the program will request a negative sized Tensor:
```
RuntimeError: Trying to create tensor with negative dimension -2147482417: [-2147482417]
```
With fix it will pass the test.

Test used on server with GPUs:

buck test  mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn --local -- broadcast_object

Differential Revision: D33281300

fbshipit-source-id: 1bc83e8624edc14e747eeced7bc8a7a10e443ee4
2021-12-27 16:17:53 -08:00
5a9ea9e386 Automated submodule update: tensorpipe (#70438)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: 52791a2fd2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70438

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: zertosh

Differential Revision: D33331758

fbshipit-source-id: 1e811ddc30e9afa440523c6cb5c4e893eb560978
2021-12-27 15:19:21 -08:00
bf610f08b0 Back out "Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions"
Summary: as title

Test Plan:
```
buck run mode/opt-split-dwarf -c=python.package_style=inplace //ai_infra/distributed_ai/pyper_test_framework/templates:pyper_release_v2 -- --model inline_cvr_post_imp_deterministic_shrunk_pyper_release_v2 --cluster TSCTestCluster --hpc_identity oncall_pyper_oncall --stage prod_offline_training --test_module training_platform
...
############## Start inline_cvr_post_imp_model Test Results Analysis ##############
I1226 22:03:56.789000 3346280 test_driver.py:139  UNKNOWN     ] Test finished in 808.2743511786684 seconds.
+-------------------------+---------+------------------------+-----------------+
| Test Case               | Status  | Message                | Model Entity ID |
+-------------------------+---------+------------------------+-----------------+
| SmallWorld_release_test | Success | finished successfully. | 987987491       |
+-------------------------+---------+------------------------+-----------------+
I1226 22:03:56.790000 3346280 test_driver.py:143  UNKNOWN     ] test_run_id: 3d085f61-28d1-411d-bd27-940ea2554b23 use this id to find your run in scuba pyper_test_framework
I1226 22:03:56.792000 3346280 test_driver.py:160  UNKNOWN     ] Calling cleanup
I1226 22:03:56.792000 3346280 training_platform_test_launcher.py:385  UNKNOWN     ] Stopping launched jobs 1
I1226 22:03:59.563122 3346280 ClientSingletonManager.cpp:100] Shutting down Manifold ClientSingletonManager
```

Reviewed By: seemethere

Differential Revision: D33325936

fbshipit-source-id: 64414bf7061ad77e8ac12eb8abafee4043e0fa1e
2021-12-27 09:11:46 -08:00
4ae71c8d34 Add graph op replacement pass (#69915)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69915

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198158

Pulled By: tugsbayasgalan

fbshipit-source-id: f2b924edf9959aaf51f97db994fae031fa062cf8
2021-12-25 13:03:19 -08:00
63e58d262a Extend Graph, CompilationUnit, and schema matching to accept optional operator version number (#69914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69914

Test Plan: Imported from OSS

Reviewed By: qihqi

Differential Revision: D33198157

fbshipit-source-id: b98d9401e515f695d6cf99116f695edc7976bf01
2021-12-25 00:35:33 -08:00
df3cbcff28 Add utility methods to find an upgrader (#68355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68355

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198156

Pulled By: tugsbayasgalan

fbshipit-source-id: 68380148f0d9bee96d8090bf01c8dfca8e1f8b12
2021-12-24 12:23:04 -08:00
911d527b87 Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions (#70339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339

When a python program is translated to TorchScript, the python exception type is dropped. This makes users' lives hard when they need to categorize errors based on more than just the exception message.

Here we make the change so that when we raise a Python exception, we record the fully qualified class name of the exception. Later on, when the TorchScript is interpreted, a special exception CustomJITException is thrown; users can get the Python class name from CustomJITException::getPythonClassName.

Note that this diff does not customize the mapping from C++ exception to Python exception. It's left to the users to do whatever mapping they want.
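
As a rough illustration (hypothetical snippet, not from this diff), a scripted function whose raised exception now carries the name `builtins.ValueError`:

```
import torch

@torch.jit.script
def checked_sqrt(x: float) -> float:
    if x < 0:
        # the fully qualified class name is now recorded with the exception
        raise ValueError("input must be non-negative")
    return x ** 0.5
```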

Code under scripts/shunting is just my own experimental code. I can split it out if requested.
ghstack-source-id: 146221879

Test Plan: buck test mode/opt //caffe2/test:jit

Reviewed By: gmagogsfm

Differential Revision: D33282878

fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d
2021-12-24 00:25:40 -08:00
ab4f9862a3 [Compiled Mobilenetv3 Demo] Integrate Compiled Mobilenetv3 into FB4A Playground app (#70370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70370

Demo of Mobilenetv3 compiled with NNC in FB4A Playground app:
- Add compiled ModelConfig in FB4A app
- Enable Camera inputs for Mobilenet processor in the app and add ability to show live outputs
- Use downscaled inputs, which works for both original mobilenetv3 model and the compiled model
- Update nnc_aten_adaptive_avg_pool2d to use adaptive_avg_pool2d instead of adaptive_avg_pool2d_out as the latter is not included in the traced operators of mobilenetv3 model and hence not included in the app.
- Update app dependencies to include nnc_backend_lib and asm binary

Test Plan:
Run `arc playground pytorchscenario` from fbandroid to build and install the app on a connected device.
Live demo with compiled Mobilenetv3 model:
https://pxl.cl/1W1kb

Reviewed By: larryliu0820

Differential Revision: D33301477

fbshipit-source-id: 5d50a0e70a7f7d2157d311d6b1feef46e78e85b6
2021-12-23 23:46:20 -08:00
0ee663d2fa Revert D33234529: [NNC Testing] Randomized loop nest infrastructure
Test Plan: revert-hammer

Differential Revision:
D33234529 (1d094587ea)

Original commit changeset: 9019f1f1d4ca

Original Phabricator Diff: D33234529 (1d094587ea)

fbshipit-source-id: a79deca9f186299bf884587eb7d50af2464979fb
2021-12-23 23:11:23 -08:00
e429a68478 Allow single node fusion for nvfuser (#70000)
Summary:
Setting `PYTORCH_NVFUSER_ONE_OP_FUSION=1` makes nvFuser take every node it supports, instead of waiting for a fusion opportunity.
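
Minimal usage sketch (setting the variable in the environment before the process imports torch is the safe option):

```
import os
os.environ["PYTORCH_NVFUSER_ONE_OP_FUSION"] = "1"
# equivalently, from a shell: PYTORCH_NVFUSER_ONE_OP_FUSION=1 python train.py
import torch  # then run scripted models with nvFuser enabled as usual
```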

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70000

Reviewed By: samdow

Differential Revision: D33292195

Pulled By: davidberard98

fbshipit-source-id: 8ed5ce5e82fbb6737e8ab5ce4223b038eaf47756
2021-12-23 17:07:57 -08:00
5ccf28d066 Do not use ZeroTensor for inplace ops (#69998)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69998

Fixes: https://github.com/pytorch/pytorch/issues/69855

The check for undefined grads for forward AD was not being run because `check_undefined_grads` was only passed as True by OpInfo for backward AD. This PR updates gradcheck to interpret `check_undefined_grads` as possibly for forward or backward AD.

This PR also updates codegen to 1) not use ZeroTensor for `self` when the op is in-place, and 2) only create zeros (either through ZeroTensor or at::zeros) if the tensor itself is not undefined. Previously we would error in this case when calling `.options` on the undefined tensor.

~TODO: undo the skips that are due to the original issue~

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33235973

Pulled By: soulitzer

fbshipit-source-id: 5769b6d6ca123b2bed31dc2bc6bc8e4701581891
2021-12-23 15:52:34 -08:00
3116d87024 Add forward AD formulas for {adaptive_,fractional_,}max_pool{2,3}d_{backward,} (#69884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69884

Also fixes: https://github.com/pytorch/pytorch/issues/69322, https://github.com/pytorch/pytorch/issues/69325

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33093039

Pulled By: soulitzer

fbshipit-source-id: b9a522a00f4e9e85974888de5058de07280f8f66
2021-12-23 15:51:09 -08:00
6925576e88 [acc_ops] No longer mark acc_ops.cat as unary (#70365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70365

We should only mark ops as unary if they have a single fx.Node input. However, `cat` takes a sequence of tensors (`tensors`) as input.

Reviewed By: alexbeloi

Differential Revision: D33299988

fbshipit-source-id: db3581eaee4ad9d2358eed01ec9027825f58f220
2021-12-23 15:09:03 -08:00
133c7f2cf9 Revert D33301254: [pytorch][PR] GHA Windows: Propagate exit code from .bat to calling bash script
Test Plan: revert-hammer

Differential Revision:
D33301254 (6431ac6c7a)

Original commit changeset: 6861dbf0f0a3

Original Phabricator Diff: D33301254 (6431ac6c7a)

fbshipit-source-id: c9d8f72bb198de678456e0a1bcf3264c2ea52874
2021-12-23 15:03:48 -08:00
6431ac6c7a GHA Windows: Propagate exit code from .bat to calling bash script (#70011)
Summary:
The Windows 1st shard was silently failing to run (more details here: https://github.com/pytorch/pytorch/issues/70010) because the code to run it was never reached. The failure was silent because our CI still returned green for those workflow jobs: the exit code from the batch script DID NOT PROPAGATE to the calling bash script.

The key here is that even though we have
```
if ERRORLEVEL 1 exit /b 1
```

The exit code 1 was NOT propagating back to the bash script: the `exit /b 1` was within an `if` statement, and the batch script was actually run in a cmd shell, so the bash script win-test.sh continued without erroring. Moving the `exit /b 1` to be standalone fixes it.

More details for this can be found in this stack overflow https://stackoverflow.com/a/55290133

Evidence that now a failure in the .bat would fail the whole job:
https://github.com/pytorch/pytorch/runs/4621483334?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70011

Reviewed By: malfet

Differential Revision: D33301254

Pulled By: janeyx99

fbshipit-source-id: 6861dbf0f0a34d5baed59f928e34eab15af6f461
2021-12-23 14:09:41 -08:00
ab57f6d12c [LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069

This commit upstreams utils to extract BackendDevice from at::Tensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice*

Reviewed By: samdow

Differential Revision: D33293160

Pulled By: alanwaketan

fbshipit-source-id: 78647239f90b4d04adce84ae6022b8983ad30c09
2021-12-23 12:42:03 -08:00
16e6e1a59e [Easy] Lint wrap.py file (#70341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70341

Per title
ghstack-source-id: 146181936

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D33290099

fbshipit-source-id: e4415a42086d9b1b78b0b5f42d4b02f275131dfa
2021-12-23 11:30:36 -08:00
3c231e9bd7 [FSDP] Remove module.wrapper_config support (#70340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70340

Some wrap APIs support module.wrapper_config to specify the FSDP
arguments, though this feature is currently unused in all use cases and there
is no plan to support this API. enable_wrap() and wrap() along with FSDP
constructor wrapping should be enough for all use cases, so get rid of the
unnecessary code.
ghstack-source-id: 146181819

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D33290066

fbshipit-source-id: e7f3d8b2f2ff6bdf4a3e5021dbb53adf052ee8dc
2021-12-23 11:29:13 -08:00
d100d98db8 torch.linalg routines return torch.linalg.LinAlgError when a numerical error in the computation is found. (#68571)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/64785 by introducing a `torch.linalg.LinAlgError` for reporting errors caused by bad values in linear algebra routines, which should allow users to easily catch errors caused by numerical issues.
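
A minimal sketch of catching the new exception (a singular input is one such bad value):

```
import torch

try:
    torch.linalg.inv(torch.zeros(2, 2))        # singular input
except torch.linalg.LinAlgError as err:
    print("caught numerical error:", err)
```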

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68571

Reviewed By: malfet

Differential Revision: D33254087

Pulled By: albanD

fbshipit-source-id: 94b59000fdb6a9765e397158e526d1f815f18f0f
2021-12-23 10:53:26 -08:00
6a84449290 [SR] Fast path for VarStack on scalars (#70210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70210

Add a fast-path for `VarStack` nodes for when the inputs are scalars.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- VarStack`

Reviewed By: hlu1

Differential Revision: D33177498

fbshipit-source-id: 922ab76a6808fbfdb8eb6091163a380344e38de6
2021-12-23 10:31:17 -08:00
cc8b916395 Transformer{DecoderLayer} : no batch dim (#70322)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60585

TransformerDecoder Test Timings (takes about 30s)
<details>

```
pytest test/test_modules.py -k _TransformerDeco --durations=10
============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /home/kshiteej/Pytorch/pytorch_no_batch_mha, configfile: pytest.ini
plugins: hypothesis-6.23.2, repeat-0.9.1
collected 639 items / 591 deselected / 48 selected

test/test_modules.py ss......ss......ss..ssssssssss..................                                                                                                                                      [100%]

============================================================================================== slowest 10 durations ==============================================================================================
17.13s call     test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_TransformerDecoderLayer_cuda_float64
4.13s call     test/test_modules.py::TestModuleCPU::test_gradgrad_nn_TransformerDecoderLayer_cpu_float64
1.22s call     test/test_modules.py::TestModuleCUDA::test_grad_nn_TransformerDecoderLayer_cuda_float64
0.86s call     test/test_modules.py::TestModuleCPU::test_cpu_gpu_parity_nn_TransformerDecoderLayer_cpu_float32
0.73s call     test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerDecoderLayer_cuda_float32
0.57s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerDecoderLayer_cuda_float32
0.56s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerDecoderLayer_cuda_float64
0.48s call     test/test_modules.py::TestModuleCPU::test_grad_nn_TransformerDecoderLayer_cpu_float64
0.41s call     test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerDecoderLayer_cuda_float32
0.40s call     test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerDecoderLayer_cuda_float64
============================================================================================ short test summary info =============================================================================================
========================================================================== 32 passed, 16 skipped, 591 deselected, 3 warnings in 29.62s ===========================================================================
```

</details>

Transformer Test Timings (takes about 1m10s)

<details>
```
pytest test/test_modules.py -k _Transformer_ --durations=10
============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /home/kshiteej/Pytorch/pytorch_no_batch_mha, configfile: pytest.ini
plugins: hypothesis-6.23.2, repeat-0.9.1
collected 639 items / 591 deselected / 48 selected

test/test_modules.py ss......ss......ss..ssssssssss..................                                                                                                                                      [100%]

==================================================================================
============================================================================================== slowest 10 durations ==============================================================================================
46.40s call     test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Transformer_cuda_float64
11.09s call     test/test_modules.py::TestModuleCPU::test_gradgrad_nn_Transformer_cpu_float64
2.48s call     test/test_modules.py::TestModuleCUDA::test_grad_nn_Transformer_cuda_float64
1.03s call     test/test_modules.py::TestModuleCPU::test_grad_nn_Transformer_cpu_float64
0.96s call     test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float32
0.87s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Transformer_cuda_float32
0.85s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Transformer_cuda_float64
0.85s call     test/test_modules.py::TestModuleCPU::test_cpu_gpu_parity_nn_Transformer_cpu_float32
0.65s call     test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float64
0.47s call     test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Transformer_cuda_float32
============================================================================================ short test summary info =============================================================================================
===================================================================== 32 passed, 16 skipped, 591 deselected, 3 warnings in 70.19s (0:01:10) ======================================================================
```
</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70322

Reviewed By: cpuhrsch

Differential Revision: D33286285

Pulled By: jbschlosser

fbshipit-source-id: 46e08cf47f37787733a535f683c3fd21f652486d
2021-12-23 10:13:31 -08:00
4d49af863f GaussianNLLLoss no_batch_dim docs and testing (#69783)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69783

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33200486

Pulled By: george-qi

fbshipit-source-id: a2bc2b366772682825f879dae4ac29c1f4d6a5f1
2021-12-23 09:27:53 -08:00
a9c7d626e1 Add the maximize flag to AdamW (#70146)
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/68052
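
A minimal usage sketch of the new flag (toy objective; values are illustrative):

```
import torch

p = torch.zeros(1, requires_grad=True)
opt = torch.optim.AdamW([p], lr=0.1, maximize=True)  # ascend instead of descend

objective = -(p - 3.0).pow(2).sum()  # maximized at p = 3
objective.backward()
opt.step()                           # p moves toward 3 rather than away
```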

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70146

Reviewed By: malfet

Differential Revision: D33254561

Pulled By: albanD

fbshipit-source-id: f190c836a4162f936c5953e076747c345df21421
2021-12-23 09:20:29 -08:00
b15212c62b enable backward pass computation and communication overlap by prefetching all gather (#70235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70235

address comments in https://github.com/pytorch/pytorch/pull/69282:
We have fixed a few corner cases for prefetching full parameters in the post-backward hook.

After benchmarking, prefetching full parameters in the pre-backward hook has the best and most stable performance, but at the cost of increased memory; prefetching full parameters in the post-backward hook did not see the expected performance and also failed in a few corner cases (now fixed), although it has no memory increase. The main issue is that the post-backward hook firing order is not guaranteed to be the reverse of the forward computation order, so an incorrectly prefetched all-gather can delay the all-gather that is actually needed in the single NCCL stream and delay some layers' computation.

So these two approaches are kept as two configurable experimental algorithms for now.

prefetch full parameters at pre-backward hook:

It is observed from past traces that all-gather ops are not triggered until the current layer's backward pass starts to compute. Also, for some models, previous layers' reduce-scatter is scheduled before the next layer's all-gather ops; since all-gather and reduce-scatter are in the same NCCL stream, this can result in a backward pass with no overlap between communication and computation.

To explicitly get the next layer's all-gather scheduled while the previous layers' backward computation is running, we can prefetch the next layer's all-gather of full params. This helps 1) deterministically overlap both all-gather and reduce-scatter with computation, and 2) prefetch only one layer's all-gather full parameters at a time, to avoid increasing memory too much.

The implementation borrowed the idea from facebookresearch/fairscale#865, where forward graph order is recorded in the forward pass.

In the backward pass, this PR prefetches the all-gather of full parameters in the current layer's pre-backward hook, instead of in the current layer's post-backward hook as in facebookresearch/fairscale#865. It also makes sure the all-gather streams are synced properly.

Experiments showed a 10% memory increase and a 20% latency speedup for a 1GB RoBERTa model in a slow network environment.
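
As an illustration of the pre-backward prefetch idea, here is a minimal sketch (not the actual FSDP implementation; `record_and_hook` and the `print` placeholder are ours, and tensor grad hooks stand in for FSDP's real pre-backward hooks):

```python
import torch
import torch.nn as nn

forward_order = []  # module execution order, recorded during forward

def pre_backward_prefetch(module):
    # Backward runs roughly in reverse forward order, so the module that ran
    # just before this one in forward is the next one backward will need.
    idx = forward_order.index(module)
    if idx > 0:
        prev = forward_order[idx - 1]
        # Real FSDP would launch prev's all-gather on the communication
        # stream here; this placeholder just logs the decision.
        print(f"prefetch all-gather for {type(prev).__name__}")

def record_and_hook(module, inputs, output):
    forward_order.append(module)
    # A grad hook on the output fires just before this module's backward
    # computes, approximating a pre-backward hook.
    output.register_hook(lambda g: (pre_backward_prefetch(module), g)[-1])

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
for m in model:
    m.register_forward_hook(record_and_hook)

model(torch.randn(2, 4)).sum().backward()
```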

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D33252795

fbshipit-source-id: 4e2f47225ba223e7429b0dcaa89df3634bb70050
2021-12-22 23:02:46 -08:00
1d094587ea [NNC Testing] Randomized loop nest infrastructure (#70174)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70174

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33234529

fbshipit-source-id: 9019f1f1d4ca945c92bee401f7ec674b7d987de4
2021-12-22 22:07:39 -08:00
656d2a7bf6 [quant][fx][graphmode] Add backend_config_dict for standalone module (#70150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70150

This PR allows users to specify backend_config_dict for standalone modules, in both the prepare and convert steps.
Adding this now to allow prototyping for some of our customer use cases; a test for the codepath will be added in
a separate PR

Test Plan:
regression tests
```
python test/test_quantization.py TestQuantizeFx
```
A test that specifies backend_config for some module will be added in a separate PR for the use case we have in mind,
since it requires other features

Imported from OSS

**Static Docs Preview: classyvision**
|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D33205162/V9/classyvision/)|

|**Modified Pages**|

Reviewed By: vkuzo

Differential Revision: D33205162

fbshipit-source-id: a657cef8e49d99b6a43653141521dc87c33bfd89
2021-12-22 21:18:39 -08:00
795af1578c Revert D33172665: [LTC] Upstream utils to extract BackendDevice from at::Tensor
Test Plan: revert-hammer

Differential Revision:
D33172665 (121d067999)

Original commit changeset: b334ee358ea7

Original Phabricator Diff: D33172665 (121d067999)

fbshipit-source-id: 8bff43cddfc5d30483ec5cea8eff037aab9d1cfa
2021-12-22 21:12:49 -08:00
12afe2bb84 update poisson_nll_loss opinfo samples (#70300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67461

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70300

Reviewed By: cpuhrsch

Differential Revision: D33285896

Pulled By: jbschlosser

fbshipit-source-id: ec917ec7d3113dbc4ae03978fa5abb24aa082c01
2021-12-22 19:10:57 -08:00
681e78bace [Profiler] Address issues from profiler bifurcation. (#70327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70327

After D32678163 (7ea86dfdb1), test_rpc_profiler began failing. This was surprising, because it should have been a no-op refactor. However, one change is that a Kineto profiler is no longer also an autograd profiler; the RPC framework was assuming a legacy profiler but when a kineto profiler was active things still kind of worked due to that implementation detail. (But crashed after the class split.)

This diff tidies up a couple of things:
1) Move `getProfilerConfig` into `api.cpp`, since it is no longer correct to static_cast a `KinetoThreadLocalState` to a `ProfilerLegacyThreadLocalState`. (And really the class we want is `ProfilerThreadLocalStateBase` anyway.)

2) Add a mechanism for callers to check if the active profiler is a legacy or kineto profiler. (So callers like RPC can adjust or provide a nice error message.)

3) Fix the RPC test to create a legacy profiler.

Test Plan: `caffe2/torch/fb/training_toolkit/backend/tests:test_rpc_profiler` now passes, and before the fix to `test_rpc_profiler.py`, I verified that the test failed with the error message added to `utils.cpp` rather than just crashing.

Reviewed By: suphoff

Differential Revision: D33283314

fbshipit-source-id: e4fc5b5cfc9ca3b91b8f5e09adea36f38611f90d
2021-12-22 18:50:42 -08:00
121d067999 [LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069

This commit upstreams utils to extract BackendDevice from at::Tensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice*

Reviewed By: wconstab

Differential Revision: D33172665

Pulled By: alanwaketan

fbshipit-source-id: b334ee358ea7b031bbffb0a16fa634715dba83f5
2021-12-22 18:15:45 -08:00
bd8e8e3aaf [GHA] Clean after checkout (#70337)
Summary:
Github's checkout action sometimes leaves untracked files in the repo
Remedy it by running `git clean -fxd`, which should nuke them all

Tentative fix for https://github.com/pytorch/pytorch/issues/70097

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70337

Reviewed By: suo

Differential Revision: D33289189

Pulled By: malfet

fbshipit-source-id: 16e3ebe7a61fda1648189c78bdf1b1185247037a
2021-12-22 18:10:23 -08:00
a421ee0e52 [nn] InstanceNorm : no batch dim for modules (#65323)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65323

Reviewed By: davidberard98

Differential Revision: D33285268

Pulled By: jbschlosser

fbshipit-source-id: c5210bb431eaf27190e1cd75c42af3e5bcf83f72
2021-12-22 18:00:36 -08:00
c06b3208d4 Revert D33141012: test //c10/... in CI
Test Plan: revert-hammer

Differential Revision:
D33141012 (0ccccf4ed5)

Original commit changeset: 702000587171

Original Phabricator Diff: D33141012 (0ccccf4ed5)

fbshipit-source-id: 1e30c2dad940f54185dc93912fd7b3e81eec5b63
2021-12-22 17:48:48 -08:00
23ab6ce723 Revert D33141011: extract //c10/macros into its own package
Test Plan: revert-hammer

Differential Revision:
D33141011 (8f4c724bb6)

Original commit changeset: caa97448f922

Original Phabricator Diff: D33141011 (8f4c724bb6)

fbshipit-source-id: 79423ed51f9a43ecf1f716a739c74949b66fadb4
2021-12-22 17:48:45 -08:00
f126501d37 Revert D33141010: allow Bazel to build without glog and gflags
Test Plan: revert-hammer

Differential Revision:
D33141010 (8c41f258f4)

Original commit changeset: d951e5616459

Original Phabricator Diff: D33141010 (8c41f258f4)

fbshipit-source-id: d52ca20ddf4c5a91cb09a32fecb30a00227fc4ae
2021-12-22 17:47:23 -08:00
682fab19d4 [SR] verify_and_correct_memory_overlap handles tensor lists (#69774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69774

We recently ran into a nasty bug caused by incorrect schema annotations on an `aten::split` overload. `verify_and_correct_memory_overlap` is supposed to prevent crashes in this scenario, but it didn't because it did not handle `Tensor[]` outputs.

This change extends the memory correction mechanism to handle tensor lists.
ghstack-source-id: 146152478

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D33022494

fbshipit-source-id: 8d1d41ca1d4fd5dfb7c8a66028c391ba63551eb0
2021-12-22 17:18:18 -08:00
385c12852e [LTC] Upstream LazyTensor <=> at::Tensor utils (#70066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70066

This commit upstreams utils to convert at::Tensors into LazyTensors and
vice versa.

Test Plan:
Covered by test_ptltc on the lazy_tensor_staging branch since TorchScript
Backend hasn't merged yet.

Reviewed By: desertfire

Differential Revision: D33171590

Pulled By: alanwaketan

fbshipit-source-id: b297ff5fc8ca1a02d30e16ad2249985310e836a9
2021-12-22 16:53:07 -08:00
2e94a0d282 Remove backward ops for NNPACK spatial convolution (#70305)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70305

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D33279223

Pulled By: jbschlosser

fbshipit-source-id: f263012b3edaa87ce5430ffd6204a5453360d5dd
2021-12-22 14:58:12 -08:00
7cdfd86a72 TestMathBits: test with neg and conj bit set (#68948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68948

The case where both the negative and conjugate bits are set
isn't tested currently despite being handled explicitly by `copy`.
In theory this shouldn't matter because neg_bit is only used for real
values, but it does mean the code in copy is untested. So, this just
runs it with a single sample as a sanity check.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33064371

Pulled By: anjali411

fbshipit-source-id: e90c65e311507c4fc618ff74fecc4929599c4fa3
2021-12-22 14:30:35 -08:00
7c690ef1c2 FractionalMaxPool3d with no_batch_dim support (#69732)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69732

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33280090

Pulled By: george-qi

fbshipit-source-id: aaf90a372b6d80da0554bad28d56436676f9cb89
2021-12-22 14:30:32 -08:00
8c41f258f4 allow Bazel to build without glog and gflags (#69995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69995
ghstack-source-id: 146027060

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33141010

fbshipit-source-id: d951e5616459e8aa163ae0741e245f53185580e8
2021-12-22 14:30:30 -08:00
8f4c724bb6 extract //c10/macros into its own package (#69994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69994
ghstack-source-id: 145799968

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33141011

fbshipit-source-id: caa97448f922d7c12980bf01669c1b3ef5c1213b
2021-12-22 14:30:27 -08:00
0ccccf4ed5 test //c10/... in CI (#69993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69993
ghstack-source-id: 145799967

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33141012

fbshipit-source-id: 70200058717189a57858f3f8d94ecc364fb229d6
2021-12-22 14:30:24 -08:00
1bd147b61a Fix masked_softmax's perf for element_size is not 8 (#70271)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70271

Test Plan:
Rebase on top of D32407544 and
buck run mode/opt -c fbcode.enable_gpu_sections=true pytext/fb/tools:benchmark_masked_softmax -- masked-softmax --batch-size=10
to see correct perf data ( PT time = ~2.5x PT native time )

Reviewed By: ngimel

Differential Revision: D33268055

fbshipit-source-id: f48b17852c19c2bc646f9ed8d9d5aac85caa8a05
2021-12-22 14:29:09 -08:00
c34aa715fa AT_MKL_SEQUENTIAL and build changes (#70259)
Summary:
Re-land of  https://github.com/pytorch/pytorch/pull/69419

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70259

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33246757

Pulled By: ngimel

fbshipit-source-id: 738f8558d4cad6752be14108f9931ec3514f6682
2021-12-22 13:52:23 -08:00
b37de0a4bb Update flags in nnc lowering (#70306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70306

USE_XNNPACK is the right flag to enable lowering to prepacked XNNPACK-based ops

Test Plan: CI

Reviewed By: ZolotukhinM, priyaramani

Differential Revision: D33279375

fbshipit-source-id: d19ded5643f487f7b58c54a860ad39c8d484ed05
2021-12-22 12:25:35 -08:00
f36b44bb9e Remove ciflow_should_run job (#70204)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66725

This removes the ci_flow_should_run job and puts it in the build stage for the different job templates.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70204

Reviewed By: malfet

Differential Revision: D33282338

Pulled By: zengk95

fbshipit-source-id: 327ff2bca9720d2a69083594ada5c7788b65adbd
2021-12-22 11:52:42 -08:00
276253b164 Fixed wrong return type in ModuleList getitem (#69083)
Summary:
Fixes typing error:
`Expected type ‘Iterable’ (matched generic type ‘Iterable[_T1]’), got ‘Module’ instead.`

see: https://discuss.pytorch.org/t/modulelist-typing-error-not-an-iterable/138137/5 :

To reproduce (e.g. with mypy/pycharm):

```python
import torch
import torch.nn as nn
class Model(nn.Module):

    def __init__(self):
        super().__init__()
        self.module_list = nn.ModuleList(
            [nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 1)]
        )

    def forward(self, batch):
        for i in self.module_list[1:4]:
            pass
        return batch
model = Model()
out = model(torch.randn(1, 1))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69083

Reviewed By: davidberard98

Differential Revision: D33279114

Pulled By: jbschlosser

fbshipit-source-id: 90d74e76602163586b6ff4c49613a2694a9af37c
2021-12-22 11:38:17 -08:00
ce9a2f8ba9 [C++ API] Added missing nearest-exact mode and anti-alias flag (#69318)
Summary:
Description:

Following https://github.com/pytorch/pytorch/pull/65142#issuecomment-981995692 adding missing nearest-exact mode and anti-alias flag to C++ frontend.

- https://github.com/pytorch/pytorch/pull/65142
- https://github.com/pytorch/pytorch/pull/64501

- added tests in pytorch/test/cpp/api/functional.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69318

Reviewed By: davidberard98

Differential Revision: D33278995

Pulled By: jbschlosser

fbshipit-source-id: fa87c0c78df6b398e4f9688cc02111eed187afa7
2021-12-22 11:10:51 -08:00
da63f3f92b Corrected typo in Cross entropy formula (#70220)
Summary:
Changes made to line 1073: the denominator of the formula was EXP(SUM(x)); changed it to SUM(EXP(x))
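
For reference, the corrected formula is the standard cross entropy, with the sum of exponentials in the denominator:

```
\ell(x, \mathrm{class}) = -\log\frac{\exp(x_{\mathrm{class}})}{\sum_j \exp(x_j)}
                        = -x_{\mathrm{class}} + \log\sum_j \exp(x_j)
```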

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70220

Reviewed By: davidberard98

Differential Revision: D33279050

Pulled By: jbschlosser

fbshipit-source-id: 3e13aff5879240770e0cf2e047e7ef077784eb9c
2021-12-22 11:06:21 -08:00
b7259b8660 [quant][be] Add a check in prepare_qat to make sure the model is in training mode (#69879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69879

att

Test Plan:
```
python test/test_quantization.py TestQuantizationAwareTraining
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33080989

fbshipit-source-id: 55a631284365ec9dfd6bd7469688490ab1891d41
2021-12-22 11:00:00 -08:00
2806d821b0 Add conversion of torch.permute to acc_ops.permute (#70294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70294

In order to infer shapes for permute, the node target needs to be converted from torch.permute to acc_ops.permute.

Reviewed By: jfix71

Differential Revision: D33267469

fbshipit-source-id: b77eff1892211eac4a798a2f3e624140e287f4a2
2021-12-22 10:38:39 -08:00
56969bf88a make inverse call linalg_inv (#70276)
Summary:
`linalg.inv` and `inverse` are aliases according to the documentation, yet their implementations have somewhat diverged. This makes `inverse` call into `linalg_inv`.
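
A quick check of the documented equivalence:

```python
import torch

A = torch.tensor([[2., 1.], [1., 2.]])  # an invertible 2x2 matrix
assert torch.allclose(torch.inverse(A), torch.linalg.inv(A))
```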

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70276

Reviewed By: malfet

Differential Revision: D33271847

Pulled By: ngimel

fbshipit-source-id: cf018ddd2c1cee29026dd5f546f03f3a1d3cf362
2021-12-22 10:15:40 -08:00
4db3a8fc0a [nn] TransformerEncoderLayer: no-batch-dim (#69291)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585
TODO:
* [ ] Update docs?
* [x] Generic reference function?

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69291

Reviewed By: davidberard98

Differential Revision: D33278970

Pulled By: jbschlosser

fbshipit-source-id: 8dd5b6d7c0099fa38aa037c186778b10834bdee4
2021-12-22 10:00:09 -08:00
69b37a16f3 Remove unused CUDASolver.h from SparseCUDABlas (#70281)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70281

Reviewed By: ngimel

Differential Revision: D33272704

Pulled By: malfet

fbshipit-source-id: a33a7f9cd1513115a0b9ab75530e85e9913e8dd3
2021-12-22 09:04:34 -08:00
31c7e5d629 Install TensorRT lib on oss docker and enable fx2trt unit test (#70203)
Summary:
CI

Lib installed and unit test run on https://github.com/pytorch/pytorch/actions/runs/1604076060

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70203

Reviewed By: malfet

Differential Revision: D33264641

Pulled By: wushirong

fbshipit-source-id: ba30010bbd06e70d31415d8c52086d1779371bcf
2021-12-22 08:50:48 -08:00
b5f71375f5 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33275345

fbshipit-source-id: b07a27897680190f9fff86e22d8c68c1c9aff19a
2021-12-22 08:05:39 -08:00
29f1ccc8f0 Fix some Composite Compliance problems with binary_cross_entropy backward (#70198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70198

This PR fixes composite compliance problems with:
- binary_cross_entropy's backward formula
- binary_cross_entropy_with_logits's backward formula
- binary_cross_entropy's double backward formula

It does so by adding checks for areAnyTensorSubclassLike.

Test Plan:
- I tested everything with functorch.
- We are going to do https://github.com/pytorch/pytorch/issues/69530 in
the future so we have a way of testing this in core. I need the
binary_cross_entropy ones for something right now and didn't want to
wait until we come up with a solution for #69530.

Reviewed By: Chillee

Differential Revision: D33246995

Pulled By: zou3519

fbshipit-source-id: 310ed3196b937d01b189870b86a6c5f77f9258b4
2021-12-22 07:24:04 -08:00
75dbe88b05 [DataPipe] removing unbatch_level from .groupby (#70249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70249

IMO, the `unbatch_level` argument is not needed here since users can simply call `.unbatch` before calling `.groupby` if needed. One small step closer to a unified API with other libraries.

Note that we may rename the functional name from `.groupby` to `.group` in the future. TBD.
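
A hypothetical usage sketch of the composition (assuming torchdata-style DataPipes; the exact yield order of groups depends on buffering):

```python
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper([[0, 1, 2], [3, 4, 5]])  # batched source
groups = dp.unbatch().groupby(group_key_fn=lambda x: x % 2)
print(list(groups))  # groups of even and odd values, e.g. [[0, 2, 4], [1, 3, 5]]
```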

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33259104

Pulled By: NivekT

fbshipit-source-id: 490e3b6f5927f9ebe8772d5a5e4fbabe9665dfdf
2021-12-22 07:13:12 -08:00
e02d836cb2 [LTC] Upstream LTCTensorImpl (#70062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70062

This commit upstreams LTCTensorImpl from the lazy_tensor_staging branch.
It inherits from c10::TensorImpl and thus manages the lifetime/storage
of LazyTensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=LazyTensorImplTest.*

Reviewed By: desertfire

Differential Revision: D33171186

Pulled By: alanwaketan

fbshipit-source-id: 6af9f91cc7c7e997f120cb89a7bcd6785c03ace0
2021-12-22 03:21:52 -08:00
633f770c3c [StaticRuntime] Add out-variant support for TensorExprDynamicGroup op (#69479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69479

This diff adds support for out-variant optimization for `TensorExprDynamicGroup` op, which will be used for TensorExpr based fusion in Static Runtime.
ghstack-source-id: 146107008

Test Plan:
```
buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```

Completed accuracy test on inline_cvr model 294738512 v0. Results:
```
get 1012 prediction values
get 1012 prediction values
pyper_inference_e2e_local_replayer_test.out.132ea03c2 pyper_inference_e2e_local_replayer_test.out.1858bbeb0
max_error:  0 % total:  0
```

Reviewed By: d1jang, mikeiovine

Differential Revision: D32768463

fbshipit-source-id: a3e6c1ea9ff5f3b57eb89095aa79a6d426fbb52a
2021-12-22 00:30:22 -08:00
7d4db93a7d [jit] Handle output tensor being passed in as inputs to TensorExprDynamicGroup (#69478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69478

This diff handles the case when output tensors are being passed in as
inputs to TensorExprDynamicGroup op.

This is in preparation to support out-variant optimizations in Static Runtime.
ghstack-source-id: 146107007

Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/jit:jit

Reviewed By: eellison

Differential Revision: D32823889

fbshipit-source-id: ff18e17fcd09953e55c8da6b892e60756521c2fc
2021-12-22 00:30:19 -08:00
4dec15e6d8 [nnc] Add a run method to TensorExprKernel that takes in output tensors (#69477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69477

This diff adds a new run method to `TensorExprKernel` which takes in
output tensors as inputs and stores the output in those given tensors.
ghstack-source-id: 146107009

Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.RunWithAllocatedOutputs'

Reviewed By: ZolotukhinM

Differential Revision: D32823890

fbshipit-source-id: edc1f4839785124048b034060feb71cb8c1be34f
2021-12-22 00:30:15 -08:00
0bdf4702f6 [jit] Add a new op that composes all of the dynamic shape logic (#69476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69476

This diff adds a new op, `TensorExprDynamicGroup`, that composes all of the logic behind running a dynamically shaped fused node. This includes a guard instruction that checks conditions, and a conditional that calls either the fused node or the fallback graph depending on the guard.
ghstack-source-id: 146107006

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/cpp/jit:jit
```

Reviewed By: eellison

Differential Revision: D32320082

fbshipit-source-id: 2bd1a43391ca559837d78ddb892d931abe9ebb73
2021-12-22 00:28:57 -08:00
b613fbdbf2 Back out "[Quant] Added 4 bit support for embedding quantized module" (#70273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70273

Original commit changeset: 73e63383cf60

Original Phabricator Diff: D33152674 (9f512e129b)

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D33268459

fbshipit-source-id: 051bfcbbad3fa083301a3cea508d00946d6db881
2021-12-21 21:28:04 -08:00
47ba28f3b5 Back out "[Quant][Eager] Added 4 bit support for eager mode quantization flow" (#70272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70272

Original commit changeset: 5cdaac5aee9b

Original Phabricator Diff: D33152675 (75718e5059)

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D33268415

fbshipit-source-id: 99eb3209d513149ed23a1d9071d1b1c12174d09a
2021-12-21 21:28:01 -08:00
a86f9806bc Back out "[Quant][fx] Added test for quint4x2 for fx graph mode quantization" (#70274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70274

Original commit changeset: 89951fcd23e7

Original Phabricator Diff: D33152672 (de4e7dece9)

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D33268165

fbshipit-source-id: d667a761d72b9423407ce4d6617e9b6a04b5c9f8
2021-12-21 21:26:46 -08:00
6217fee96b Revert D33246843: [pytorch][PR] Implementation of Wishart distribution
Test Plan: revert-hammer

Differential Revision:
D33246843 (a217a62e73)

Original commit changeset: 825fcddf4785

Original Phabricator Diff: D33246843 (a217a62e73)

fbshipit-source-id: 2c8063e8d10e9d3ac20fa44673e6011ed1160753
2021-12-21 18:55:49 -08:00
2d509ff31b [GHA] Fix doc push jobs (#70269)
Summary:
Home folder in docker images is `/var/lib/jenkins`, rather than `/home/jenkins`
Also, repo secrets cannot start with the `GITHUB_` prefix, according to the [Naming your secrets](https://docs.github.com/en/actions/security-guides/encrypted-secrets#naming-your-secrets) guide

Fixes https://github.com/pytorch/pytorch/issues/70211

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70269

Reviewed By: suo

Differential Revision: D33271404

Pulled By: malfet

fbshipit-source-id: 044bb34c75a0e8a9f0b2f5790be7aa2397524a24
2021-12-21 18:20:10 -08:00
591ca4d6bc [Operator Versioning][Edge] Reorganize upgrader initialization logic for thread safety (#70225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70225

Thanks to zhxchen17 for the suggestion. This PR moves the operator initialization logic to `upgrader_mobile.cpp`, so that we can leverage a static variable to ensure the operator initialization only happens once.
ghstack-source-id: 146103229

Test Plan:
```

buck test mode/opt //papaya/integration/service/test/analytics/histogram:generic_histogram_system_test -- --exact 'papaya/integration/service/test/analytics/histogram:generic_histogram_system_test - SumHistogramSystemTest.test' --run-disabled
buck test mode/opt //caffe2/test/cpp/jit:jit
buck test mode/dev //papaya/integration/service/test/mnist:mnist_system_test -- --exact 'papaya/integration/service/test/mnist:mnist_system_test - MnistFederatedSystemTest.test'
```

Reviewed By: zhxchen17

Differential Revision: D33247543

fbshipit-source-id: 6c3a87fe909a1be01452fa79649065845b26d805
2021-12-21 17:26:17 -08:00
21c6de9fdc Extend autograd functional benchmarking to run vectorized tasks (#67045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67045

To run: `python benchmarks/functional_autograd_benchmark/functional_autograd_benchmark.py --gpu -1 --model-filter=ppl    _robust_reg --num-iter 100`

```
Results for model ppl_robust_reg on task vjp: 0.0012262486852705479s (var: 2.2107682351446556e-10)
Results for model ppl_robust_reg on task vhp: 0.002099371049553156s (var: 6.906406557760647e-10)
Results for model ppl_robust_reg on task jvp: 0.001860950025729835s (var: 1.1251884146634694e-10)
Results for model ppl_robust_reg on task hvp: 0.003481731517240405s (var: 2.2713633751614282e-10)
Results for model ppl_robust_reg on task jacobian: 0.0012128615053370595s (var: 1.3687526667638394e-09)
Results for model ppl_robust_reg on task hessian: 0.009885427542030811s (var: 9.366265096844018e-09)
Results for model ppl_robust_reg on task hessian_fwdrev: 0.005268776323646307s (var: 2.4293791422991262e-09)
Results for model ppl_robust_reg on task hessian_revrev: 0.002561321249231696s (var: 7.557877101938004e-10)
Results for model ppl_robust_reg on task jacfwd: 0.002619938924908638s (var: 5.109343503839625e-10)
Results for model ppl_robust_reg on task jacrev: 0.0013469004770740867s (var: 3.1857563254078514e-09)
```
Notes:
 - We go through the batched fallback for both
 - ppl_robust_reg takes 3 tensor inputs and returns a single scalar output
   - this means that jacobian is equivalent to doing a vjp, so vmap would not help us (see the sketch after this list)
   - we expect jacfwd to be slower than jacrev
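
A minimal sketch of why the scalar-output case collapses to a single VJP (`f` below is a stand-in, not the actual ppl_robust_reg model):

```python
import torch
from torch.autograd.functional import jacobian, vjp

def f(x):
    return (x ** 2).sum()  # R^n -> R, scalar output like ppl_robust_reg

x = torch.randn(5)
# For a scalar output, the full Jacobian equals one VJP with cotangent 1.0,
# so vmapping over output rows cannot help.
_, v = vjp(f, x, torch.tensor(1.0))
assert torch.allclose(jacobian(f, x), v)
```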

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33265947

Pulled By: soulitzer

fbshipit-source-id: 14f537a1376dea7e5afbe0c8e97f94731479b018
2021-12-21 17:20:29 -08:00
82c5f298ed [shard] fix named_params_with_sharded_tensor (#70228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70228

Fix the named_params_with_sharded_tensor impl: `named_parameters` already loops over the submodules recursively, so we shouldn't call it inside the submodule loop.
ghstack-source-id: 146076471

Test Plan: Added more complicated test cases (that involves multiple submodules) to capture this issue.

Reviewed By: pritamdamania87

Differential Revision: D33251428

fbshipit-source-id: cf24ca7fbe4a5e485fedd2614d00cdea2898239e
2021-12-21 15:29:38 -08:00
74c834e0dc [DataPipe] adding a finally statement to ensure hook is reset (#70214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70214

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33255306

Pulled By: NivekT

fbshipit-source-id: de2fe6bf08328e481c714aaad390db771073469e
2021-12-21 15:21:04 -08:00
23902fb895 Fixed typo in torch check for cdist (#70178)
Summary:
Description:
- Fixed typo in torch check for cdist

cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70178

Reviewed By: bdhirsh

Differential Revision: D33236027

Pulled By: zou3519

fbshipit-source-id: e87a982c0dc5fe576db8f2afc4b2010924f047c0
2021-12-21 15:16:39 -08:00
a217a62e73 Implementation of Wishart distribution (#68588)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68050

TODO:
- [x] Unit Test
- [x] Documentation
- [x] Change constraint of matrix variables with 'torch.distributions.constraints.symmetric' if it is reviewed and merged. https://github.com/pytorch/pytorch/issues/68720
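
A quick usage sketch of the added distribution (illustrative; argument names follow the usual `torch.distributions` conventions):

```python
import torch
from torch.distributions import Wishart

W = Wishart(df=torch.tensor(4.0), covariance_matrix=torch.eye(3))
S = W.sample()       # a 3x3 positive-definite matrix
print(W.log_prob(S))
```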

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68588

Reviewed By: bdhirsh

Differential Revision: D33246843

Pulled By: neerajprad

fbshipit-source-id: 825fcddf478555235e7a66de0c18368c41e935cd
2021-12-21 14:07:30 -08:00
0544f975e1 [reland] Support torch.equal for ShardedTensor. (#70145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70145

Added torch.equal support for ShardedTensor. This is really helpful for comparing two ShardedTensors.
ghstack-source-id: 146066939

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D33201714

fbshipit-source-id: 56adfc36e345d512c9901c56c07759bf658c745b
2021-12-21 13:22:52 -08:00
c321d4c1ca [Operator Versioning] Split the upgrader test to a separate file and cover mobile part (#70090)
Summary:
1. Split the test `test_save_load.py` into two files, moving the operator-versioning-related changes to `test_save_load_for_op_versions.py`.
2. Add mobile-module-related tests to `test_save_load_for_op_versions.py`

How to run:
```
buck test mode/opt //caffe2/test:jit
or
python test/test_jit.py TestSaveLoadForOpVersion
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70090

ghstack-source-id: 146103547

Test Plan:
```
buck test mode/opt //caffe2/test:jit
python test/test_jit.py TestSaveLoadForOpVersion
```

Reviewed By: tugsbayasgalan

Differential Revision: D33180767

fbshipit-source-id: dd31e313c81e90b598ea9dd5ad04a853c017f994
2021-12-21 13:08:01 -08:00
a6f953156e [StaticRuntime] Add TensorExpr fusion with dynamic shapes in SR (#69475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69475

This diff adds TensorExpr fusion with dynamic shapes in SR. This includes tracing the input graph with sample inputs, and then performing fusion with generalization to get fused graphs with dynamic shapes.
ghstack-source-id: 146059043

Test Plan:
```
buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```

Reviewed By: d1jang

Differential Revision: D32320088

fbshipit-source-id: 397f498878ddfcee9dad7a839652f79f034fefe3
2021-12-21 12:41:02 -08:00
c6d1162325 [jit] Add support for dynamic shape fusion in JIT. (#69474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69474

This diff adds support for dynamic shape fusion in JIT. This is done
by performing fusion with the static shapes observed on the first run,
generalizing the fused subgraphs and generating code for the generalized fused
subgraphs with dynamic shapes.
ghstack-source-id: 146059044

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/cpp/jit:jit
```

Reviewed By: eellison

Differential Revision: D32781307

fbshipit-source-id: f821d9f8c271bcb78babcb4783d66f2f0020b0ea
2021-12-21 12:39:44 -08:00
c5333cdfba [nnc] tensorexpr for quantized::add (#70188)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70188

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33238093

Pulled By: IvanKobzarev

fbshipit-source-id: bd4e451bfd7531f31f216def2c3c1ba2f2e566e7
2021-12-21 12:30:56 -08:00
bb51519937 bug fix FractionalMaxPool2d (random_samples dimensions) (#70031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70031

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33200618

Pulled By: george-qi

fbshipit-source-id: 142f224c2cab1008d2d4e9ed333697a92d2d42db
2021-12-21 12:21:54 -08:00
91da2d5fa1 [StaticRuntime] Refactor StaticModule to pass in sample inputs (#69473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69473

This diff refactors StaticModule and its uses to pass in sample inputs. These inputs need to be passed into the constructor because they are needed to perform TensorExpr fusion before other optimizations are performed on the input graph.
ghstack-source-id: 146059041

Test Plan: buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test

Reviewed By: donaldong

Differential Revision: D32320084

fbshipit-source-id: b8bd46d442be4cc90ca60f521e0416fdb88eea60
2021-12-21 11:20:25 -08:00
c4a6c7a436 fix cpu binary size increase for clamp (#70168)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70168

Reviewed By: bdhirsh

Differential Revision: D33229811

Pulled By: ngimel

fbshipit-source-id: 3509da766fa327f4103fdcf880d368f64c111496
2021-12-21 10:59:27 -08:00
5504e4ae5c [nnc] Move DispatchParallel to external_functions (#70221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70221

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33249149

Pulled By: IvanKobzarev

fbshipit-source-id: fa6b2535dc09229d72b1c45eaa75434477cdff5e
2021-12-21 10:51:38 -08:00
304efd8e9a Change TH_BLAS_MKL into AT_MKL_ENABLED() (#70219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69419

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33246758

Pulled By: ngimel

fbshipit-source-id: aedef4c9ef97b6aa9f574313c94f774b77df2748
2021-12-21 10:36:55 -08:00
a197f3fe52 [FSDP/Checkpoint] Activation offload support in checkpoint_wrapper (#70165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70165

Implements activation offload support in checkpoint_wrapper API via
save_on_cpu hooks. We avoid modifying the torch.utils.checkpoint implementation
and instead compose offload + checkpoint using the save_on_cpu hook for the
former.
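
A minimal sketch of the composition (illustrative, not the actual implementation; `offload_checkpoint_forward` is our name):

```python
import torch
from torch.autograd.graph import save_on_cpu
from torch.utils.checkpoint import checkpoint

def offload_checkpoint_forward(module, *args):
    # Tensors saved for backward inside this region (the checkpointed
    # inputs) are parked in CPU memory; pass pin_memory=True on GPU setups.
    with save_on_cpu():
        return checkpoint(module, *args)

layer = torch.nn.Linear(8, 8)
x = torch.randn(2, 8, requires_grad=True)
offload_checkpoint_forward(layer, x).sum().backward()
```
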
ghstack-source-id: 146078900

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D33228820

fbshipit-source-id: 98b4da0828462c41c381689ee07360ad014e808a
2021-12-21 10:08:18 -08:00
e428a90553 Android build migrated to GHA. (#68843)
Summary:
All four builds of Android (arm32/64 and x86_32/64) are now migrated to GHA, away from CircleCI. Since this part of the workflow creates the final binary with all architectures in it, it was not possible to do the migration step by step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68843

Reviewed By: malfet

Differential Revision: D33257480

Pulled By: b0noI

fbshipit-source-id: dd280c8268bdd31763754c36f38e4ea12b23cd2e
2021-12-21 10:02:51 -08:00
5e222d08a1 Revert "Revert D32498572: allow external backend codegen to be used without autograd kernels" (#69949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69949

This reverts commit 33363cea64fd4be16975c32cf57e9eb123af371d.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D33113544

Pulled By: bdhirsh

fbshipit-source-id: e219f10d52776498c9ad273e97bca3e3406cf702
2021-12-21 08:19:37 -08:00
8e763cd735 Add explicit OperatorHandle destructor (#70033)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70032

The Windows build of PyTorch doesn't produce the `c10::OperatorHandle::~OperatorHandle(void)` symbol in any of its `*.lib` files. The fix is to define it explicitly in Dispatcher.cpp, so downstream consumers wanting to dllimport it can find it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70033

Reviewed By: jbschlosser

Differential Revision: D33240599

Pulled By: bdhirsh

fbshipit-source-id: 56cc5963043bd5caac30e42c3501a4f48d086b36
2021-12-21 07:39:26 -08:00
adaf383837 dbr quant: better fix for bug with recursion on dequantize (#70128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70128

Previous code disabled torch_function when dequantizing arguments
to an unquantizeable function.  This PR blocklists the dequantize
method from the dequantize hook instead, so we can remove
the previous hack.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: ejguan

Differential Revision: D33194396

Pulled By: vkuzo

fbshipit-source-id: 6175c2da637c1d0c93b3fea0ef8218eaee6a2872
2021-12-21 06:25:37 -08:00
cce9c9aa45 dbr quant: stop overridding tensor getters (#70115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70115

This PR turns off DBR quant __torch_function__ overrides on
tensor attribute getters such as `x.dtype`. This should help
with making the debug logs more readable, and reduce framework
overhead.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: ejguan

Differential Revision: D33189544

Pulled By: vkuzo

fbshipit-source-id: e0d664bb6b76ca9e71c8a439ae985a0849312862
2021-12-21 06:25:34 -08:00
f291708058 dbr quant: clean up logging format (#70114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70114

This PR makes the debug logging for DBR quant be more useful
and easier to read.

New format looks like

```
DEBUG:auto_trace: fqn: _tf_ <function tanhshrink at 0x7fa4d02d4790> out torch.float32 end
```

This will be useful to speed up further work.

Test Plan:
```
// run this with logging enabled, logs easier to read
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33189545

Pulled By: vkuzo

fbshipit-source-id: 20af7e066e710beac5a3871a9d6259ee5518f97d
2021-12-21 06:25:31 -08:00
fb2a6747b8 dbr quant: add test for qconfig_dict and methods (#70109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70109

Adds a test case for DBR quant + qconfig_dict specifying methods
by object_type.  Fixes a bug in the FX rewriter for scripting
to make the test pass.

Full coverage of methods will come in future PRs, this PR is
just to verify qconfig_dict is hooked up correctly.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_qconfig_dict_object_type_method
```

Reviewed By: jerryzh168

Differential Revision: D33188160

Pulled By: vkuzo

fbshipit-source-id: 47ab9dbca8cdb1cf22d6d673d9c15b3bc0d1ec81
2021-12-21 06:24:18 -08:00
78bea1bb66 update example in classification losses (#69816)
Summary:
Just updated a few examples that were either failing or raising deprecation warnings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69816

Reviewed By: bdhirsh

Differential Revision: D33217585

Pulled By: albanD

fbshipit-source-id: c6804909be74585c8471b8166b69e6693ad62ca7
2021-12-21 02:46:48 -08:00
19f898402d Revert D33241684: [pytorch][PR] Install TensorRT lib on oss docker and enable fx2trt unit test
Test Plan: revert-hammer

Differential Revision:
D33241684 (dab3d3132b)

Original commit changeset: cd498908b00f

Original Phabricator Diff: D33241684 (dab3d3132b)

fbshipit-source-id: d5b2e663b5b0c9e570bd799b9f6111cd2a0de4f7
2021-12-20 23:14:35 -08:00
b376d82caf Remove backward op for slow dilated 2d convolution (#70067)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70067

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D33172551

Pulled By: jbschlosser

fbshipit-source-id: 2f1802c77253e543ebb7ee8ee0a12fa4defde311
2021-12-20 19:18:34 -08:00
dab3d3132b Install TensorRT lib on oss docker and enable fx2trt unit test (#70203)
Summary:
CI

Lib installed and unit test run on https://github.com/pytorch/pytorch/actions/runs/1604076060

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70203

Reviewed By: janeyx99

Differential Revision: D33241684

Pulled By: wushirong

fbshipit-source-id: cd498908b00f3417bdeb5ede78f5576b3b71087c
2021-12-20 18:51:48 -08:00
123be0e5b7 [fusion] Add ConvTranspose+BN fusion support (#70022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70022

Add support for fusing ConvTranspose{1,2,3}d with BatchNorm{1,2,3}d. This re-uses the existing fusion logic but adds a "transpose" flag to the fusing function which, when enabled, uses the appropriate reshape for ConvTranspose's transposed weights.
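
Hypothetical usage after this change, via the eager-mode fusion API (the module names "0"/"1" are just the Sequential's child names):

```python
import torch.nn as nn
from torch.quantization import fuse_modules

model = nn.Sequential(nn.ConvTranspose2d(3, 8, 3), nn.BatchNorm2d(8)).eval()
fused = fuse_modules(model, [["0", "1"]])  # ConvTranspose2d + BN now fusable
```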

Test Plan: `buck test mode/dev //caffe2/test:quantization -- -r quantization.eager.test_fusion.TestFusion`

Reviewed By: jerryzh168

Differential Revision: D33074405

fbshipit-source-id: 5e9eff1a06d8f98d117e7d18e80da8e842e973b7
2021-12-20 18:42:48 -08:00
24f16de987 [Static Runtime] Support native op split_with_sizes (#69999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69999

This adds support for the split_with_sizes operator in static runtime by adding native operators. Those operators have less overhead compared to their JIT fallbacks (no dispatching, no stack construction at runtime).

split_with_sizes can be called directly from the C++ API, or via `torch.split` when `split_sizes` is a list. This diff adds support for both use cases.
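
For reference, the two user-facing spellings (a quick sketch):

```python
import torch

x = torch.arange(10)
# torch.split with a list of sizes routes to split_with_sizes:
a, b, c = torch.split(x, [2, 3, 5])
# Equivalent direct call on the tensor:
a2, b2, c2 = x.split_with_sizes([2, 3, 5])
assert all(torch.equal(u, v) for u, v in zip((a, b, c), (a2, b2, c2)))
```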

Test Plan:
- Added unit tests. Made sure the operators are used
- Benchmark
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/data/users/dxd/305797439_0.predictor.precompute.remote_request_only \
--method_name=user.forward --pt_cleanup_activations=1 \
--pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=1000 --warmup_iters=500 \
--num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 \
--input_type="recordio" --pt_inputs=/data/users/dxd/305797439_0_user.inputs.recordio \
--recordio_use_ivalue_format=1 --do_profile=1 --do_benchmark=1
```

#### Before
```
Static runtime ms per iter: 3.62073. Iters per second: 276.187
0.0471904 ms.    1.31501%. aten::split_with_sizes (5 nodes)
```
#### After
```
Static runtime ms per iter: 3.44374. Iters per second: 290.382
0.0432057 ms.    1.34276%. aten::split_with_sizes (5 nodes, native)
```

Reviewed By: swolchok

Differential Revision: D33141006

fbshipit-source-id: feae34c4c873fc22d48a8ff3bf4d71c0e00bb365
2021-12-20 18:32:54 -08:00
6623c4838e Handle the corner case when min == max in L2 search (#70207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70207

In the corner case when min == max, the adjust_hist_to_include_zero() function used in L2 search causes additional_nbins = -2147483648 and initializes bins_f with a negative size.

Test Plan:
Before fix:
f315187213

After fix:
f315471862

Reviewed By: jspark1105

Differential Revision: D33227717

fbshipit-source-id: 7e8a455e51a0703a3a9c5eb7595d9b4d43966001
2021-12-20 17:46:55 -08:00
f17e76b0f2 Expand description of bias_sizes arg for convolution_backward (#70195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70195

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D33240155

Pulled By: jbschlosser

fbshipit-source-id: c4f907d6e33e4d1eeb1b5228f1152307c8b27729
2021-12-20 17:33:17 -08:00
3e8ef9a272 Add return type annotation for ShardedTensor (#69945)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69945

Test Plan: CI

Reviewed By: wanchaol

Differential Revision: D32502393

fbshipit-source-id: 7bea08762446b211d8ea028d024d2acdabe45479
2021-12-20 17:15:44 -08:00
c555b7bacb GHA: Remove caffe2 check in Windows shard 1 smoke tests (#70010)
Summary:
Windows shard 1 hasn't actually been running any tests because the script that does so exited before running the python tests but did not report an error. This has been happening to all windows tests across the board, for example https://github.com/pytorch/pytorch/runs/4526170542?check_suite_focus=true

Removing the caffe2.python check passes the smoke tests now. You can observe that the run_test.py file is called in the windows cpu job now https://github.com/pytorch/pytorch/runs/4541331717?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70010

Reviewed By: malfet, seemethere

Differential Revision: D33161291

Pulled By: janeyx99

fbshipit-source-id: 85024b0ebb3ac42297684467ee4d0898ecf394de
2021-12-20 16:05:38 -08:00
e6d9bb8d57 reduce the number of instantiations for bernoulli tensor tensor kernel (#70169)
Summary:
Reduces the binary size of DistributionBernoulli.cu from 12282600 to 3946792 bytes.
Tensor-tensor bernoulli kernels are rarely used, so we limit dispatches to a double probability type for a double `self` tensor, and a `float` probability type for everything else. This incurs a minor perf hit if the probability tensor has a different dtype, but given how rarely these kernels are used (and how rarely the probability tensor is not float) this is not a problem.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70169

Reviewed By: jbschlosser

Differential Revision: D33237890

Pulled By: ngimel

fbshipit-source-id: 185c4b97aba0fb6ae159d572dd5bbb13cf676bb4
2021-12-20 13:46:34 -08:00
79a40b22aa [Checkpoint] Make checkpoint_wrapper an nn.Module (#70164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70164

Implement Alban's suggestion to make checkpoint_wrapper an nn.Module
instead of patching the forward pass, which is too hacky.
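
The general shape of the nn.Module approach (a sketch, not the actual implementation):

```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointWrapper(nn.Module):
    # Wrap the module rather than monkey-patching its forward.
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

    def forward(self, *args):
        return checkpoint(self.module, *args)
```
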
ghstack-source-id: 146011215

Test Plan: IC

Reviewed By: mrshenli

Differential Revision: D33214696

fbshipit-source-id: dc4b3e928d66fbde828ab60d90b314a8048ff7a2
2021-12-20 13:22:28 -08:00
fcaecd718a Write flaky tests to rockset (#70136)
Summary:
Try using Rockset as backend for data instead of RDS

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70136

Reviewed By: suo

Differential Revision: D33242148

Pulled By: janeyx99

fbshipit-source-id: 8935ceb43717fff4922b634165030cca7e934968
2021-12-20 13:17:21 -08:00
5651e1e3ad Add auto_linear formulas and some others (#69727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69727

Still need to test the backward ones. We would need to update gradgradcheck to check forward over backward.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33031728

Pulled By: soulitzer

fbshipit-source-id: 86c59df5d2196b5c8dbbb1efed9321e02ab46d30
2021-12-20 12:15:25 -08:00
65f54bc000 [SR] Optimize VarStack (#68750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68750

There was some room for optimization in static runtime's `prim::VarStack`:

* Avoid refcount bumps - constructing the `std::vector<at::Tensor>` can be avoided by writing a custom version of `stack_out` that takes a `std::vector<at::Tensor*>`

* Skip the memory overlap check

* Avoid device dispatcher overhead in a few places (e.g. `tensor.unsqueeze -> at::native::unsqueeze`)

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Stack`

Reviewed By: swolchok

Differential Revision: D32596934

fbshipit-source-id: e8f0ccea37c48924cb4fccbfdac4e1e11da95ee0
2021-12-20 11:46:11 -08:00
a799ffebd2 Create lower code example (#70142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70142

Create lower code example in oss, and run benchmark agaist resnet101

Test Plan: CI

Reviewed By: 842974287

Differential Revision: D33117440

fbshipit-source-id: 359d0c9e65899ab94c8f3eb112db70db5d938504
2021-12-20 11:37:08 -08:00
423ce416d8 Prune osx-arm64 binaries from nightly channel (#70132)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70043

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70132

Reviewed By: janeyx99

Differential Revision: D33195431

Pulled By: malfet

fbshipit-source-id: 4579a6788255a6df306862c3e959ae7a9ddd4e45
2021-12-20 11:28:43 -08:00
41959ce77f [JIT] scripting, freezing, serialization for sparse csr (#69555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69555

1. Implement pickling/unpickling
2. Add `test_freeze_sparse_csr, tests_serialize_sparse_csr` tests

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33181367

Pulled By: davidberard98

fbshipit-source-id: a15d5193a7b1b1625a27e4af003cec33cdbc8071
2021-12-20 11:13:34 -08:00
bcb6076099 Sparse CSR tensors: storage access should throw (#70072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70072

Like sparse COO tensors, sparse CSR tensors don't really have an actual storage() that can be accessed, so sparsetensor->storage() should throw.
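
A minimal repro of the intended behavior (after this change, storage access is expected to raise):

```python
import torch

t = torch.sparse_csr_tensor(
    torch.tensor([0, 2, 4]),         # crow_indices
    torch.tensor([0, 1, 0, 1]),      # col_indices
    torch.tensor([1., 2., 3., 4.]))  # values
try:
    t.storage()
except RuntimeError as e:
    print("storage access raised:", e)
```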

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33181309

Pulled By: davidberard98

fbshipit-source-id: 8f1dc4da03073d807e5acee2ac47caeffb94b16c
2021-12-20 11:12:01 -08:00
bcc7dbdf37 Change open source unit test deps (#70167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70167

1. Change the unit test dependency to the open source base class, so that this unit test can run on the OSS CI
2. Remove the usage of typing.Protocol, so that lower can run with Python 3.6

Test Plan:
oss CI
passed with change included in commit:
https://github.com/pytorch/pytorch/actions/runs/1597530689
see test(fx2trt)

Reviewed By: yinghai

Differential Revision: D33228894

fbshipit-source-id: ffe3d40a02a642b3b857a0605101797037a580bb
2021-12-20 10:41:38 -08:00
dd02af6283 Bilinear no_batch_dim (#69539)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69539

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33200105

Pulled By: george-qi

fbshipit-source-id: c674e3937fea95c4ec41a01c5aa6d6890042b288
2021-12-20 09:44:07 -08:00
978089c381 Prevent divide-by-zero errors in Timer (#70050)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66503

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70050

Reviewed By: mruberry

Differential Revision: D33168868

Pulled By: robieta

fbshipit-source-id: 7d0ece9e888f6c69a9e0ced581c92d3259fb3540
2021-12-20 09:16:03 -08:00
ad0cd8a76e [DataPipe] Improve inline doc and testing for CollatorIterDataPipe (#70139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70139

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33199107

Pulled By: NivekT

fbshipit-source-id: f96d77490998ac9bc3da8d4ff1a9caa08e9e7f27
2021-12-20 08:05:21 -08:00
8a912014b1 [Operator Versioning][Edge] Initialize upgrader thread safe (#70161)
Summary:
The upgrader should only be initialized once, when the runtime loads the first module. It no longer needs to be initialized afterwards.

Previously, instead of using an atomic variable, the upgrader was initialized depending on whether byteCodeFunctionWithOperator.function.get_code().operators_ was empty. If it's empty, it means the operators from the upgrader are not initialized yet. However, this is not thread safe: when multiple threads load modules together, it's possible that they all consider theirs the first module. Use an atomic variable here to make sure initialization is thread safe.
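
The C++ fix uses an atomic flag; the same once-only initialization pattern, sketched in Python (`register_upgrader_operators` is a hypothetical stand-in):

```python
import threading

_init_lock = threading.Lock()
_initialized = False

def register_upgrader_operators():
    print("operators registered once")  # hypothetical one-time init work

def ensure_upgraders_initialized():
    global _initialized
    if _initialized:            # fast path: already done
        return
    with _init_lock:
        if not _initialized:    # double-checked under the lock
            register_upgrader_operators()
            _initialized = True

threads = [threading.Thread(target=ensure_upgraders_initialized) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the registration message prints exactly once
```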

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70161

ghstack-source-id: 146012642

Test Plan:
```
buck test mode/opt //papaya/integration/service/test/analytics/histogram:generic_histogram_system_test -- --exact 'papaya/integration/service/test/analytics/histogram:generic_histogram_system_test - SumHistogramSystemTest.test' --run-disabled
buck test mode/opt //caffe2/test/cpp/jit:jit
```

Reviewed By: iseeyuan

Differential Revision: D33220320

fbshipit-source-id: 10f2397c3b358d5a1d39a2ce25457e3fdb640d2c
2021-12-19 20:16:00 -08:00
7ea86dfdb1 [Profiler] Factor common logic into torch/csrc/profiler/api.h (#69459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69459

This change breaks the dependency between the kineto and legacy profiler; instead of `profiler_kineto.h` including `profiler_legacy.h`, they both include `profiler/api.h`. As part of this refactor, I injected some intermediate classes to keep legacy behavior from leaking into the kineto profiler:

1) ProfilerThreadLocalState has become ProfilerThreadLocalStateBase which just handles config and callback handle. Legacy and Kineto profilers inherit this and implement their own very disjoint set of logic.

2) CUDAStubs is a pure virtual class to make the interface more readable, and the "always fail" behavior has been moved to a `DefaultCUDAStubs` class in `api.cpp`.

Test Plan: Ran the overhead ubenchmark.

Reviewed By: aaronenyeshi

Differential Revision: D32678163

fbshipit-source-id: 9b733283e4eae2614db68147de81b72f6094ce6c
2021-12-19 18:40:28 -08:00
181120f7d7 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33229251

fbshipit-source-id: 3a69bb459fa0a65888d6f9c8e70b5de032ddad97
2021-12-19 16:38:25 -08:00
60191196d4 [AutoAccept][Codemod][FBSourceBuckFormatLinter] Daily arc lint --take BUCKFORMAT
Reviewed By: zertosh

Differential Revision: D33229262

fbshipit-source-id: 7c22aa59a2a9eea94d2f403c339eb20abc7d9c41
2021-12-19 16:34:00 -08:00
ef70174f2e Separate c10::Symbol header from list of interned strings (#69406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69406

Most files that include `interned_strings.h` don't actually depend on
anything generated from `FORALL_NS_SYMBOLS` yet because they're in a
single file you need to recompile whenever a new symbol is added. Here
I move the class definition into a separate file so this doesn't
happen.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32923637

Pulled By: albanD

fbshipit-source-id: 6e488cbfcfe2c041a99d9ff22e167dbddf3f46d7
2021-12-19 14:52:26 -08:00
06d0536dad Low precision support for jiterator (#70157)
Summary:
This adds support for bfloat16 and fp16 types for the jiterator by adding at::Half and at::BFloat16 classes to the jiterator code template. The only methods defined in those classes are construction from float and implicit conversion to float. Mathematical operations on them never need to be defined, because the jiterator is written in a way that implicitly upcasts the inputs to the functor, so all math is performed in float only (e.g. the compute part of the kernel is always written as
```
        out[j] = i0<float>(arg0[j]);
```
It also adds support for casting to complex outputs, by adding a similar templated class c10::complex<T>. Originally I planned to only support float -> complex conversion for it, but to compile the fetch_and_cast function we also need complex -> float conversion. We could avoid it by compiling fetch_and_cast for a different subset of types, but I'm not doing that in this PR. Thus, technically, we could compile a kernel that accepts complex inputs and produces wrong results, but we guard against that by static-asserting that none of the functor datatypes are complex, and runtime-checking that none of the inputs are complex.
Adding bfloat16, half and complex support allows us to remove the special handling in type promotion tests for gcd.
i0 (which supports half and bfloat16 inputs) is moved to use the jiterator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70157

Reviewed By: mruberry

Differential Revision: D33221645

Pulled By: ngimel

fbshipit-source-id: 9cfe8aba3498a0604c4ea62c217292ea06c826b1
2021-12-19 11:56:57 -08:00
78f06e0690 fixing conv2d decomposition and tests (#70127)
Summary:
The current implementation has a bug where the decomposed `add_optional` from `conv2d` is placed before its producer node, which causes a linter error on the graph.

Cherry-picked from https://github.com/csarofeen/pytorch/pull/1333
Fixing issue posted in https://github.com/csarofeen/pytorch/issues/1325

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70127

Reviewed By: ejguan

Differential Revision: D33199018

Pulled By: jansel

fbshipit-source-id: bce1f14a443811b4d55116a04fd4daa86084cc47
2021-12-19 10:38:23 -08:00
de4e7dece9 [Quant][fx] Added test for quint4x2 for fx graph mode quantization (#69846)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69846

Test Plan:
In pytorch main dir, execute

    to run the added test

Reviewed By: jbschlosser

Differential Revision: D33152672

Pulled By: dzdang

fbshipit-source-id: 89951fcd23e7061d6c51e9422540b5f584f893aa
2021-12-19 06:15:26 -08:00
75718e5059 [Quant][Eager] Added 4 bit support for eager mode quantization flow (#69806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69806

Minor modifications were made to support the 4 bit embedding quantized module in the eager mode quantization flow and to allow for testing of the changes.

Test Plan:
In pytorch main dir, execute
```
python test_quantization.py TestPostTrainingStatic.test_quantized_embedding
```
to run the series of tests, including the newly added test_embedding_4bit
function

Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33152675

fbshipit-source-id: 5cdaac5aee9b8850e61c99e74033889bcfec5d9f
2021-12-19 06:14:12 -08:00
9f512e129b [Quant] Added 4 bit support for embedding quantized module (#69769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69769

Added 4 bit support and the corresponding test in the module api. Restructured test_quantized_module for both 4 & 8 bit support.

Test Plan:
In pytorch main dir, execute
```
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_api
```

Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33152674

fbshipit-source-id: 73e63383cf60994ab34cc7b4eedd8f32a806cf7f
2021-12-18 22:26:24 -08:00
b331752314 [Quant] Implemented 4 bit embedding op support; added corresponding test case (#69768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69768

Support for the 4 bit embedding operator has been added. The support is analogous to the preexisting support for the byte/8 bit embedding. A corresponding test case was added to test_quantized_embedding_op.py.
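
A rough sketch of the packed weight format involved; that `torch.quantize_per_channel` accepts `torch.quint4x2` for embedding-style weights is my assumption, not something this commit states:
```
import torch

w = torch.randn(10, 16)                 # (num_embeddings, embedding_dim)
scales = torch.full((10,), 0.1)
zero_points = torch.zeros(10)
qw = torch.quantize_per_channel(w, scales, zero_points, axis=0,
                                dtype=torch.quint4x2)  # two 4 bit values per byte
print(qw.dtype)  # torch.quint4x2
```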

Test Plan:
In pytorch main dir, execute
```
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_api
```
to run the series of tests, including the newly added test_embedding_4bit
function

Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33152673

fbshipit-source-id: bdcc2eb2e37de38fda3461ff3ebf1d2fb5e58071
2021-12-18 22:03:33 -08:00
94abf120c8 [quant][fx][graphmode][be] Use is_qat instead of model.training as a flag for qat (#69878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69878

But we'll still verify that model.training is True when the user calls the prepare_qat API.
Relaxing this condition might also mean changing the api for methods in fuser_method_mapping
to take an additional flag for qat (currently we just have different fusions for training/eval). I don't think
this is P0; we could revisit if there is a need in the future.
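
A minimal sketch of the call pattern that check expects (qconfig name taken from the standard eager-mode API):
```
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 1), torch.nn.ReLU())
model.train()                                   # prepare_qat still verifies this
model.qconfig = torch.ao.quantization.get_default_qat_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare_qat(model)
```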

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33080988

fbshipit-source-id: b13715b91f10454948199323c5d81ef88bb3517f
2021-12-18 00:00:46 -08:00
fb34af1b21 [nnc][quantization] Optimize constructTensors in ext functions (#69856)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69856

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33064756

Pulled By: IvanKobzarev

fbshipit-source-id: 430d850f8591b8e0a0bdba5c41896627a72db88e
2021-12-17 23:45:03 -08:00
84b7832010 Updates CUDA memory leak check to verify against driver API and print more diagnostic information (#69556)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69556

Reviewed By: mrshenli

Differential Revision: D32954770

Pulled By: mruberry

fbshipit-source-id: a6c2ae6f704422c178569980ca4b9c72c4272f55
2021-12-17 23:37:49 -08:00
6c68045f60 [quant][graphmode][fx][be] Fix a typo in quantization/fx/graph_module (#69877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69877

att

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33079525

fbshipit-source-id: dfd3afb916067a628071a59ce95c6b1d228a3c72
2021-12-17 23:33:33 -08:00
9d3a6fa623 [quant][bc-breaking] Remove QConfigDynamic from quantization api (#69875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69875

att

Test Plan:
ci + regression tets:
```
python test/test_quantization.py TestPostTrainingStatic
python test/test_quantization.py TestPostTrainingDynamic
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33079096

fbshipit-source-id: 1e73bb27c518eba62b60f3a8c4b532dddc8367cf
2021-12-17 23:10:06 -08:00
5db711f9d3 [quant][be] Replace QConfigDynamic with QConfig in code (#69864)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69864

att, will have a follow up PR that removes QConfigDynamic in the api

Test Plan:
regression tests
```
python test/test_quantization.py TestPostTrainingStatic
python test/test_quantization.py TestPostTrainingDynamic
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33073235

fbshipit-source-id: 6c1a1647032453803c55cdad7c04154502f085db
2021-12-17 22:30:57 -08:00
c463d50098 [fx2trt] Convert to tuple if output_size of adaptive avg pool is an integer (#70144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70144

It can be an integer, and in this case we need to extend it to a tuple.

Test Plan:
Added a unit test.
```
RemoteExecution session id: reSessionID-d97b46e3-20d1-4f5c-a166-4efcf1579352-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8162774391775638
    ✓ ListingSuccess: caffe2/test/fx2trt/converters:test_adaptive_avgpool - main (9.454)
    ✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_with_dynamic_shape (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.083)
    ✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_1 (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.349)
    ✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_2 (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.543)
    ✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_0 (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.651)
Summary
  Pass: 4
  ListingSuccess: 1
```

Reviewed By: wushirong

Differential Revision: D33200773

fbshipit-source-id: 8c10d644982a4723a78f8615d8bcdbc3968790db
2021-12-17 18:31:25 -08:00
9ee3006d58 [fx-acc][graph-opts] bug fixes for transpose_to_reshape, optimize_quantization, finalize_kwargs_to_concrete
Summary:
Fixes a couple of bugs that surfaced during integration of graph opts into `AcceleratedGraphModule` (D31484770).

2. Fix a bug in the `graph_opt.transpose_to_reshape` implementation that caused it to incorrectly apply the opt for a `permute` op acting on shape `(B, N, N)` with `N > 1` and permutation `(0, 2, 1)`; see the sketch after this list. Fixed the bug and added a test case to cover it.
3. Revert part of D31671833 (0e371e413d), where I made `acc_out_ty` into a required argument
4. Align `graph_opt.transpose_to_reshape` and `graph_opt.optimize_quantization` to not set `acc_out_ty` when adding a new node to graph and instead rely on tensor metadata
5. Run `acc_utils.copy_acc_out_ty_from_meta_to_acc_ops_kwargs()` in `GraphOptsTest.verify_numerics` before running graph on sample inputs.
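
A small worked example of why the bug in item 2 matters: for `(B, N, N)` with `N > 1`, `permute(0, 2, 1)` changes the element order in memory, so it cannot be replaced by a reshape:
```
import torch

x = torch.arange(8).reshape(2, 2, 2)                 # (B, N, N) with N = 2
permuted = x.permute(0, 2, 1).contiguous()
print(torch.equal(permuted.flatten(), x.flatten()))  # False: element order changed,
                                                     # so this permute is not a reshape
```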

Test Plan:
```
buck test mode/opt glow/fb/fx/graph_opts:
```

```
...
Summary
  Pass: 85
  ListingSuccess: 4
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/562950163929022
```

Reviewed By: jfix71

Differential Revision: D31851549

fbshipit-source-id: 602affe2a2a0831d2f17b87025107ca87ecb0e59
2021-12-17 17:35:48 -08:00
bd9983366b [fx2trt] Add support for torch.mean (#70052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70052

As the title. Also refactored a bit to separate out the common part of adding a reduce operator.

This would make mnasnet lowerable without splitter.

Test Plan: Added unit tests.

Reviewed By: wushirong

Differential Revision: D33163950

fbshipit-source-id: 7eb8f8a852cd8e8d9937029c4b4602b036502b3a
2021-12-17 15:48:31 -08:00
9fb199bc12 Add convolution_backward to aten_interned_strings.h (#70112)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70112

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33188664

Pulled By: jbschlosser

fbshipit-source-id: 20e565c2fef4c1c3c087ba9b36320b7e539e467e
2021-12-17 15:38:47 -08:00
9b14d93d78 Fix bazel workflows (#70137)
Summary:
Fixes regression after manual rebase of e35bf56461

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70137

Reviewed By: pbelevich

Differential Revision: D33197055

Pulled By: malfet

fbshipit-source-id: 21adf7297f75715a59d2a1b3751b4ec8f71c7c03
2021-12-17 14:48:11 -08:00
70ed4f3ffc Try dropping Torch from typeshed_internal (#69926)
Summary:
Removes the internal typeshed for PyTorch and replaces it with PyTorch's own type annotations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69926

Generated files are in P471601595, P471601643, P471601662

Based on an example in D26410012

Test Plan: Sandcastle

Reviewed By: malfet, pradeep90

Differential Revision: D32292834

fbshipit-source-id: 5223f514cbdccd02c08ef0a027a48d92cdebed2c
2021-12-17 14:08:19 -08:00
e35bf56461 [Bazel] Add CUDA build to CI (#66241)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35316
On master, bazel cuda build is disabled due to lack of a proper `cu_library` rule. This PR:
- Add `rules_cuda` to the WORKSPACE and forward `cu_library` to `rules_cuda`.
- Use simple local cuda and cudnn repositories (adopted from TRTorch) for cuda 11.3.
- Fix the currently broken cuda build.
- Enable cuda build in CI, not just for `:torch` target but all the test binaries to catch undefined symbols.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66241

Reviewed By: ejguan

Differential Revision: D31544091

Pulled By: malfet

fbshipit-source-id: fd3c34d0e8f80fee06f015694a4c13a8e9e12206
2021-12-17 13:44:29 -08:00
e0f4e28c69 Skip forward-over-reverse gradgrad check for pinv singular on CUDA fo… (#70123)
Summary:
…r cdouble

Fixes https://github.com/pytorch/pytorch/issues/70046

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70123

Reviewed By: zou3519

Differential Revision: D33193017

Pulled By: soulitzer

fbshipit-source-id: 846f97ad1bf38c7239e9fc40fd5f476e29264f7c
2021-12-17 13:38:57 -08:00
38e026c14d Add tanh_backward to AT symbols (#70071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70071

This commit adds tanh_backward to aten_interned_strings.h as an AT symbol.

Test Plan: CI.

Reviewed By: mruberry

Differential Revision: D33173370

Pulled By: alanwaketan

fbshipit-source-id: e20ed2a807156ce772b7c1e3f434fa895116f4c3
2021-12-17 13:35:05 -08:00
a6b7521428 always use max cmake when cmake3 and cmake are all existed (#69355)
Summary:
Building PyTorch from source with the Ninja generator requires **CMake >= 3.13**, but PyTorch always checks for **cmake3 >= 3.10** first. So when **3.13 > cmake3 >= 3.10**, PyTorch picks cmake3 and reports the error ```Using the Ninja generator requires CMake version 3.13 or greater``` even though a **CMake >= 3.13** is available.

For example: on my centos machine the system cmake3 is ```3.12``` and my conda env's cmake is ```3.19.6```, so the build fails because PyTorch chooses cmake3. I can update cmake3 or create an alias or a symlink to solve this problem, but the more reasonable fix is for ```_get_cmake_command``` to always return the newest CMake executable (unless explicitly overridden with the CMAKE_PATH environment variable).
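
A sketch of the intended selection logic; names and structure are illustrative, not the actual `_get_cmake_command` implementation:
```
import re
import subprocess
from shutil import which

def cmake_version(exe):
    out = subprocess.check_output([exe, "--version"]).decode()
    return tuple(int(v) for v in re.search(r"(\d+)\.(\d+)\.(\d+)", out).groups())

def newest_cmake():
    # consider both executables and pick whichever is newest,
    # instead of unconditionally preferring cmake3
    candidates = [p for p in (which("cmake3"), which("cmake")) if p]
    return max(candidates, key=cmake_version) if candidates else None
```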

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69355

Reviewed By: jbschlosser

Differential Revision: D33062274

Pulled By: malfet

fbshipit-source-id: c6c77ce1374e6090a498be227032af1e1a82d418
2021-12-17 12:53:49 -08:00
254360e182 [ROCm] Skip test_fn_fwgrad_bwgrad_* unexpected success tests (#70124)
Summary:
Skip tests that cause unexpected success for ROCm

Signed-off-by: Kyle Chen <kylechen@amd.com>

additional to this PR:
https://github.com/pytorch/pytorch/pull/70061

skipping 4 more tests that cause unexpected success and fail the CI job for ROCm

log:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.3.1-py3.6-test2/15350/console

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70124

Reviewed By: ejguan

Differential Revision: D33193508

Pulled By: malfet

fbshipit-source-id: 9949910e2e7dc66cbadd23cea874df26e2d4136d
2021-12-17 12:08:47 -08:00
26e32988bd Revert D32596264: Codegen: TraceType only includes operators being registered
Test Plan: revert-hammer

Differential Revision:
D32596264 (e66a8ab4f5)

Original commit changeset: 2f28b62d7b99

Original Phabricator Diff: D32596264 (e66a8ab4f5)

fbshipit-source-id: 7d18c4e77ce30dd7817a95f9c39b565cb246cd12
2021-12-17 11:20:12 -08:00
2f622e87bd Revert D32596274: Codegen: ADInplaceOrViewType only include operators registered
Test Plan: revert-hammer

Differential Revision:
D32596274 (9ad940d982)

Original commit changeset: 400cad023782

Original Phabricator Diff: D32596274 (9ad940d982)

fbshipit-source-id: 5c53195edaae47b9daba373cf166d2382178d01b
2021-12-17 11:02:08 -08:00
60eb1e53b2 Sparse CSR CPU: Add block sparse support for MKL path (#68710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68710

This PR adds support for block sparse (BSR) matrices for functions that
use the Inspector-Executor MKL Sparse API. As of this PR these are (a usage sketch follows the list):
* torch.addmm
* torch.addmv
* torch.triangular_solve (once https://github.com/pytorch/pytorch/pull/62180 is merged)
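
A minimal usage sketch on the plain CSR side (constructing BSR tensors from Python is not covered here; `torch.sparse_csr_tensor` is the existing CSR constructor, and the MKL path is taken on MKL-enabled CPU builds):
```
import torch

crow_indices = torch.tensor([0, 2, 3])
col_indices = torch.tensor([0, 1, 1])
values = torch.tensor([1., 2., 3.])
sparse = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(2, 2))

dense = torch.randn(2, 2)
bias = torch.zeros(2, 2)
print(torch.addmm(bias, sparse, dense))  # bias + sparse @ dense
```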

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33179486

Pulled By: cpuhrsch

fbshipit-source-id: e1dec0dccdbfed8b280be16b8c11fc9e770d50ae
2021-12-17 10:56:05 -08:00
0cfff65395 Apply contiguous on inputs of cdist backward (#70016)
Summary:
Description:
- Apply contiguous on inputs of cdist backward
- Added a test

Fixes https://github.com/pytorch/pytorch/issues/69997

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70016

Reviewed By: ejguan

Differential Revision: D33187946

Pulled By: albanD

fbshipit-source-id: 645306aa043b2f84c4c2df0306fabfc224d746b6
2021-12-17 10:54:45 -08:00
bc95e5a196 [ROCm] Skip test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (#70061)
Summary:
This PR will skip test_fn_fwgrad_bwgrad_gradient_cuda_complex128 test for ROCm

Signed-off-by: Kyle Chen <kylechen@amd.com>

Related github isssue:
[https://github.com/pytorch/pytorch/issues/70027](https://github.com/pytorch/pytorch/issues/70027)

jithunnair-amd jeffdaily

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70061

Reviewed By: ejguan

Differential Revision: D33189411

Pulled By: malfet

fbshipit-source-id: a60d5b35099d3c8d3ceebb996e91470a8a676f85
2021-12-17 10:47:31 -08:00
de992c6b21 Specify ij indexing when cartesian_prod calls meshgrid (#68753)
Summary:
Currently, `cartesian_prod` calls `meshgrid` without passing an indexing parameter. This causes a warning to be shown when running the `cartesian_prod` example from the docs. This PR simply passes the default value for this indexing parameter instead.
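
For reference, the warning-free call pattern (a small sketch, not code from the PR):
```
import torch

a, b = torch.tensor([1, 2]), torch.tensor([3, 4])
gx, gy = torch.meshgrid(a, b, indexing="ij")  # explicit indexing: no warning
print(torch.cartesian_prod(a, b))             # no longer warns after this PR
# tensor([[1, 3], [1, 4], [2, 3], [2, 4]])
```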

Fixes https://github.com/pytorch/pytorch/issues/68741

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68753

Reviewed By: kimishpatel

Differential Revision: D33173011

Pulled By: mruberry

fbshipit-source-id: 667185ec85bd62bda177bc5768d36f56cfc8b9ab
2021-12-17 10:39:44 -08:00
9ad940d982 Codegen: ADInplaceOrViewType only include operators registered (#68692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68692

ADInplaceOrViewType is a sharded file, so by only including specific
operator headers, we ensure that changing one (non-method) operator
only needs one shard to be re-compiled.

This also ports the generated code over to the `at::_ops` interface,
and the code generator itself to using `write_sharded` instead of
re-implementing its own version of sharding.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, malfet

Differential Revision: D32596274

Pulled By: albanD

fbshipit-source-id: 400cad0237829720f94d60f9db7acd0e918e202e
2021-12-17 10:36:20 -08:00
e66a8ab4f5 Codegen: TraceType only includes operators being registered (#68691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68691

TraceType is a sharded file, so by only including specific operator
headers, we ensure that changing one (non-method) operator only needs
one shard to be re-compiled.

This also changes all the included autograd and jit headers from
including `ATen/ATen.h` to just including `ATen/core/Tensor.h`.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, malfet

Differential Revision: D32596264

Pulled By: albanD

fbshipit-source-id: 2f28b62d7b9932f30fad7daacd8ac5bb7f63c621
2021-12-17 10:35:05 -08:00
0d06616c47 Add dict methods to ParameterDict (#69403)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68476

We implemented all of the following `dict` methods for `ParameterDict`
- `get `
- `setdefault`
- `popitem`
- `fromkeys`
- `copy`
- `__or__`
- `__ior__`
- `__reversed__`
- `__ror__`

The behavior of these new methods matches the expected behavior of python `dict` as defined by the language itself: https://docs.python.org/3/library/stdtypes.html#typesmapping
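
A short usage sketch of the new methods:
```
import torch
from torch import nn

pd = nn.ParameterDict({"w": nn.Parameter(torch.randn(2))})
pd.setdefault("b", nn.Parameter(torch.zeros(2)))  # inserts only if the key is missing
print(pd.get("missing"))                          # None, like dict.get
merged = pd | nn.ParameterDict({"v": nn.Parameter(torch.ones(1))})
print(list(merged.keys()))                        # ['w', 'b', 'v']
```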

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69403

Reviewed By: albanD

Differential Revision: D33187111

Pulled By: jbschlosser

fbshipit-source-id: ecaa493837dbc9d8566ddbb113b898997e2debcb
2021-12-17 10:15:47 -08:00
35519428a2 Remove backward ops for miopen depthwise convolution (#70064)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70064

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33171169

Pulled By: jbschlosser

fbshipit-source-id: 668ca9baa992d3bb1cfa7b53fd2127ffeb051147
2021-12-17 10:08:49 -08:00
ab2a739851 Remove backward ops for miopen transposed convolution (#70063)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70063

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33171170

Pulled By: jbschlosser

fbshipit-source-id: 4fd6c1cd027f714354644c4ac7694d0f9092c762
2021-12-17 10:07:27 -08:00
ec577300d7 OpInfo: Convert more sample_input_funcs to generators (#69976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69976

These are sample functions that already use generators internally, this just moves the `yield` into the sample function itself.

Re-submit of #69257

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33172953

Pulled By: mruberry

fbshipit-source-id: 7b8bae72df6a225df88a158b7ffa82a71d3c061b
2021-12-17 10:03:59 -08:00
950957f857 Fix jit tests assuming sample_inputs is a list (#69975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69975

cc mruberry

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33172952

Pulled By: mruberry

fbshipit-source-id: 1f8bb49179f7fbd0fec5e7344e8c213484518e27
2021-12-17 10:02:50 -08:00
ad79d0dd4b Add ciflow/trunk label (#69575)
Summary:
Which includes all workflows but periodic ones

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69575

Reviewed By: seemethere

Differential Revision: D32932850

Pulled By: malfet

fbshipit-source-id: 80b58fb3a0d5f8dbc527124be5bf25bd716448b8
2021-12-17 09:57:46 -08:00
de296d526f move torch.testing from prototype to beta (#69668)
Summary:
cc brianjo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69668

Reviewed By: albanD

Differential Revision: D33028213

Pulled By: mruberry

fbshipit-source-id: 3316b887d4c322cc1262feee651464da4124a6de
2021-12-17 09:52:47 -08:00
de2d9e2966 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33183467

fbshipit-source-id: d7c37f3522a38e85891524c544eab4fdb01270de
2021-12-17 09:45:20 -08:00
1065739781 Fix build on latest main branch of thrust - SoftMax.cu (#70039)
Summary:
Similar to https://github.com/pytorch/pytorch/issues/69985

I don't think there's any other source file which should `#include <thrust/iterator/constant_iterator.h>` as of 73a6c36f1b:

```
mkozuki@mkozuki-srv ~/ghq/github.com/crcrpar/torch-0 master
torch-0 ❯ git rev-parse HEAD; rg -inw make_constant_iterator
73a6c36f1bfbf9aff04ba41cfe6ab06aa99883d9
aten/src/ATen/native/cuda/LegacyThrustHelpers.cu
54:    thrust::make_constant_iterator(1),

aten/src/ATen/native/sparse/cuda/SoftMax.cu
301:      thrust::make_constant_iterator(int64_t(1)),
```

## build error

```console
https://github.com/pytorch/pytorch/issues/22 2048. /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DAT_PER_OPERATOR_HEADERS -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMAGMA_V2 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_C10D_MPI -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_NCCL -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS -Iaten/src -I../aten/src -I. -I../ -I../cmake/../third_party/benchmark/include -I../cmake/../third_party/cudnn_frontend/include -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -Iinclude -I../torch/csrc/distributed -I../aten/src/TH -I../aten/src/THC -I../aten/src/ATen/cuda -Icaffe2/aten/src -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -Inccl/include -I../c10/cuda/../.. -I../c10/.. -I../third_party/tensorpipe -Ithird_party/tensorpipe -I../third_party/tensorpipe/third_party/libnop/include -I../torch/csrc/api -I../torch/csrc/api/include -isystem=third_party/gloo -isystem=../cmake/../third_party/gloo -isystem=../cmake/../third_party/googletest/googlemock/include -isystem=../cmake/../third_party/googletest/googletest/include -isystem=../third_party/protobuf/src -isystem=/opt/conda/include -isystem=../third_party/gemmlowp -isystem=../third_party/neon2sse -isystem=../third_party/XNNPACK/include -isystem=../third_party -isystem=../cmake/../third_party/eigen -isystem=/opt/conda/include/python3.8 -isystem=/opt/conda/lib/python3.8/site-packages/numpy/core/include -isystem=../cmake/../third_party/pybind11/include -isystem=/opt/hpcx/ompi/include/openmpi -isystem=/opt/hpcx/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -isystem=/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem=/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem=/opt/hpcx/ompi/include -isystem=/usr/local/cuda/include -isystem=../third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem=../third_party/ideep/include -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -Xcudafe --diag_suppress=20236 -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -O3 -DNDEBUG -Xcompiler=-fPIC -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD 
-Xcompiler=-Wall,-Wextra,-Wno-unused-parameter,-Wno-unused-variable,-Wno-unused-function,-Wno-unused-result,-Wno-unused-local-typedefs,-Wno-missing-field-initializers,-Wno-write-strings,-Wno-unknown-pragmas,-Wno-type-limits,-Wno-array-bounds,-Wno-unknown-pragmas,-Wno-sign-compare,-Wno-strict-overflow,-Wno-strict-aliasing,-Wno-error=deprecated-declarations,-Wno-missing-braces,-Wno-maybe-uninitialized -DTORCH_CUDA_BUILD_MAIN_LIB -Xcompiler -pthread -std=c++14 -MD -MT caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SoftMax.cu.o -MF caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SoftMax.cu.o.d -x cu -c ../aten/src/ATen/native/sparse/cuda/SoftMax.cu -o caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SoftMax.cu.o
#22 2048. ../aten/src/ATen/native/sparse/cuda/SoftMax.cu(301): error: namespace "thrust" has no member "make_constant_iterator"
...
#22 2048. 13 errors detected in the compilation of "../aten/src/ATen/native/sparse/cuda/SoftMax.cu".
```

cc xwang233 zasdfgbnm ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70039

Reviewed By: mruberry

Differential Revision: D33166702

Pulled By: ngimel

fbshipit-source-id: 33f3b80095c8562786a9a9b7a0e7eb58201af458
2021-12-17 09:28:44 -08:00
92463573d8 Sanitize string before passing it as shell argument (#70070)
Summary:
Use `c10::printQuotedString` to escape any characters that might cause the
string to be interpreted as more than one argument by the shell.
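
The same idea in Python terms, as a sketch of the general pattern rather than the C++ change itself:
```
import shlex

arg = "foo; rm -rf /"              # would be parsed as two commands if unquoted
print("echo " + shlex.quote(arg))  # echo 'foo; rm -rf /'
```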

Please note that this codepath is deprecated and is not accessible
through typical PyTorch usage workflows.

This issue was discovered by Daniel Lawrence of the Amazon Alexa team.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70070

Reviewed By: suo

Differential Revision: D33172721

Pulled By: malfet

fbshipit-source-id: 9dbd17f6eb775aaa1a545da42cbc95864c1189ee
2021-12-17 08:08:28 -08:00
54406314cc Update PULL_REQUEST_TEMPLATE.md (#70105)
Summary:
Many users actually send things like `Fixes #{69696}` which then fails to properly close the corresponding issue.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70105

Reviewed By: ejguan

Differential Revision: D33187501

Pulled By: albanD

fbshipit-source-id: 2080ee42c30b9db45177f049627118a6c3b544b7
2021-12-17 07:53:36 -08:00
b1d5948b34 Remove backward ops for miopen convolution (#69987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69987

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #69987

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33170379

Pulled By: jbschlosser

fbshipit-source-id: 6bc274f1d457ec5bddc8b52c2f1c44eaae2ff0ed
2021-12-17 07:43:38 -08:00
f045618dab dbr quant: extend qconfig_dict support to functionals, part 2 (#69766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69766

Follow-up on the previous PR, removes the requirement to have a parent
qconfig in order for the object type qconfig to be applied for a function.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33020218

Pulled By: vkuzo

fbshipit-source-id: fa0e10f05ca5f88b48ef74b9d2043ea763506742
2021-12-17 05:59:55 -08:00
a4173fc887 dbr quant: extend qconfig_dict support to functions, part 1 (#69758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69758

Extends DBR quant `qconfig_dict['object_type']` support to function types,
with the restriction that a parent module must have a qconfig.

A future PR will remove the restriction above (it is due to some technical
debt), to keep PR sizes small.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33020217

Pulled By: vkuzo

fbshipit-source-id: ce8a8185f9c87d437e1319ff6f19e8f6adf41e02
2021-12-17 05:59:52 -08:00
c186773d92 dbr quant: make fqn during prepare op hook required (#69726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69726

This is a cleanup; this variable was previously optional
but it always exists, because an op hook can only run
if there is a parent module with an `AutoQuantizationState`
object.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D33003472

Pulled By: vkuzo

fbshipit-source-id: de5769194808d42b025b848667815b4e3d73b6c6
2021-12-17 05:59:49 -08:00
b999f87503 fx quant: move _parent_name to common utils (#69720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69720

This function is also useful for DBR quant, moving it from FX utils
to common utils.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33003473

Pulled By: vkuzo

fbshipit-source-id: 20360682c69d614a645c14fc29d3ee023d6b2623
2021-12-17 05:59:46 -08:00
4f450f44bf dbr quant: initial support of qconfig_dict for modules (#69719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69719

This PR changes the API signature of DBR quant to use `qconfig_dict`,
similar to FX graph mode quantization.  In this first PR, only basic
functionality is implemented:
* qconfig=None or static quantization with quint8 only is tested
* non-default qconfig for modules only is tested
* targeting ops by order is not implemented

Expanding this support will be done in future PRs.
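
For reference, a sketch of the `qconfig_dict` format this aligns with; the keys mirror FX graph mode quantization, and the experimental DBR prepare entry point itself is not shown:
```
import torch.nn as nn
from torch.ao.quantization import default_qconfig

qconfig_dict = {
    "": default_qconfig,                # global qconfig (static, quint8)
    "object_type": [
        (nn.Conv2d, default_qconfig),   # per-type override for modules
        (nn.Linear, None),              # None disables quantization for a type
    ],
}
```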

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33003475

Pulled By: vkuzo

fbshipit-source-id: f5af81e29c34ea57c2e23333650e44e1758102e4
2021-12-17 05:59:44 -08:00
0f1ceb34ec fx quant: refactor qconfig_dict utils to separate file (#69636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69636

Moves some of the qconfig_dict utilities away from the FX subdirectory
into the quantization subdirectory. These utilities can be reused with
other workflows.

A future PR will start using these utilities in DBR quant.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Reviewed By: albanD

Differential Revision: D33003474

Pulled By: vkuzo

fbshipit-source-id: 34417b198681279469e6d7c43ea311180086d883
2021-12-17 05:58:25 -08:00
7abb7667a6 [tensorexpr] Add memory planning to reuse intermediate buffers (#66452)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66452

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31557188

Pulled By: huiguoo

fbshipit-source-id: f18dfeba1df20d5d4f118640fc10782534eb9219
2021-12-17 01:38:02 -08:00
ac92f7cc75 [tensorexpr] Remove the optional argument in LoopNest::prepareForCodeGen (#67144)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67144

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31881150

Pulled By: huiguoo

fbshipit-source-id: af99087722ec71d6deb9049b63b573ae7720c9ec
2021-12-17 01:37:59 -08:00
bbfd7b75ca [tensorexpr] Move the allocation of intermediate buffers from TEK to CodeGen (#67143)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67143

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31881151

Pulled By: huiguoo

fbshipit-source-id: 457e5d4ff8a15f70af9c797c9ab4803d8e779abe
2021-12-17 01:37:56 -08:00
6075ec15b1 [tensorexpr] Add BufMap instruction to reuse the memory of dest buf for src buf (#66451)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66451

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D31557190

Pulled By: huiguoo

fbshipit-source-id: 96e08a05cb1c558706c4189e27d5d72efbd9c510
2021-12-17 01:37:53 -08:00
c7e0951524 [tensorexpr] Add a stmt recorder to obtain stmt PCs (#66450)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66450

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D31557189

Pulled By: huiguoo

fbshipit-source-id: 416d79ddfc46a0109187cdeb919ad9b5abde8030
2021-12-17 01:36:37 -08:00
043098ef7f [quant][graphmode] Rename backend_config_dict folder to backend (#69882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69882

att

Test Plan:
```
python test/fx2trt/test_quant_trt.py
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33081761

fbshipit-source-id: c3178eec5798ac8587be09a963944b570c73e8ea
2021-12-16 21:13:04 -08:00
3d51c88032 [DataPipe] Unifying API - removing options to have fn_args and fn_kwargs from MapDataPipes (#69561)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69561

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32952099

Pulled By: NivekT

fbshipit-source-id: 95b725774a9d04d655e2542760726908f33043f4
2021-12-16 18:11:00 -08:00
b89c283c80 [DataPipe] Unifying API - removing options to have fn_args and fn_kwargs from IterDataPipes (#69560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69560

cc VitalyFedyunin ejguan NivekT
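
The migration for callers is to bind extra arguments themselves, e.g. with `functools.partial` (a sketch; `IterableWrapper` is the standard in-memory IterDataPipe):
```
from functools import partial
from torch.utils.data.datapipes.iter import IterableWrapper

def scale(x, k):
    return x * k

dp = IterableWrapper(range(5))
# before: dp.map(scale, fn_args=(2,)); after: bind the extra argument yourself
dp = dp.map(partial(scale, k=2))
print(list(dp))  # [0, 2, 4, 6, 8]
```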

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32952100

Pulled By: NivekT

fbshipit-source-id: e0cc31408c7cf3220fe274feed1c7202a1aaae70
2021-12-16 18:09:52 -08:00
4a6a5d1630 OpInfos for torch.{flatten, column_stack} (#69237)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69237

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32988956

Pulled By: anjali411

fbshipit-source-id: b7f5c537ff9731f56232aa5647910f03edf4582a
2021-12-16 17:50:58 -08:00
ef6f776e82 [quant][be] Cleanup test cases for eager mode workflow (#69880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69880

Making the test cases more standardized; in general we would like to have
```
TestQuantizeEager,
TestQuantizeEagerOps,
TestQuantizeEagerModels,
```

but currently, since we have separate ptq static, ptq dynamic and qat static apis, we only partially cleaned
up the test cases; we can merge all of them later when we merge all the apis

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33081418

fbshipit-source-id: fcb96559b76bbc51eb1b0625e0d4b193dbb37532
2021-12-16 17:47:30 -08:00
92320dfe6e [shard] remove set device for nccl (#69946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69946

This PR removes the implicit set_device for the nccl pg, per the proposal in https://github.com/pytorch/pytorch/issues/69731
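
With the implicit call gone, callers pin the device themselves; a sketch assuming a launcher that sets `RANK` and one GPU per rank:
```
import os
import torch
import torch.distributed as dist

rank = int(os.environ["RANK"])
dist.init_process_group("nccl")
torch.cuda.set_device(rank % torch.cuda.device_count())  # now explicit, not implicit
```
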
ghstack-source-id: 145847504

Test Plan: wait for ci

Reviewed By: pritamdamania87

Differential Revision: D33099095

fbshipit-source-id: 3fe9f6a0facf5ea513c267e9f32c6a7fd56cc8a2
2021-12-16 17:16:42 -08:00
9813629500 [reland][quant][fx][graphmode] Add support for conv add pattern in backend_config_dict (#70007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70007

This PR extends fusion pattern support from a simple sequence of ops to a simple
subgraph like conv - add:
```
x - conv ---\
y ---------add ---- output
```
where input x, y and output are observed/quantized
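
A module producing exactly this subgraph, for illustration:
```
import torch

class ConvAdd(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 3)

    def forward(self, x, y):
        return self.conv(x) + y  # conv output and the extra input feed one add
```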

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps.test_conv_add
```

Imported from OSS

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33144605

fbshipit-source-id: 331fda77bdc431a8cd9abe1caea8347a71776ec2
2021-12-16 17:10:44 -08:00
62809dc062 .github: Volume mount netrc to home directory (#70057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70057

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33169220

Pulled By: seemethere

fbshipit-source-id: 720e5fb946249a26f0505afc34b95530258e53ea
2021-12-16 15:23:45 -08:00
a73c6a45b6 [reland][quant][graphmode][fx] Enable fuse handler for sequence of 3 ops (#70006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70006

reland: fixing some mypy errors that were missed before

This PR enables the fuse handler for a sequence of three ops, and merges all fuse handlers into one.

TODO: we can also move this to backend_config_dict folder

Test Plan:
regression fusion test
```
python test/test_quantization.py TestFuseFx
```

Imported from OSS

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33144606

fbshipit-source-id: ca34f282018a0fb4d04c7e35119eaf2d64258e78
2021-12-16 15:04:16 -08:00
fa582045fc Fix lint/mypy violations (#70059)
Summary:
Introduced by https://github.com/pytorch/pytorch/pull/69194

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70059

Reviewed By: suo, cccclai

Differential Revision: D33170748

Pulled By: malfet

fbshipit-source-id: a2e42f37d04c21a735f6474e42eb6670d2a0c3b9
2021-12-16 14:06:27 -08:00
02c63c3006 extract out c10 targets to the c10 package (#69992)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69992

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33141013

fbshipit-source-id: e5edd6bd5b5834ac27390ba940ebed9148512c8d
2021-12-16 13:11:49 -08:00
d459e79500 [jit][edge] Remove usage of shared_ptr<mobile::Code>. (#68037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037

Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happen inside the interpreter loop, which doesn't outlive the module, so we don't need to use std::shared_ptr here. This should also save us 1-2 KB of binary size, because shared_ptr seems to bloat on arm64 android.
ghstack-source-id: 145818696

Test Plan: eyes.

Reviewed By: qihqi, tugsbayasgalan

Differential Revision: D32264616

fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a
2021-12-16 13:11:46 -08:00
39f65fee47 [jit] Split ClassType into a separate header. (#68036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68036

For Edge use cases we want to include class_type.h separately, because in the future we want to stop depending on the rest of the JIT types declared inside jit_type.h.
ghstack-source-id: 145818699

Test Plan: no behavior change.

Reviewed By: qihqi, gmagogsfm

Differential Revision: D32264618

fbshipit-source-id: 53dc187772e3dde88ff978b87252c31f3641860b
2021-12-16 13:10:05 -08:00
243e135eb4 Sparse CSR CUDA: Add block sparse support for torch.triangular_solve (#68709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68709

This PR adds support for triangular solver with a block CSR matrix.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33066067

Pulled By: cpuhrsch

fbshipit-source-id: 9eaf1839071e9526be8d8c6d47732b24200f3557
2021-12-16 13:03:42 -08:00
5f3f327a9d update SequentialLR signature (#69817)
Summary:
- ~optimizer isn't required for `SequentialLR` since it's already present in the schedulers. Trying to match the signature of it with `ChainedScheduler`.~
- ~`verbose` isn't really used anywhere so removed it.~

Updated the missing docs and added a small check.
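
A small usage sketch of the signature as documented:
```
import torch

params = [torch.zeros(1, requires_grad=True)]
optimizer = torch.optim.SGD(params, lr=0.1)
warmup = torch.optim.lr_scheduler.ConstantLR(optimizer, factor=0.1, total_iters=5)
decay = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[5])

for _ in range(10):
    optimizer.step()
    scheduler.step()
```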

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69817

Reviewed By: ngimel

Differential Revision: D33069589

Pulled By: albanD

fbshipit-source-id: f015105a35a2ca39fe94c70acdfd55cdf5601419
2021-12-16 12:58:00 -08:00
15b9e5f8a4 Revert D33136054: Remove backward ops for miopen convolution
Test Plan: revert-hammer

Differential Revision:
D33136054 (8b9b819d22)

Original commit changeset: e049168732bd

Original Phabricator Diff: D33136054 (8b9b819d22)

fbshipit-source-id: 2a3cc3df3519d04595795f0bc87a807705d13a13
2021-12-16 12:46:02 -08:00
b199e3c842 Provide functionality to write custom ShardedTensor ops. (#69874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69874

We have a handful of ops supported for ShardedTensor via
``__torch_function__`` dispatch. However, we currently can't cover all torch
operators, and giving users a way to extend this functionality will make
it much more general.

In this PR, I've introduced a custom_sharded_op decorator which can be used to
register a custom sharded op implementation.
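
A sketch of what a registration could look like; the import path and decorator argument are assumptions based on this description, and the handler signature mirrors the ``__torch_function__`` dispatch shown in the traces elsewhere in this log:
```
import torch
# the import path below is an assumption; the commit only names the decorator
from torch.distributed._sharded_tensor import custom_sharded_op

@custom_sharded_op(torch.nn.functional.gelu)  # hypothetical registration target
def sharded_gelu(types, args, kwargs, process_group):
    # apply the op shard-by-shard on the local shards (illustrative only)
    st = args[0]
    for shard in st.local_shards():
        shard.tensor = torch.nn.functional.gelu(shard.tensor)
    return st
```
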
ghstack-source-id: 145841141

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D33078587

fbshipit-source-id: 5936b7ac25582e613653c19afa559219719ee54b
2021-12-16 12:40:13 -08:00
1f86e0ee2a don't compile pow kernels for non-existent case (#70017)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70017

Reviewed By: malfet

Differential Revision: D33163747

Pulled By: ngimel

fbshipit-source-id: 784c7934428ee896c637662fdd59833c3a395f64
2021-12-16 12:31:30 -08:00
8b9b819d22 Remove backward ops for miopen convolution (#69987)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69987

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33136054

Pulled By: jbschlosser

fbshipit-source-id: e049168732bdfcf590ec8102412f2ef0418f9dcc
2021-12-16 11:49:49 -08:00
b4c4a015d6 Revert D33163841: Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33163841

Original commit changeset: e262b6d8c80a

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: 644216036a238a458f0a2198460b36d24fb035f8
2021-12-16 11:12:18 -08:00
96fe82ac3c HANDLE_TH_ERRORS: Move exception translation out of line (#69974)
Summary:
I've noticed that the `HANDLE_TH_ERRORS` macros are actually very expensive in terms of compile time.  Moving the bulk of the catch statements out of line using a lippincott function significantly improves compile times and object file binary sizes. For just the generated autograd bindings, this halves serial build time from 8 minutes to 4 and binary size is more than halved for most files with the biggest difference being `python_variable_methods.cpp` which went from 126 MB to 43 MB.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69974

Reviewed By: mruberry

Differential Revision: D33160899

Pulled By: albanD

fbshipit-source-id: fc35fa86f69ffe5a0752557be30b438c8564e998
2021-12-16 11:04:48 -08:00
9ff8c49ed9 Enable cpu scalar arguments for jiterator (#69861)
Summary:
Creates analog of `gpu_kernel_with_scalars` for jiterator kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69861

Reviewed By: mruberry

Differential Revision: D33134013

Pulled By: ngimel

fbshipit-source-id: fd2412e8d6432e15d5721e95a194d29fa70ad92c
2021-12-16 10:58:59 -08:00
ff53ed24d2 fix NameError of docstring in broadcast_object_list (#69810)
Summary:
This PR fixes NameError of docstring in broadcast_object_list.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69810

Reviewed By: kimishpatel

Differential Revision: D33143167

Pulled By: jbschlosser

fbshipit-source-id: 99c076466ae4b4a332763b7546028c5097b417d7
2021-12-16 10:50:45 -08:00
c9e898fef8 delete TH (#69929)
Summary:
Move TH<C>GenerateByteType includes into torch/csrc (the only place they are used), and we can remove TH folder altogether!
The only things left in THC are includes kept for bc compatibility.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69929

Reviewed By: mruberry

Differential Revision: D33133013

Pulled By: ngimel

fbshipit-source-id: 78c87cf93d2d641631b0f71051ace318bf4ec3c1
2021-12-16 10:45:30 -08:00
7f7966a888 [Docs] Fix the syntax of documentation (#69958)
Summary:
Fixes the syntax of documentation in the file torch/nn/utils/clip_grad.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69958

Reviewed By: mruberry

Differential Revision: D33160612

Pulled By: albanD

fbshipit-source-id: 2dc199fee345bb4c75632900bc6f73a1ab8192a6
2021-12-16 10:38:39 -08:00
ebc66bfeea [Profiler] Pull helper methods into dedicated file. (And start torch/csrc/profiler folder. (#69255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69255

One thing that I've found as I optimize the profiler is that there's a lot of intermingled code, where the kineto profiler relies on the legacy (autograd) profiler for generic operations. This made optimization hard because I had to manage too many complex dependencies. (Exacerbated by the USE_KINETO #ifdef's sprinkled around.) This PR is the first of several to restructure the profiler(s) so the later optimizations go in easier.

Test Plan: Unit tests

Reviewed By: aaronenyeshi

Differential Revision: D32671972

fbshipit-source-id: efa83b40dde4216f368f2a5fa707360031a85707
2021-12-16 10:33:47 -08:00
b23890177f [Operator Versioning][Edge] Codegen upgrader_mobile.cpp (#69194)
Summary:
From operator version map and upgrader torchscript, generate upgrader_mobile.cpp file. It also includes a unit test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69194

ghstack-source-id: 145819351

Test Plan:
```
buck test mode/opt //caffe2/test:upgrader_codegen
```
```
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen
```
```
python /Users/chenlai/pytorch/tools/codegen/operator_versions/gen_mobile_upgraders.py
```

Reviewed By: iseeyuan

Differential Revision: D32748985

fbshipit-source-id: f8437766edaba459bfc5e7fc7a3ca0520c4edb9a
2021-12-16 10:29:35 -08:00
c4281cc92d Prototype checkpoint_wrapper (#69955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69955

Implements a checkpoint_wrapper function, which wraps an nn.Module with checkpointing so users won't have to call checkpoint() every time they want to checkpoint the module.

Currently only support for reentrant-based checkpointing is added and only tested with FSDP to unblock a use case.

Future work is to add support for new checkpointing API, add more tests, upstream to torch.utils.checkpoint.
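
A usage sketch; the import path is an assumption based on the description above, since the wrapper had not yet been upstreamed to torch.utils.checkpoint:
```
import torch
# assumed location of the prototype wrapper
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    checkpoint_wrapper,
)

block = checkpoint_wrapper(torch.nn.Linear(8, 8))  # wrap once, no per-call checkpoint()
out = block(torch.randn(2, 8, requires_grad=True)).sum()
out.backward()
```
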
ghstack-source-id: 145811242

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D33107276

fbshipit-source-id: c4a1c68d71d65713a929994940a8750f73fbdbdb
2021-12-16 09:59:19 -08:00
c80b5b8c8f Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33102715 (eb374de3f5)

Original commit changeset: 3816ff01c578

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: e262b6d8c80a05f3a67e024fedfbadefdbfe6e29
2021-12-16 09:39:57 -08:00
8c7f4a0d0b [tensorexpr] check for index out of bounds in ir_eval (#68858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68858

when executing with ir_eval, check for index out of bounds.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32657881

Pulled By: davidberard98

fbshipit-source-id: 62dd0f85bb182b34e9c9f795ff761081290f6922
2021-12-16 09:27:45 -08:00
76d282d447 Nvfuser code bump 12 5 (#69964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69964

Things added in this PR that require review:
1. cuLaunchCooperativeKernel driver API added
aten/src/ATen/cuda/detail/LazyNVRTC.cpp
aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h

nvfuser code update:
1. Perf tuning of the codegen scheduler that improves performance.
2. Permutation support has been extended beyond contiguous/channels-last. (The improvements can be observed on the PW benchmark.)

Things reverted from local changes:
1. aten::gelu with approximation
2. Local changes that were upstreamed in PR https://github.com/pytorch/pytorch/issues/68804

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69428

Reviewed By: ngimel

Differential Revision: D33073817

Pulled By: wconstab

fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb
2021-12-16 08:28:54 -08:00
a6a1c709ff Fixed libtorch at::Tensor::print() linking error (#69615)
Summary:
There was a declaration of the function at::Tensor::print() in TensorBody.h, left there during the refactoring of Tensor and TensorBase (d701357d921ef167d42c125e65b6f7da6be3ad0f). Removing it from TensorBody.h resolves the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69615

Test Plan:
The code below now compiles and works fine (prints `[CPUFloatType [3, 4, 5, 5, 5]]`):
```
#include <torch/torch.h>

int main()
{
    torch::Tensor tensor = torch::randn({3, 4, 5, 5, 5});
    tensor.print();
}
```

Fixes https://github.com/pytorch/pytorch/issues/69515

Reviewed By: ngimel

Differential Revision: D33020361

Pulled By: albanD

fbshipit-source-id: 190f253fb4101a4205aede3574b6e8acd19e54a1
2021-12-16 07:57:10 -08:00
531da0c43b change asan test shard to 3 (#69843)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68261

This PR changes the number of test shards from 2 to 3 for all ASAN tests, aiming to improve their run time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69843

Reviewed By: janeyx99

Differential Revision: D33160771

Pulled By: xidachen

fbshipit-source-id: dba1d318cc49b923e18704839471d8753cc00eca
2021-12-16 07:22:03 -08:00
fe7b6446d5 [LTC] Upstream LazyTensor and LazyGraphExecutor (#69815)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69815

Test Plan: Imported from OSS

Reviewed By: dagitses, jbschlosser

Differential Revision: D33059774

Pulled By: desertfire

fbshipit-source-id: dd1e3e5f4fd3181517eebd2742f6a5b7b6fb9a7d
2021-12-16 05:44:40 -08:00
28243769f9 [LTC] Upstream several internal ops (#69716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69716

To prepare for the landing of LazyTensor and LazyGraphExecutor,
- arithmetic_ir_ops.h
- cast.h
- device_data.h
- expand.h
- generic.h
- scalar.h

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32999410

Pulled By: desertfire

fbshipit-source-id: 31559dd7a1e525591ae9e2d7f915ee864437c11f
2021-12-16 05:44:37 -08:00
e6a4988b2d [LTC] Upstream utils in computation_client (#69621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69621

Upstream the following utils
- metrics.h
- multi_wait.h
- thread_pool.h
- unique.h

Test Plan: Imported from OSS

Reviewed By: wconstab, VitalyFedyunin

Differential Revision: D32957629

Pulled By: desertfire

fbshipit-source-id: 5f2fb57493856556099b7cda7560a568d1f9ed97
2021-12-16 05:43:09 -08:00
73a6c36f1b Add more details to the known limitations section of torchhub docs (#69970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69970

This is a follow up to https://github.com/pytorch/hub/issues/243

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33124060

Pulled By: NicolasHug

fbshipit-source-id: 298fe14b39a1aff3e0b029044c9a0db8bc82336a
2021-12-16 02:43:48 -08:00
eb374de3f5 Back out "Revert D32606547: torch/monitor: add C++ events and handlers" (#69923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69923

Original commit changeset: fbaf2cc06ad4

Original Phabricator Diff: D32606547 (e61fc1c03b)

This is the same thing as the original diff but just using a normal std::mutex instead of std::shared_timed_mutex which is not available on OSX 10.11. The performance difference should be negligible and easy to change down the line if it does become a bottleneck.

Old failing build: https://github.com/pytorch/pytorch/runs/4495465412?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

will add ciflow tags to ensure mac builds are fine

Reviewed By: aivanou

Differential Revision: D33102715

fbshipit-source-id: 3816ff01c578d8e844d303d881a63cf5c3817bdb
2021-12-15 22:51:43 -08:00
5cc4037369 [PyTorch][Distributed] Integrate with ShardedOptimizer in the unit test of ShardedLinear (#69569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69569

Since ShardedOptimizer was added in https://github.com/pytorch/pytorch/pull/68607, we now integrate it into our unit test for ShardedLinear.
ghstack-source-id: 145773749

Test Plan: CI + Unit test

Reviewed By: wanchaol

Differential Revision: D32777020

fbshipit-source-id: eb6b1bb0f6234976f024273833154cab274fed25
2021-12-15 17:55:01 -08:00
dc18048dd8 [PT-D][Fix] Broken sharded embedding and embedding bag test fix (#69725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69725

We added a `no_grad` context manager in the tensor sharding to ensure that the local_shard is a root node. But it turns out that for embedding and embedding_bag, when `max_norm` is specified, row-wise sharding complains, since we use the original `max_norm` of the operators.

Error traces:
```
  File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/overrides.py", line 1389, in handle_torch_function
    result = torch_func_method(public_api, types, args, kwargs)
  File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/distributed/_sharded_tensor/api.py", line 554, in __torch_function__
    return sharded_embedding(types, args, kwargs, self._process_group)
  File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/distributed/_sharded_tensor/ops/embedding.py", line 115, in sharded_embedding
    return _handle_row_wise_sharding(
  File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/distributed/_sharded_tensor/ops/embedding.py", line 309, in _handle_row_wise_sharding
    gathered_input_embeddings = torch.nn.functional.embedding(
  File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/nn/functional.py", line 2153, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: A view was created in no_grad mode and its base or another view of its base has been modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked).
 exiting process 2 with exit code: 10
```

As a fix, we clone and detach the local shard from the narrow result without using the context manager.
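
A sketch of the fix in isolation:
```
import torch

full_weight = torch.randn(8, 4, requires_grad=True)
# before: narrowing under no_grad created the ambiguous view flagged in the trace
# after: clone and detach the narrowed shard so it is an autograd root
local_shard = full_weight.narrow(0, 0, 4).clone().detach()
local_shard.requires_grad_(True)
```
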
ghstack-source-id: 145773748

Test Plan: CI + Unit test.

Reviewed By: pritamdamania87, wanchaol

Differential Revision: D33000927

fbshipit-source-id: 4d5a93120675e90d4d6d6225a51c4a481d18d159
2021-12-15 17:53:49 -08:00
4d5dd00e61 Remove backward ops for cuDNN transposed convolution (#69902)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69902

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33093795

Pulled By: jbschlosser

fbshipit-source-id: 8b90150bd1996e48c0c888bdab4e95a849d10ef5
2021-12-15 17:48:25 -08:00
3dc3651e0e Remove backward ops for cuDNN convolution (#69901)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69901

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33093796

Pulled By: jbschlosser

fbshipit-source-id: f5beab6f3078144b6c8e5c4c51d69823815a9f99
2021-12-15 17:46:49 -08:00
bf15dc22bc Fix build on latest main branch of thrust (#69985)
Summary:
Our internal CI that builds PyTorch with the latest main branch of thrust fails with
```
https://github.com/pytorch/pytorch/issues/22 466.9 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DAT_PER_OPERATOR_HEADERS -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMAGMA_V2 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_C10D_MPI -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_NCCL -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS -Iaten/src -I../aten/src -I. -I../ -I../cmake/../third_party/benchmark/include -I../cmake/../third_party/cudnn_frontend/include -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -Iinclude -I../torch/csrc/distributed -I../aten/src/TH -I../aten/src/THC -I../aten/src/ATen/cuda -Icaffe2/aten/src -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -Inccl/include -I../c10/cuda/../.. -I../c10/.. -I../third_party/tensorpipe -Ithird_party/tensorpipe -I../third_party/tensorpipe/third_party/libnop/include -I../torch/csrc/api -I../torch/csrc/api/include -isystem=third_party/gloo -isystem=../cmake/../third_party/gloo -isystem=../cmake/../third_party/googletest/googlemock/include -isystem=../cmake/../third_party/googletest/googletest/include -isystem=../third_party/protobuf/src -isystem=/opt/conda/include -isystem=../third_party/gemmlowp -isystem=../third_party/neon2sse -isystem=../third_party/XNNPACK/include -isystem=../third_party -isystem=../cmake/../third_party/eigen -isystem=/opt/conda/include/python3.8 -isystem=/opt/conda/lib/python3.8/site-packages/numpy/core/include -isystem=../cmake/../third_party/pybind11/include -isystem=/opt/hpcx/ompi/include/openmpi -isystem=/opt/hpcx/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -isystem=/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem=/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem=/opt/hpcx/ompi/include -isystem=/usr/local/cuda/include -isystem=../third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem=../third_party/ideep/include -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -Xcudafe --diag_suppress=20236 -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -O3 -DNDEBUG -Xcompiler=-fPIC -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD 
-Xcompiler=-Wall,-Wextra,-Wno-unused-parameter,-Wno-unused-variable,-Wno-unused-function,-Wno-unused-result,-Wno-unused-local-typedefs,-Wno-missing-field-initializers,-Wno-write-strings,-Wno-unknown-pragmas,-Wno-type-limits,-Wno-array-bounds,-Wno-unknown-pragmas,-Wno-sign-compare,-Wno-strict-overflow,-Wno-strict-aliasing,-Wno-error=deprecated-declarations,-Wno-missing-braces,-Wno-maybe-uninitialized -DTORCH_CUDA_BUILD_MAIN_LIB -Xcompiler -pthread -std=c++14 -MD -MT caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu.o -MF caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu.o.d -x cu -c ../aten/src/ATen/native/cuda/LegacyThrustHelpers.cu -o caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu.o
https://github.com/pytorch/pytorch/issues/22 466.9 ../aten/src/ATen/native/cuda/LegacyThrustHelpers.cu(53): error: namespace "thrust" has no member "make_constant_iterator"
https://github.com/pytorch/pytorch/issues/22 466.9
https://github.com/pytorch/pytorch/issues/22 466.9 1 error detected in the compilation of "../aten/src/ATen/native/cuda/LegacyThrustHelpers.cu".
```
The failure is because this file uses `thrust::make_constant_iterator` but doesn't include the header where this function is defined.

cc: xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69985

Reviewed By: jbschlosser

Differential Revision: D33135575

Pulled By: ngimel

fbshipit-source-id: 7a8da56bba609d6c30de4a064669faba12cb7168
2021-12-15 17:08:43 -08:00
98c0fb8b42 [sparsity] More descriptive error message for missing parameters (#69895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69895

sparse.Linear has an error message that doesn't tell the user how to resolve the issue. This adds more info.
ghstack-source-id: 145603212

Test Plan: Not needed -- string change only

Reviewed By: jerryzh168

Differential Revision: D33039278

fbshipit-source-id: b5f7f5d257142eb3e7ad73f7c005755253a329d7
2021-12-15 16:58:31 -08:00
46ace4ac33 Add support for masked_softmax when softmax_elements > 1024 & corresponding unit tests (#69924)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69924

Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax

Reviewed By: ngimel

Differential Revision: D32819181

fbshipit-source-id: 6838a11d3554ec8e1bd48f1c2c7b1ee3a4680995
2021-12-15 16:44:15 -08:00
32ffad17a9 [PyTorch][Easy] make GlobalRecordFunctionCallbacks smallvector (#70002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70002

Callbacks are limited to 4, so there's no reason for this to be a `std::vector`.

Test Plan: CI

Reviewed By: aaronenyeshi

Differential Revision: D32611294

fbshipit-source-id: 21823248abe40d461579b9b68d53c8c0de2a133d
2021-12-15 16:28:09 -08:00
65ab63310b [PyTorch] use div instead of mul when calculating sampling probability (#70001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70001

Multiply by the inverse of `kLowProb` instead of dividing; this uses the less expensive `mul` instead of `div`.
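The idea sketched in Python (the constant name comes from this summary; the real change is in C++):

```
kLowProb = 0.001
kLowProbInverse = 1.0 / kLowProb  # hoisted: computed once, reused per sample

def scale_div(u: float) -> float:
    return u / kLowProb            # relies on the expensive div

def scale_mul(u: float) -> float:
    return u * kLowProbInverse     # same result via the cheaper mul

assert abs(scale_div(0.5) - scale_mul(0.5)) < 1e-9
```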

Test Plan:
Before
{F682076291}

After
{F682076323}

Reviewed By: robieta

Differential Revision: D32608440

fbshipit-source-id: 7851317a0f7e33813f2bd7a152e5e7f4b5c361b4
2021-12-15 15:28:18 -08:00
66406ee0f7 [PyTorch][Static Runtime] Fix to() w/dtype bool (#69935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69935

Didn't realize that `AT_DISPATCH_ALL_TYPES` should really be called `AT_DISPATCH_MOST_TYPES`.
ghstack-source-id: 145661358

Test Plan:
Added test for dtype bool.

Ran CMF local_ro net:

before:

```
I1215 12:33:49.300174 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.966491. Iters per second: 1034.67
I1215 12:33:49.825570 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.94867. Iters per second: 1054.11
I1215 12:33:50.349246 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.947926. Iters per second: 1054.93
I1215 12:33:50.870433 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.943779. Iters per second: 1059.57
I1215 12:33:51.393702 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.947185. Iters per second: 1055.76
I1215 12:33:51.915666 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.945672. Iters per second: 1057.45
I1215 12:33:52.438475 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.948407. Iters per second: 1054.4
I1215 12:33:52.965337 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.95472. Iters per second: 1047.43
I1215 12:33:53.494563 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.967083. Iters per second: 1034.04
I1215 12:33:54.017879 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.948945. Iters per second: 1053.8
I1215 12:33:54.017930 1606538 PyTorchPredictorBenchLib.cpp:290] Mean milliseconds per iter: 0.951888, standard deviation: 0.0083367
```

after:
```
I1215 12:32:35.820874 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.999845. Iters per second: 1000.15
I1215 12:32:36.343147 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.944363. Iters per second: 1058.91
I1215 12:32:36.863806 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.942542. Iters per second: 1060.96
I1215 12:32:37.385459 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.944677. Iters per second: 1058.56
I1215 12:32:37.905436 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.941135. Iters per second: 1062.55
I1215 12:32:38.424907 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.939748. Iters per second: 1064.11
I1215 12:32:38.944643 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.941764. Iters per second: 1061.84
I1215 12:32:39.463791 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.938946. Iters per second: 1065.02
I1215 12:32:39.987567 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.95437. Iters per second: 1047.81
I1215 12:32:40.511204 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.959139. Iters per second: 1042.6
I1215 12:32:40.511242 1594955 PyTorchPredictorBenchLib.cpp:290] Mean milliseconds per iter: 0.950653, standard deviation: 0.0184761
```

Reviewed By: hlu1

Differential Revision: D33106675

fbshipit-source-id: 5bb581f8d0ed22ef08df1936dc8d67045e44e862
2021-12-15 15:26:56 -08:00
b28a4100ff scripts: Fix manylinux2014 promotion to pypi (#70003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70003

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: jbschlosser, janeyx99

Differential Revision: D33143730

Pulled By: seemethere

fbshipit-source-id: 83a46047fbfe4709e841fbfcaa75e434ff325be5
2021-12-15 14:55:00 -08:00
38cfacd817 Tensor: Define operators override functions in TensorBody.h (#68697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68697

Currently, if you include `Tensor.h` but not `TensorOperators.h` then
using overloaded operators will compile but fail at link time.
Instead, this defines the member functions in `TensorBody.h` and
leaves `TensorOperators.h` as only the free functions.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596269

Pulled By: albanD

fbshipit-source-id: 5ce39334dc3d505865268f5049b1e25bb90af44a
2021-12-15 14:29:38 -08:00
9c7c1b769a Functionalization: Only include headers for required ops (#68690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68690

RegisterFunctionalization.cpp is a shared file, so only including the
required operators means a single operator change only requires 1
shard to be rebuilt instead of all of them.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596275

Pulled By: albanD

fbshipit-source-id: 8b56f48872156b96fbc0a16b542b8bab76b73fd4
2021-12-15 14:29:35 -08:00
7bb4b683b5 Codegen: Registration now only includes the functions used (#68689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68689

Currently Register{DispatchKey}.cpp includes all of
`NativeFunctions.h`, so any operator signature change requires all
backend registrations to be recompiled. However, most backends only
have registrations for a small fraction of operators, so it makes sense
to only include the specific functions required.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596273

Pulled By: albanD

fbshipit-source-id: 11d511f47937fbd5ff9f677c9914277b5d015c25
2021-12-15 14:29:32 -08:00
6ba18ba87e Codegen: Generate static dispatch headers per operator (#68714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68714

This splits the static dispatch headers (e.g. `CPUFunctions.h`)
into per operators headers (e.g. `ops/empty_cpu_dispatch.h`) which is
needed for when `Tensor.h` is compiled with static dispatch enabled.

There are also several places in ATen where the static dispatch
headers are used as an optimization even in dynamic dispatch builds.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596265

Pulled By: albanD

fbshipit-source-id: 287783ef4e35c7601e9d2714ddbc8d4a5b1fb9e5
2021-12-15 14:29:29 -08:00
303d60b8da Add TORCH_ASSERT_ONLY_METHOD_OPERATORS macro (#68688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68688

This adds a new macro `TORCH_ASSERT_ONLY_METHOD_OPERATORS` which
allows `Tensor.h` to be included, but not headers which pull in all
other operators. So, a file that defines this macro needs to use the
fine-grained headers to include only the operators being used.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596267

Pulled By: albanD

fbshipit-source-id: 6fc2ce3d2b0f52ac6d81b3f063193ce26e0d75a3
2021-12-15 14:29:26 -08:00
bab61be43b Codegen: Add root_name property to NativeFunction{,sGroup} (#68687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68687

This adds `NativeFunction.root_name` which is the canonical name
for the operator group, i.e. the BaseOperatorName without inplace or
double-underscores. In the previous PR I referred to this as
`base_name` but confusingly `BaseOperatorName` does potentially
include inplace or double-underscores.

I also add the property to `NativeFunctionsGroup` so that grouped
functions with type `Union[NativeFunction, NativeFunctionsGroup]`
can have the property queried without needing `isinstance` checks.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596271

Pulled By: albanD

fbshipit-source-id: 8b6dad806ec8d796dcd70fc664604670d668cae7
2021-12-15 14:28:10 -08:00
a406a427ae Revert D33004315: Support torch.equal for ShardedTensor.
Test Plan: revert-hammer

Differential Revision:
D33004315 (1c4c81622c)

Original commit changeset: 786fe26baf82

Original Phabricator Diff: D33004315 (1c4c81622c)

fbshipit-source-id: e1dda70fea656834fdf0f2a9f874415f7b460c6e
2021-12-15 14:14:06 -08:00
1c4c81622c Support torch.equal for ShardedTensor. (#69734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69734

Added support for `torch.equal` to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.

Will implement `allclose` in a follow-up PR.
ghstack-source-id: 145301451

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D33004315

fbshipit-source-id: 786fe26baf82e1bb4fecfdbfc9ad4b64e704877f
2021-12-15 13:07:36 -08:00
8a08e70bf4 Revert D32596676: Avoid adding torch::deploy interpreter library to the data section
Test Plan: revert-hammer

Differential Revision:
D32596676 (986d19c0a7)

Original commit changeset: 1ab15b2d3642

Original Phabricator Diff: D32596676 (986d19c0a7)

fbshipit-source-id: da4f02114fd7e41634f116ab659a55cd985cfd7d
2021-12-15 13:02:22 -08:00
24bc3be146 [Profiler] Clean up profiler includes. (#69421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421

I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover, several files rely on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace.

Test Plan: Unit tests and CI.

Reviewed By: aaronenyeshi, albanD

Differential Revision: D32865907

fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
2021-12-15 12:50:24 -08:00
587f8d9924 OperatorEntry: Avoid unnecessarily templated code (#67986)
Summary:
`assertSignatureIsCorrect` is instantiated at minimum once per unique operator signature yet its core logic is independent of the type. So, it makes sense to have a light-weight template that does nothing but call into the non-templated function with the correct `CppSignature` object.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67986

Reviewed By: jbschlosser

Differential Revision: D33108600

Pulled By: swolchok

fbshipit-source-id: 7594524d3156ff2422e6edcdffcb263dc67ea346
2021-12-15 12:43:53 -08:00
986d19c0a7 Avoid adding torch::deploy interpreter library to the data section (#69245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69245

Create a custom section ".embedded_interpreter" to store the interpreter instead of .data, in order to increase the amount of memory that can be used by the other sections of the executable (such as .text/.data/.bss) by 33% (1.5GB -> 2.0GB). This also removes memory limitations of the interpreter and tech debt.

Test Plan:
buck test mode/opt //caffe2/torch/csrc/deploy:test_deploy
readelf -S ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/test_deploy
check the size of the .data section
Apply the fix and check the size of the .data section again. It should be reduced by the size of the interpreter.so

The output of `readelf -S ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/test_deploy` is as follows. The .data section is now 0.0015415GB and the .torch_deploy_payXXX section is 0.605125GB

```
(pytorch) [sahanp@devvm4333.vll0 ~/local/fbsource/fbcode] readelf -S buck-out/gen/caffe2/torch/csrc/deploy/test_deploy
There are 55 section headers, starting at offset 0x24bac82b0:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000200350  00000350
       0000000000000028  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000200378  00000378
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000200398  00000398
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .dynsym           DYNSYM           00000000002003c0  000003c0
       0000000000d07a48  0000000000000018   A       9     1     8
  [ 5] .gnu.version      VERSYM           0000000000f07e08  00d07e08
       0000000000115f86  0000000000000002   A       4     0     2
  [ 6] .gnu.version_r    VERNEED          000000000101dd90  00e1dd90
       0000000000000510  0000000000000000   A       9    15     4
  [ 7] .gnu.hash         GNU_HASH         000000000101e2a0  00e1e2a0
       00000000003b4fb0  0000000000000000   A       4     0     8
  [ 8] .hash             HASH             00000000013d3250  011d3250
       0000000000457e20  0000000000000004   A       4     0     4
  [ 9] .dynstr           STRTAB           000000000182b070  0162b070
       0000000004ef205a  0000000000000000   A       0     0     1
  [10] .rela.dyn         RELA             000000000671d0d0  0651d0d0
       0000000000110b80  0000000000000018   A       4     0     8
  [11] .rela.plt         RELA             000000000682dc50  0662dc50
       00000000000093f0  0000000000000018   A       4    35     8
  [12] .rodata           PROGBITS         0000000006837040  06637040
       00000000034067a8  0000000000000000 AMS       0     0     64
  [13] fb_build_info     PROGBITS         0000000009c3d7f0  09a3d7f0
       00000000000002ee  0000000000000000   A       0     0     16
  [14] .gcc_except_table PROGBITS         0000000009c3dae0  09a3dae0
       00000000014a9340  0000000000000000   A       0     0     4
  [15] .eh_frame_hdr     PROGBITS         000000000b0e6e20  0aee6e20
       00000000004abf54  0000000000000000   A       0     0     4
  [16] .eh_frame         PROGBITS         000000000b592d78  0b392d78
       000000000200e344  0000000000000000   A       0     0     8
  [17] .text             PROGBITS         000000000d5a2000  0d3a2000
       000000001e55944e  0000000000000000  AX       0     0     256
  [18] .init             PROGBITS         000000002bafb450  2b8fb450
       0000000000000017  0000000000000000  AX       0     0     4
  [19] .fini             PROGBITS         000000002bafb468  2b8fb468
       0000000000000009  0000000000000000  AX       0     0     4
  [20] .never_hugify     PROGBITS         000000002bafb480  2b8fb480
       0000000000000db3  0000000000000000  AX       0     0     16
  [21] text_env          PROGBITS         000000002bafc240  2b8fc240
       0000000000002e28  0000000000000000  AX       0     0     16
  [22] .plt              PROGBITS         000000002baff070  2b8ff070
       00000000000062b0  0000000000000000  AX       0     0     16
  [23] .tdata            PROGBITS         000000002bb06000  2b906000
       0000000000000b20  0000000000000000 WAT       0     0     8
  [24] .tbss             NOBITS           000000002bb06b40  2b906b20
       0000000000007cb8  0000000000000000 WAT       0     0     64
  [25] .fini_array       FINI_ARRAY       000000002bb06b20  2b906b20
       0000000000000028  0000000000000000  WA       0     0     8
  [26] .init_array       INIT_ARRAY       000000002bb06b48  2b906b48
       0000000000008878  0000000000000000  WA       0     0     8
  [27] .data.rel.ro      PROGBITS         000000002bb0f3c0  2b90f3c0
       0000000000029ce0  0000000000000000  WA       0     0     64
  [28] .ctors            PROGBITS         000000002bb390a0  2b9390a0
       0000000000000010  0000000000000000  WA       0     0     8
  [29] .dynamic          DYNAMIC          000000002bb390b0  2b9390b0
       0000000000000340  0000000000000010  WA       9     0     8
  [30] .got              PROGBITS         000000002bb393f0  2b9393f0
       000000000001f040  0000000000000000  WA       0     0     8
  [31] .bss.rel.ro       NOBITS           000000002bb58440  2b958430
       0000000000000c40  0000000000000000  WA       0     0     32
  [32] .data             PROGBITS         000000002bb5a000  2b959000
       0000000000194188  0000000000000000  WA       0     0     4096
  [33] .tm_clone_table   PROGBITS         000000002bcee188  2baed188
       0000000000000000  0000000000000000  WA       0     0     8
  [34] .probes           PROGBITS         000000002bcee188  2baed188
       0000000000000002  0000000000000000  WA       0     0     2
  [35] .got.plt          PROGBITS         000000002bcee190  2baed190
       0000000000003168  0000000000000000  WA       0     0     8
  [36] .bss              NOBITS           000000002bcf1300  2baf02f8
       00000000005214f0  0000000000000000  WA       0     0     128
  [37] .nvFatBinSegment  PROGBITS         000000002c213000  2baf1000
       0000000000002850  0000000000000000   A       0     0     8
  [38] .nv_fatbin        PROGBITS         000000002c216000  2baf4000
       0000000052baed38  0000000000000000  WA       0     0     8
  [39] .comment          PROGBITS         0000000000000000  7e6a2d38
       00000000000001dc  0000000000000000  MS       0     0     1
  [40] .debug_aranges    PROGBITS         0000000000000000  7e6a2f20
       0000000001266c00  0000000000000000           0     0     16
  [41] .debug_info       PROGBITS         0000000000000000  7f909b20
       000000007b21de49  0000000000000000           0     0     1
  [42] .debug_abbrev     PROGBITS         0000000000000000  fab27969
       000000000179f365  0000000000000000           0     0     1
  [43] .debug_line       PROGBITS         0000000000000000  fc2c6cce
       00000000176954ac  0000000000000000           0     0     1
  [44] .debug_str        PROGBITS         0000000000000000  11395c17a
       0000000039dc32b0  0000000000000001  MS       0     0     1
  [45] .debug_ranges     PROGBITS         0000000000000000  14d71f430
       0000000026a2d930  0000000000000000           0     0     16
  [46] .debug_types      PROGBITS         0000000000000000  17414cd60
       000000000b211ff5  0000000000000000           0     0     1
  [47] .debug_loc        PROGBITS         0000000000000000  17f35ed55
       000000009ca80c7e  0000000000000000           0     0     1
  [48] .debug_macinfo    PROGBITS         0000000000000000  21bddf9d3
       000000000000151c  0000000000000000           0     0     1
  [49] .note.stapsdt     NOTE             0000000000000000  21bde0ef0
       0000000000001b3c  0000000000000000           0     0     4
  [50] .debug_macro      PROGBITS         0000000000000000  21bde2a2c
       0000000000040e6a  0000000000000000           0     0     1
  [51] .torch_deploy_pay PROGBITS         0000000000000000  21be23896
       0000000026ba5d28  0000000000000000           0     0     1
  [52] .symtab           SYMTAB           0000000000000000  2429c95c0
       00000000020ce0c8  0000000000000018          54   863985     8
  [53] .shstrtab         STRTAB           0000000000000000  244a97688
       000000000000025c  0000000000000000           0     0     1
  [54] .strtab           STRTAB           0000000000000000  244a978e4
       00000000070309c6  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
```

Reviewed By: shunting314

Differential Revision: D32596676

fbshipit-source-id: 1ab15b2d36422506d8f781d3bbc0c70c44bc3d91
2021-12-15 11:27:57 -08:00
c6bcfb152d [PyTorch][easy] Move GlobalRecordFunctionCallbacks{,Entry} to cpp file (#68483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68483

Doesn't need to be in the header.
ghstack-source-id: 145668417

Test Plan: CI

Reviewed By: chaekit

Differential Revision: D32477113

fbshipit-source-id: 30e7796413e3220e4051544559f9110ab745022d
2021-12-15 09:38:51 -08:00
873585da2b [SR] Improve set_inputs (#69087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69087
This diff includes a variety of improvements to `set_inputs` to unify behavior with `torch::jit::Module`:

1. Eliminate code duplication between rvalue/lvalue overloads
2. Add type checks
3. Make input length check a `TORCH_CHECK` instead of a debug check - we have to fail when the wrong number of inputs are passed.
4. `schema` now always includes `self`, even if we release `module_`. This is consistent with `torch::jit::Module`.
ghstack-source-id: 145599837

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32711705

fbshipit-source-id: fe97c10b4f03801ba59868b452e7d02b26b3106b
2021-12-15 09:31:19 -08:00
aeedd89d4e [PyTorch] RecordFunction: use SmallVector for ObserverContextList (#68412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68412

These lists have the same size as CallbackHandles, so they should be the same container type.
ghstack-source-id: 145668416

Test Plan:
Run same command as previous diff.

Before: see previous diff, average about 0.46us
After: P467928077, average about 0.43us

Reviewed By: chaekit

Differential Revision: D32454856

fbshipit-source-id: 3a3ff4d381d99f51ef868d4dec4db7c411b5ea56
2021-12-15 09:31:16 -08:00
29914f55bf Skip print_test_stats checks for tests that use repeat_test_for_types (#69872)
Summary:
Once https://github.com/pytorch/pytorch/issues/69865 is fixed, this change should be undone.

This will avoid print_test_stats errors in CI, such as https://github.com/pytorch/pytorch/runs/4501145212?check_suite_focus=true (HUD view fc37e5b3ed)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69872

Reviewed By: dagitses, suo

Differential Revision: D33094446

Pulled By: janeyx99

fbshipit-source-id: 7378556d75ea94dd407a2bf9dda37b15c57014f7
2021-12-15 09:29:58 -08:00
d71b8e1a8d More distutils.version.LooseVersion changes (#69947)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69947

Reviewed By: seemethere

Differential Revision: D33111996

Pulled By: malfet

fbshipit-source-id: e7d2cc4ed3e39452e809965e360b05f0b409ec0d
2021-12-15 08:07:36 -08:00
6f9844693f Revert D32974907: [quant][graphmode][fx] Enable fuse handler for sequence of 3 ops
Test Plan: revert-hammer

Differential Revision:
D32974907 (bf089840ac)

Original commit changeset: ba205e74b566

Original Phabricator Diff: D32974907 (bf089840ac)

fbshipit-source-id: e47838f3008ba014d884aef53460df654f0cf731
2021-12-15 05:46:49 -08:00
87bc1f4ed8 Revert D33024528: [quant][fx][graphmode] Add support for conv add pattern in backend_config_dict
Test Plan: revert-hammer

Differential Revision:
D33024528 (59000cff91)

Original commit changeset: 5c770c82c8f6

Original Phabricator Diff: D33024528 (59000cff91)

fbshipit-source-id: 7da6f421ef63f47fbffad8b3ad91f6a31d19d867
2021-12-15 05:45:29 -08:00
43b8e833e9 Fix bug in aten::full signature in version_map.h to accurately reflect the current schema (#69860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69860

Previously I made a mistake and checked in aten::full.names for the upgrader of aten::full, so I changed it back to just aten::full.

Test Plan: None

Reviewed By: gmagogsfm

Differential Revision: D33066985

fbshipit-source-id: a5598d60d1bff9b4455f807361388fac0689ba14
2021-12-15 01:09:31 -08:00
5c7817fd43 Add test operator in upgrader entry (#69427)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69427

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D32867984

Pulled By: tugsbayasgalan

fbshipit-source-id: 25810fc2fd4b943911f950618968af067c04da5c
2021-12-15 00:40:05 -08:00
47f11730ec Add testing for forward over reverse gradgrad (#69740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69740

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33031727

Pulled By: soulitzer

fbshipit-source-id: 2bcba422b4bcea3bbc936d07ba45171a6531e578
2021-12-14 23:35:10 -08:00
d0fe7db1f6 Add formulas for distributions (#69690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69690

* #69558

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33031726

Pulled By: soulitzer

fbshipit-source-id: 9ae461dc6043d48d5bb8c2bbaa266d06ad99f317
2021-12-14 23:35:07 -08:00
b399a4d7b9 Add some reduction forward AD formulas (#69661)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69661

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020601

Pulled By: soulitzer

fbshipit-source-id: 110da6dcd490e5c3849cace62a777aa1a2b6982e
2021-12-14 23:33:43 -08:00
3b7fc0243c [PyTorch] Make TypePrinter take const Type& (#69412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69412

TypePrinter does not need to take ownership of the Type.

This helps unblock the following diff to stop refcounting Type singletons.
ghstack-source-id: 145671619

Test Plan: CI

Reviewed By: suo

Differential Revision: D32858525

fbshipit-source-id: df58676938fd20c7bae4a366d70b2067a852282d
2021-12-14 23:13:03 -08:00
7a12b5063e [AutoAccept][Codemod][FBSourceBuckFormatLinter] Daily arc lint --take BUCKFORMAT
Reviewed By: zertosh

Differential Revision: D33119794

fbshipit-source-id: ca327caf34560c0bba32511e57d5dc18b71bdfe1
2021-12-14 21:54:41 -08:00
59000cff91 [quant][fx][graphmode] Add support for conv add pattern in backend_config_dict (#69778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69778

This PR extends fusion pattern support from a simple sequence of ops to a simple
subgraph like conv - add
```
x - conv ---\
y ---------add ---- output
```
where the inputs x, y and the output are observed/quantized

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps.test_conv_add
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33024528

fbshipit-source-id: 5c770c82c8f693fabdac5c69343942a9dfda84ef
2021-12-14 20:46:01 -08:00
408283319a [Operator Versioning][Edge] Change OP to CALL when there is a valid upgrader (#67731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67731

1. Register the upgrader function at the loading stage
2. Change OP to CALL when the operator_version from the model is smaller than the current runtime version and there exists a valid upgrader

The interpreter log is:
```
RUNNING 0 STOREN 1 3
RUNNING 1 DROPR 1
RUNNING 2 LOAD 2
RUNNING 3 LOAD 3
RUNNING 4 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 5 LOAD 2
RUNNING 6 LOAD 3
RUNNING 7 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 8 MOVE 2
RUNNING 9 MOVE 3
RUNNING 10 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 11 TUPLE_CONSTRUCT 3
RUNNING 12 RET
```

The upgrader bytecode is:
```
(STOREN, 1, 2)
(LOAD, 1, 0)
(OP, 0, 0)
(JF, 3, 0)
(LOADC, 1, 0)
(JMP, 3, 0)
(LOAD, 2, 0)
(OP, 0, 0)
(STORE, 3, 0)
(MOVE, 3, 0)
(JF, 5, 0)
(LOAD, 1, 0)
(LOAD, 2, 0)
(OP, 1, 0)
(JMP, 5, 0)
(LOAD, 1, 0)
(LOAD, 2, 0)
(LOADC, 0, 0)
(OP, 2, 0)
(STORE, 4, 0)
(DROPR, 2, 0)
(DROPR, 1, 0)
(MOVE, 4, 0)
(RET, 0, 0)
```
ghstack-source-id: 145635622

Test Plan: describe in summary and CI

Reviewed By: iseeyuan

Differential Revision: D32092517

fbshipit-source-id: 0314b4bda5d2578cdd4e7cfbfd1e3c07fbccf8a3
2021-12-14 19:13:12 -08:00
9e4d60a552 [Operator Versioning][Edge] Use check in cpp source file for upgrader (#67728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67728

1. Check in upgrader_mobile.h and upgrader_mobile.cpp
2. Add test to parse all bytecode from upgrader_mobile.h
ghstack-source-id: 145635621

Test Plan: buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterUpgraderTest.Upgrader'

Reviewed By: iseeyuan

Differential Revision: D32087295

fbshipit-source-id: 21e95aabb5e9db76be27e01adfea8fbc41caeaf6
2021-12-14 19:10:51 -08:00
bf089840ac [quant][graphmode][fx] Enable fuse handler for sequence of 3 ops (#69658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69658

This PR enables the fuse handler for a sequence of three ops, and merges all fuse handlers into one

TODO: we can also move this to backend_config_dict folder

Test Plan:
regression fusion test
```
python test/test_quantization.py TestFuseFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32974907

fbshipit-source-id: ba205e74b566814145f776257c5f5bb3b24547c1
2021-12-14 19:04:21 -08:00
102684b252 [SR] Fix stack/concat bug (#68777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68777

Fixed some cases where negative dimensions were not handled correctly

* `_stack_cpu` calls `maybe_wrap_dim`, but `_stack_cpu_out` does not. This is only problematic when `_stack_cpu_out` forwards to the serial kernel: [ref](https://www.internalfb.com/code/fbsource/[1b5af978b48f2e5d308d42b588bde3275869a57b]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=1541-1547).
* concat also needs to wrap its dim (see the sketch after this list)
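A minimal sketch of the wrapping rule, mirroring ATen's `maybe_wrap_dim`:

```
def maybe_wrap_dim(dim: int, ndim: int) -> int:
    # Normalize a possibly-negative dim into [0, ndim) before indexing;
    # this is the step the serial _stack_cpu_out path was missing.
    if dim < -ndim or dim >= ndim:
        raise IndexError(f"dim {dim} out of range for a {ndim}-d tensor")
    return dim + ndim if dim < 0 else dim

assert maybe_wrap_dim(-1, 4) == 3
assert maybe_wrap_dim(2, 4) == 2
```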

Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Added new tests to cover this case

Reviewed By: hlu1

Differential Revision: D32604623

fbshipit-source-id: 00aaa42817cd2d3e7606ce75ab5a9744645118cf
2021-12-14 16:26:27 -08:00
ebc35a7ead [JIT] Enable freezing for sparse COO tensors (#69614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69614

Previously sparse COO tensors were ignored during freezing, because
`tryInsertConstant` would fail during `freeze_module.cpp`, and because
hashes weren't implemented for COO tensor IValues.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32954620

Pulled By: davidberard98

fbshipit-source-id: a91f97fdfc2152b417f43a6948100c94970c0831
2021-12-14 15:43:50 -08:00
33363cea64 Revert D32498572: allow external backend codegen to be used without autograd kernels
Test Plan: revert-hammer

Differential Revision:
D32498572 (b83b6f7424)

Original commit changeset: 3e7159c633f6

Original Phabricator Diff: D32498572 (b83b6f7424)

fbshipit-source-id: f93fa444c95a2423eef5975a2ecdb96f14e0c535
2021-12-14 15:28:49 -08:00
f6cad53443 Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels
Test Plan: revert-hammer

Differential Revision:
D32498569 (aa0cf68c17)

Original commit changeset: ebd932d042b9

Original Phabricator Diff: D32498569 (aa0cf68c17)

fbshipit-source-id: 21a393fa339510d926512a7983d33ece327b743d
2021-12-14 15:27:24 -08:00
0ef523633f Revert D32498570: make codegen'd device guards not cuda-specific. Allow them to be used in external codegen
Test Plan: revert-hammer

Differential Revision:
D32498570 (2e7a91c45f)

Original commit changeset: 0ce6a5614417

Original Phabricator Diff: D32498570 (2e7a91c45f)

fbshipit-source-id: 7c64ce1b5e51a680b4aeae8721e0c9e15c793289
2021-12-14 15:04:10 -08:00
24ee1d13f6 Another attempt to fix version comparison check (#69939)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69939

Reviewed By: atalman

Differential Revision: D33108135

Pulled By: malfet

fbshipit-source-id: cadadfe5b04c4378f149136f8e1f8e8d6266775c
2021-12-14 14:54:15 -08:00
d4f8313497 Add low level torch.profiler.kineto_profile base class (#63302)
Summary:
Refactor torch.profiler.profile by separating it into one low-level class and one high-level wrapper.

The PR includes the following changes:
1. Separate the class torch.profiler.profile into two classes: kineto_profiler and torch.profiler.profile.
2. The former class has the low-level functionality exposed at the C++ level, e.g. prepare_profiler, start_profiler, stop_profiler.
3. The original logic in torch.profiler.profile, including export_chrome_trace, export_stacks, key_averages, events, and add_metadata, is moved into kineto_profiler since it is all exposed by torch.autograd.profiler.
4. The new torch.profiler.profile is fully backward-compatible with the original class since it inherits from torch.profiler.kineto_profiler. Its only responsibility in the new implementation is maintaining the finite state machine of ProfilerAction.

With the refactoring, the responsibility boundary is clear and the new logic is simple to understand.
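A minimal usage sketch of the high-level wrapper; per the summary above, the methods called below would live on the low-level kineto_profiler base class:

```
import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU]) as prof:
    torch.randn(128, 128) @ torch.randn(128, 128)

# Inherited low-level functionality, per the split described above.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
prof.export_chrome_trace("trace.json")
```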

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63302

Reviewed By: albanD

Differential Revision: D33006442

Pulled By: robieta

fbshipit-source-id: 30d7c9f5c101638703f1243fb2fcc6ced47fb690
2021-12-14 14:47:43 -08:00
e8d5c7cf7f [nn] mha : no-batch-dim support (python) (#67176)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

* [x] Update docs
* [x] Tests for shape checking
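A minimal usage sketch of the unbatched-input support; shapes are assumed to follow the batched convention minus the batch dim:

```
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)
q = torch.randn(5, 8)            # (L, E): no batch dimension
out, weights = mha(q, q, q)
print(out.shape, weights.shape)  # torch.Size([5, 8]) torch.Size([5, 5])
```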

Tests take roughly 20s on the system that I use. Below are the timings for the slowest 20 tests.

```
pytest test/test_modules.py -k _multih --durations=20
============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /home/kshiteej/Pytorch/pytorch_no_batch_mha, configfile: pytest.ini
plugins: hypothesis-6.23.2, repeat-0.9.1
collected 372 items / 336 deselected / 36 selected

test/test_modules.py ..............ssssssss..............                                                                                                                                                  [100%]

================================================================================================ warnings summary ================================================================================================
../../.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:73
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float32
  /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:73: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================================================================================== slowest 20 durations ==============================================================================================
8.66s call     test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiheadAttention_cuda_float64
2.02s call     test/test_modules.py::TestModuleCPU::test_gradgrad_nn_MultiheadAttention_cpu_float64
1.89s call     test/test_modules.py::TestModuleCUDA::test_grad_nn_MultiheadAttention_cuda_float64
1.01s call     test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float32
0.51s call     test/test_modules.py::TestModuleCPU::test_grad_nn_MultiheadAttention_cpu_float64
0.46s call     test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_cuda_float32
0.45s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_cuda_float64
0.44s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_cuda_float32
0.21s call     test/test_modules.py::TestModuleCUDA::test_pickle_nn_MultiheadAttention_cuda_float64
0.21s call     test/test_modules.py::TestModuleCUDA::test_pickle_nn_MultiheadAttention_cuda_float32
0.18s call     test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_cuda_float64
0.17s call     test/test_modules.py::TestModuleCPU::test_non_contiguous_tensors_nn_MultiheadAttention_cpu_float32
0.16s call     test/test_modules.py::TestModuleCPU::test_non_contiguous_tensors_nn_MultiheadAttention_cpu_float64
0.11s call     test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float64
0.08s call     test/test_modules.py::TestModuleCPU::test_pickle_nn_MultiheadAttention_cpu_float32
0.08s call     test/test_modules.py::TestModuleCPU::test_pickle_nn_MultiheadAttention_cpu_float64
0.06s call     test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_cuda_float64
0.06s call     test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_cuda_float32
0.06s call     test/test_modules.py::TestModuleCPU::test_forward_nn_MultiheadAttention_cpu_float32
0.06s call     test/test_modules.py::TestModuleCPU::test_forward_nn_MultiheadAttention_cpu_float64
============================================================================================ short test summary info =============================================================================================
=========================================================================== 28 passed, 8 skipped, 336 deselected, 2 warnings in 19.71s ===========================================================================
```

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67176

Reviewed By: dagitses

Differential Revision: D33094285

Pulled By: jbschlosser

fbshipit-source-id: 0dd08261b8a457bf8bad5c7f3f6ded14b0beaf0d
2021-12-14 13:21:21 -08:00
37ec99c0e4 Open source trt lowering workflow (#69381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69381

Open-source the lowering workflow, related tools, and tests.

Test Plan: CI

Reviewed By: 842974287

Differential Revision: D32815136

fbshipit-source-id: 3ace30833a2bc52e9b02513c5e223cb339fb74a3
2021-12-14 13:00:21 -08:00
930067d129 Build clang builds with -Werror (#69712)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69712

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D32997002

Pulled By: malfet

fbshipit-source-id: 8ebb5a955f8ae2d3fb67bc70636a2b1d66010c84
2021-12-14 12:41:57 -08:00
c76c6e9bd3 [ONNX] Add BFloat16 type support when export to ONNX (#66788)
Summary:
- PyTorch and ONNX both support BFloat16; add this to unblock some mixed-precision training models (see the sketch after this list).
- Support the PyTorch TNLG model's use of BFloat16 tensors for the inputs/outputs of the layers that run on the NPU.
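A hypothetical export sketch of the BFloat16 support; the opset choice is an assumption (ONNX added BFloat16 in opset 13), and the onnx package must be installed:

```
import torch

# BFloat16 module and input; with this change, export no longer rejects the dtype.
model = torch.nn.Linear(4, 4).to(torch.bfloat16)
x = torch.randn(1, 4, dtype=torch.bfloat16)
torch.onnx.export(model, x, "model.onnx", opset_version=13)
```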

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66788

Reviewed By: jansel

Differential Revision: D32283510

Pulled By: malfet

fbshipit-source-id: 150d69b1465b2b917dd6554505eca58042c1262a
2021-12-14 12:23:32 -08:00
800a457b6f [shard] add ShardedOptimizer (#68607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607

This PR adds ShardedOptimizer and an API to get module parameters along with ShardedTensor params; it allows users to use this optimizer wrapper to construct an optimizer that involves ShardedTensors.

The state_dict support will be a follow-up diff.
ghstack-source-id: 145532834

Test Plan: python test_sharded_optim.py

Reviewed By: pritamdamania87

Differential Revision: D32539994

fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
2021-12-14 12:15:20 -08:00
457ba1dd3e Porting index_add to structured kernels, add an out variant (#65993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65993

This PR attempts to port `index_add` to structured kernels, but does more than that:

* Adds an `out=` variant to `index_add` (see the usage sketch after this list)
* Revises `native_functions.yaml` registrations to not have multiple entries, instead passing a default value for `alpha`.
* Changes in the `derivatives.yaml` file for autograd functioning
* Revises error messages; please see: https://github.com/pytorch/pytorch/pull/65993#issuecomment-945441615
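A usage sketch of the functional form with the new `out=` variant (signature per this summary; `alpha` defaults to 1):

```
import torch

x = torch.zeros(3, 4)
index = torch.tensor([0, 2])
source = torch.ones(2, 4)

result = torch.index_add(x, 0, index, source)  # rows 0 and 2 become ones
out = torch.empty(3, 4)
torch.index_add(x, 0, index, source, out=out)  # the new out= variant
assert torch.equal(result, out)
```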

Follow-up PRs in near future will attempt to refactor the OpInfo test, and will give another look at tests in `test/test_torch.py` for this function. (hence the use of ghstack for this)

~This is WIP because there are tests failing for `Dimname` variant on mobile/android builds, and I'm working on fixing them.~

Issue tracker: https://github.com/pytorch/pytorch/issues/55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32646426

fbshipit-source-id: b035ecf843a9a27d4d1e18b202b035adc2a49ab5
2021-12-14 11:57:13 -08:00
9594a94d80 fix CompositeImplicitAutograd ops improperly labeled (#69863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69863

This reverts commit 41c344d460a941c57f4793690c396f830a992824.

Test Plan: Imported from OSS

Reviewed By: albanD, soulitzer

Differential Revision: D33072958

Pulled By: bdhirsh

fbshipit-source-id: 3d3488f37986256986ab009d6f16476f29cff625
2021-12-14 11:47:07 -08:00
269e92669a [c2] Remove unused private fields (#69709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69709

Fix a logical bug in `caffe2/ideep/operators/conv_op.cc`, which
contained an always-false condition (fusion_type_ == X && fusion_type_ == Y).

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997006

Pulled By: malfet

fbshipit-source-id: 23e4db1b17cf8a77eae6a8691847ffa484d4736c
2021-12-14 11:31:08 -08:00
fef9981998 Update run_test.py (#69920)
Summary:
Do not compare LooseVersion against string

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69920

Reviewed By: atalman

Differential Revision: D33101166

Pulled By: malfet

fbshipit-source-id: a2df9e01d17663262718f11e580c8b009764f7b5
2021-12-14 11:26:56 -08:00
3e43c478a8 [Quant][fx] Lower reference conv[1-3]d module (#69228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69228

Implement lowering logic for reference conv modules,
similar to https://github.com/pytorch/pytorch/pull/65723.
ghstack-source-id: 145058198

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_lowering

Imported from OSS

Reviewed By: anjali411

Differential Revision: D32890743

fbshipit-source-id: 04f2500628c60b0fbc84d22705164215e190aeba
2021-12-14 11:23:39 -08:00
b67eaec853 [DateLoader] more clearly expose 'default_collate' and 'default_convert' to users (#69862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69862

Fixes #69445

cc SsnL VitalyFedyunin ejguan NivekT
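A usage sketch of the newly exposed helper; the `torch.utils.data` import path is assumed from the title:

```
import torch
from torch.utils.data import default_collate  # import path assumed

batch = [(torch.tensor([1.0]), 0), (torch.tensor([2.0]), 1)]
features, labels = default_collate(batch)
print(features.shape, labels)  # torch.Size([2, 1]) tensor([0, 1])
```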

Test Plan: Imported from OSS

Reviewed By: ejguan, ngimel

Differential Revision: D33068792

Pulled By: NivekT

fbshipit-source-id: ef9791acdc23d014b8761fa7420062d454ce8969
2021-12-14 11:18:26 -08:00
1188d89a1d TestMathBits: Call functions with original sample input values (#68947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68947

`_test_math_view` currently calls the operator with different values
than those specified in the `SampleInput`. This is undesirable as it
could break mathematical properties required by the operator. Instead,
this calls `math_op_view(math_op_physical(sample.input))` to get a
view that represents the same value as the original input.

`test_neg_view` already did this by returning `torch._neg_view(-x)`
from `math_op_view` but this moves the handling into `_test_math_view`
to make it apply to all view op tests.
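The identity this relies on, sketched for the neg-view case:

```
import torch

x = torch.randn(3)
v = torch._neg_view(-x)                  # a view whose logical values equal x
assert v.is_neg()                        # the negation bit is set on the view
assert torch.equal(v.resolve_neg(), x)   # same values as the original input
```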

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33064327

Pulled By: anjali411

fbshipit-source-id: 4d87e0c04fc39b95f8dc30dcabda0d554d16a1d8
2021-12-14 11:10:13 -08:00
1a299d8f1b Add support for transformer layout of masked_softmax (#69272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69272

In the transformer encoder and MHA, masked_softmax's mask is a 2D tensor (B, D), while the input is a 4D tensor (B, H, D, D).
This mask could simply be broadcast to (B, H, D, D) like the input and then fed to a regular masked_softmax; however, that produces a non-contiguous mask and consumes more memory.
In this diff, we keep the mask's shape unchanged and compute the corresponding mask element for the input in each CUDA thread.

This new layout is not yet supported on CPU.
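What the naive broadcast would materialize, sketched in eager PyTorch with the shape names from this summary:

```
import torch

B, H, D = 2, 4, 8
scores = torch.randn(B, H, D, D)            # attention scores
mask = torch.zeros(B, D, dtype=torch.bool)  # 2D key-padding mask
mask[:, -2:] = True                         # mask out the last two keys

# The route this diff avoids: expand to (B, H, D, D), which is non-contiguous
# and memory-hungry, then run a regular masked softmax.
expanded = mask[:, None, None, :].expand(B, H, D, D)
out = scores.masked_fill(expanded, float("-inf")).softmax(dim=-1)
print(out.shape, expanded.is_contiguous())  # torch.Size([2, 4, 8, 8]) False
```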

Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax

Reviewed By: ngimel

Differential Revision: D32605557

fbshipit-source-id: ef37f86981fdb2fb264d776f0e581841de5d68d2
2021-12-14 10:51:58 -08:00
2e7a91c45f make codegen'd device guards not cuda-specific. Allow them to be used in external codegen (#68531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68531

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32498570

Pulled By: bdhirsh

fbshipit-source-id: 0ce6a5614417671313b4d274ea84742c5b81d1b0
2021-12-14 10:25:04 -08:00
aa0cf68c17 allow external backend codegen to toggle whether to generate out= and inplace kernels (#68530)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68530

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32498569

Pulled By: bdhirsh

fbshipit-source-id: ebd932d042b988e19c71aa04a21677db9bdc9f04
2021-12-14 10:25:02 -08:00
b83b6f7424 allow external backend codegen to be used without autograd kernels (#68529)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68529

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32498572

Pulled By: bdhirsh

fbshipit-source-id: 3e7159c633f6a80b60faa068436a4c49ebe731ca
2021-12-14 10:23:12 -08:00
8acd0a8b2f Allow row sizes to support int64/size_t. (#69303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69303

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/792

Follow up to D32715453 (e60fd10659), allowing row size to be 64-bit.

Test Plan:
buck test mode/opt -c fbcode.caffe2_gpu_type=v100,a100 //deeplearning/fbgemm/fbgemm_gpu:quantize_ops_test
   buck test mode/opt -c fbcode.caffe2_gpu_type=none //deeplearning/fbgemm/fbgemm_gpu:quantize_ops_test
   buck test mode/opt //caffe2/test:

Reviewed By: jspark1105, jianyuh

Differential Revision: D32768838

fbshipit-source-id: 9e2b01d8d23e71f8333820e725379c3fc1c0711a
2021-12-14 10:09:08 -08:00
2c9dd886af Modify torch.movedim to handle scalar as no-op (#69537)
Summary:
`torch.movedim` now directly handles the case of a scalar (0-dim) input tensor as a no-op by returning a view of the input tensor (after all the usual checks on the other parameters).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69537

Test Plan:
This code now works fine, and res1 is a view of the tensor:
```
import torch

tensor = torch.rand(torch.Size([]))
res1 = torch.movedim(tensor, 0, 0)
```

Fixes https://github.com/pytorch/pytorch/issues/69432

Reviewed By: jbschlosser

Differential Revision: D33020014

Pulled By: albanD

fbshipit-source-id: b3b2d380d70158bd3b3d6b40c073377104e09007
2021-12-14 09:55:59 -08:00
7503ec58b2 [nnc][fix] xnnpack ifdef (#69870)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69870

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33075061

Pulled By: IvanKobzarev

fbshipit-source-id: dd53ad8b7d0ff36a68f0864540d6f7dd2284f0e0
2021-12-14 09:50:24 -08:00
f7294cd865 [Static Runtime] Skip ReplaceWithCopy when inputs have writters (#69819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69819

We should skip ReplaceWithCopy if the inputs to the operator can be updated during inference. For a set of tensors that share data, ReplaceWithCopy should not happen to any of them if there exist updates to any of them.

Currently, the existing check misses some cases (suppose there exist updates, and uses <= 1). This diff addresses the missing cases by querying AliasDB.

Test Plan:
- Added test cases, including a one that is problematic before this diff
- CI

Reviewed By: mikeiovine

Differential Revision: D33052562

fbshipit-source-id: 61f87e471805f41d071a28212f2f457e8c6785e7
2021-12-14 09:39:49 -08:00
07767569c9 Properly import LooseVersion (#69904)
Summary:
This fixes regression introduced by https://github.com/pytorch/pytorch/pull/57040

Somehow importing `distutils` from `setuptools` caused an import of
`distutils.version`, which is not a documented dependency and got
changed with the release of
[setuptools-59.6.0](https://github.com/pypa/setuptools/tree/v59.6.0)
We should not rely on that, as
`import distutils` never re-imports `distutils.version`, which one can
see by observing
https://github.com/python/cpython/blob/3.9/Lib/distutils/__init__.py
or by running:
```
% python3 -c "import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'sys']
% python3 -c "from setuptools import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'archive_util', 'ccompiler', 'cmd', 'config', 'core', 'debug', 'dep_util', 'dir_util', 'dist', 'errors', 'extension', 'fancy_getopt', 'file_util', 'filelist', 'log', 'spawn', 'sys', 'sysconfig', 'util', 'version']
```
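A minimal sketch of the explicit-import style the summary argues for (the actual patch may differ):
```python
from distutils.version import LooseVersion  # import the submodule explicitly

print(LooseVersion("1.10.0") > LooseVersion("1.9.1"))  # True
```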

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69904

Reviewed By: albanD, atalman, janeyx99

Differential Revision: D33094453

Pulled By: malfet

fbshipit-source-id: aaf1adb7c6f293c4e376ccff21c64cd6ba625e97
2021-12-14 09:28:19 -08:00
fdcb78df38 print fix in lr_scheduler (#68338)
Summary:
`{:5d}` fails for `CosineAnnealingWarmRestarts` which has float `epoch`
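A quick illustration of the failure mode (assuming standard Python format-spec behavior):
```python
print("{:5d}".format(3))        # integers format fine: '    3'
try:
    print("{:5d}".format(3.5))  # a float epoch, as CosineAnnealingWarmRestarts produces
except ValueError as e:
    print(e)  # Unknown format code 'd' for object of type 'float'
```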

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68338

Reviewed By: jbschlosser

Differential Revision: D33063970

Pulled By: albanD

fbshipit-source-id: 992e987f8d5f6f8f5067924df4671e9725b6d884
2021-12-14 09:05:19 -08:00
f7210f8d90 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33090919

fbshipit-source-id: 78efa486776014a27f280a01a21f9e0af6742e3e
2021-12-14 08:06:58 -08:00
4f81b2adbb Remove if conditioning from some MacOS workflow steps (#69788)
Summary:
Indirectly fixes https://github.com/pytorch/pytorch/issues/69389

These steps shouldn't error out when the credentials aren't set anyway

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69788

Reviewed By: seemethere

Differential Revision: D33061307

Pulled By: janeyx99

fbshipit-source-id: 7db6d15b3e80c3c13ea428248a8b4f8d2d32d4a1
2021-12-14 07:54:15 -08:00
fa615b332d added set_printoptions examples (#68324)
Summary:
Added examples for `torch.set_printoptions`

```
>>> torch.set_printoptions(precision=2)
>>> torch.tensor([1.12345])
tensor([1.12])
>>> torch.set_printoptions(threshold=5)
>>> torch.arange(10)
tensor([0, 1, 2, ..., 7, 8, 9])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68324

Reviewed By: ngimel

Differential Revision: D33063869

Pulled By: anjali411

fbshipit-source-id: 24db99df1419f96ba8ae2b5217cb039b288b630a
2021-12-14 07:40:52 -08:00
d90012689f [DataPipe] Control shuffle settings from DataLoader2 (#65756)
Summary:
Makes `shuffle` DataPipe sensitive to DataLoader(2) `shuffle` kwarg.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65756

Reviewed By: albanD

Differential Revision: D31344867

Pulled By: VitalyFedyunin

fbshipit-source-id: e0084e0ac193ac784d6298328ca1222745681347
2021-12-14 07:35:26 -08:00
620a1fcb55 OpInfos for: normal, bernoulli, multinomial (#66358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66358

Test Plan: - run tests

Reviewed By: mruberry

Differential Revision: D31551695

Pulled By: zou3519

fbshipit-source-id: cf1b43118a0414a1af9ece9ae8c0598b2701aa0a
2021-12-14 06:59:38 -08:00
4829dcea09 Codegen: Generate separate headers per operator (#68247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247

This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and
`NativeMetaFunctions.h` into separate headers per operator base name.
With `at::sum` as an example, we can include:
```cpp
<ATen/core/sum.h>         // Like Functions.h
<ATen/core/sum_ops.h>     // Like Operators.h
<ATen/core/sum_native.h>  // Like NativeFunctions.h
<ATen/core/sum_meta.h>    // Like NativeMetaFunctions.h
```

The umbrella headers are still being generated, but all they do is
include from the `ATen/ops` folder.

Further, `TensorBody.h` now only includes the operators that have
method variants, which means files that only include `Tensor.h` don't
need to be rebuilt when you modify function-only operators. Currently
there are about 680 operators that don't have method variants, so this
is potentially a significant win for incremental builds.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32596272

Pulled By: albanD

fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272
2021-12-14 06:40:08 -08:00
badf7b0210 fix typo changing the generated code (#69899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69899

Reviewed By: soulitzer

Differential Revision: D33093461

Pulled By: albanD

fbshipit-source-id: 2c672a2b767f0caed1ef3a1d2afa1cacdfcdc320
2021-12-14 06:36:14 -08:00
51033ec840 Add forward AD layout check for storage numel (#68631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68631

This PR:
- Adds the check that the storage numel of the base and tangent tensors are the same. This is to support the case when as_strided reveals elements that aren't indexable by the input tensor.
- Skips the check when batched tensors are involved, because using as_strided to reveal elements that are not indexable by the input tensor is already not allowed in vmap.
- Adds tests for the above two cases, as well as an edge case regarding conj bit (what about neg bit?)

For functorch:
- we need to copy the batching rule implemented here
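A minimal sketch of the situation the new check guards against, assuming the `torch.autograd.forward_ad` API (the exact error surface may differ):
```python
import torch
import torch.autograd.forward_ad as fwAD

base = torch.randn(4)
# Same sizes and strides as base, but the underlying storage holds 6 elements:
tangent = torch.randn(6)[:4]

with fwAD.dual_level():
    # The layout check compares storage numels, so this mismatch is rejected:
    # as_strided on the dual could reveal tangent elements the base cannot index.
    dual = fwAD.make_dual(base, tangent)
```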

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32899678

Pulled By: soulitzer

fbshipit-source-id: 54db9550dd2c93bc66b8fb2d36ce40799ebba794
2021-12-14 04:34:25 -08:00
6078e12ad6 Add forward AD support for as_strided (#68629)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68629

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32899680

Pulled By: soulitzer

fbshipit-source-id: b80ba4483c06108938923f17dc67278b854515ef
2021-12-14 04:33:05 -08:00
fed9b90ed4 fixing removeProfilingNodes duplicated functions (#1282) (#68804)
Summary:
Unfortunately, there are two versions of the removeProfilingNodes function, and one of them does not clean up profile_ivalue nodes properly. This leads to a dangling profile_ivalue node, which ends up being profiled multiple times and can give us false assert failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68804

Reviewed By: mrshenli

Differential Revision: D32980157

Pulled By: Krovatkin

fbshipit-source-id: cd57c58a941d10ccd01a6cd37aac5c16256aaea6
2021-12-13 22:54:30 -08:00
82075c0a19 Create trt plugin base (#69487)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69487

Writing a customized plugin for TRT requires extending IPluginV2IOExt. This diff extracts functions that should share a common implementation between plugins from IPluginV2IOExt into plugin_base, making it easier for OSS users to write customized plugins.

This diff also fixes a double-creator issue; the root cause is that get_trt_plugin in converters.py looks for plugins by name matching. Switching to the util function from converters_utils.py resolves the issue.

Test Plan: CI

Reviewed By: 842974287

Differential Revision: D32747052

fbshipit-source-id: 7f2e8811c158230f66a0c389af4b84deaf7e2d1f
2021-12-13 21:31:24 -08:00
77a4b89411 Adding windows cuda 11.5 workflows (#69377)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69081

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69377

Reviewed By: ngimel

Differential Revision: D33076022

Pulled By: atalman

fbshipit-source-id: aeb2791fc15d7b491976f57a74c1989c6ca61b81
2021-12-13 20:49:02 -08:00
b1ef56d646 [quant][docs] quantized model save/load instructions (#69789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69789

Add details on how to save and load quantized models without hitting errors

Test Plan:
CI autogenerated docs

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D33030991

fbshipit-source-id: 8ec4610ae6d5bcbdd3c5e3bb725f2b06af960d52
2021-12-13 20:23:59 -08:00
2b81ea4f9a [DataPipe] Export ShardingFilter (#69844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69844

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D33062183

Pulled By: ejguan

fbshipit-source-id: 6b3f4ad376959c4d2e8c8b2751ae6657527dcd36
2021-12-13 19:30:56 -08:00
603a1de871 Fix inefficient recursive update in ShardedTensor.state_dict hook (#68806)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68805

The bug is described in the linked issue. This PR is an attempt to make the functions `_recurse_update_dict` and `_recurse_update_module` more efficient in how they iterate over the submodules. The previous implementation was suboptimal, as it recursively called the update method on the submodules returned by `module.named_modules()`, while `module.named_modules()` already returns all submodules, including nested ones.
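A hypothetical sketch of the inefficiency (names invented for illustration): since `named_modules()` already yields every nested submodule, recursing over its output revisits the same modules many times, while a single flat pass suffices.
```python
import torch.nn as nn

def _update(module: nn.Module) -> None:
    pass  # placeholder for the per-module state_dict fix-up

def recurse_update_slow(module: nn.Module) -> None:
    _update(module)
    for _, sub in module.named_modules():
        if sub is not module:          # named_modules() yields the module itself first
            recurse_update_slow(sub)   # re-walks each subtree repeatedly

def update_fast(module: nn.Module) -> None:
    for _, sub in module.named_modules():  # already includes nested submodules
        _update(sub)
```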

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68806

Reviewed By: pritamdamania87

Differential Revision: D33053940

Pulled By: wanchaol

fbshipit-source-id: 3e72822f65a641939fec40daef29c806af725df6
2021-12-13 19:22:55 -08:00
b08d64202a Remove THGeneral (#69041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69041

`TH_CONCAT_{N}` is still being used by THP, so I've moved that into
its own header, but all the compiled code is gone.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872477

Pulled By: ngimel

fbshipit-source-id: 06c82d8f96dbcee0715be407c61dfc7d7e8be47a
2021-12-13 16:14:28 -08:00
8dfdc3df82 [ROCm] Refactor how to specify AMD gpu targets using PYTORCH_ROCM_ARCH (#61706)
Summary:
Remove all hardcoded AMD gfx targets

PyTorch build and Magma build will use rocm_agent_enumerator as
backup if PYTORCH_ROCM_ARCH env var is not defined

PyTorch extensions will use same gfx targets as the PyTorch build,
unless PYTORCH_ROCM_ARCH env var is defined

torch.cuda.get_arch_list() now works for ROCm builds

PyTorch CI dockers will continue to be built for gfx900 and gfx906 for now.

PYTORCH_ROCM_ARCH env var can be a space- or semicolon-separated list of gfx archs, e.g. "gfx900 gfx906" or "gfx900;gfx906"
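For example, on a ROCm build one can now inspect the compiled-in targets from Python (the output below is hypothetical):
```python
import torch

print(torch.cuda.get_arch_list())  # e.g. ['gfx900', 'gfx906']
```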
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61706

Reviewed By: seemethere

Differential Revision: D32735862

Pulled By: malfet

fbshipit-source-id: 3170e445e738e3ce373203e1e4ae99c84e645d7d
2021-12-13 15:41:40 -08:00
c6c3b43498 [SR][easy] Accessors for value array offsets (#69755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69755

Per swolchok's suggestion on D32609915 (1c43b1602c). Hide the value offset indices behind accessors to provide more flexibility if we ever decide to change the layout of the values array.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32838145

fbshipit-source-id: cf805c077672de4c2fded9b41da01eca6d84b388
2021-12-13 15:31:39 -08:00
3d358a7678 Adds a maximize flag to Adam (#68164)
Summary:
Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052.

I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations.

All feedback welcome!
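A minimal usage sketch, assuming the flag mirrors SGD's `maximize` kwarg:
```python
import torch

param = torch.randn(3, requires_grad=True)
opt = torch.optim.Adam([param], lr=1e-2, maximize=True)

objective = (param ** 2).sum()
objective.backward()
opt.step()  # with maximize=True, param moves to increase the objective
```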

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164

Reviewed By: VitalyFedyunin

Differential Revision: D32994129

Pulled By: albanD

fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850
2021-12-13 05:53:53 -08:00
fc37e5b3ed Hook up general convolution to convolution_backward (#69584)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69584

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32936380

Pulled By: jbschlosser

fbshipit-source-id: c6fdd88db33bd1a9d0eabea47ae09a4d5b170e92
2021-12-12 17:30:01 -08:00
0420de3539 [SR] Log SR options (#69809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69809

SR options are only printed out once per model per net. Logging them is actually pretty helpful for debugging.

Test Plan: CI

Reviewed By: donaldong

Differential Revision: D33046814

fbshipit-source-id: 536b34e00fbc8a273c5eb4d8ae5caca0dc1f4c24
2021-12-12 16:32:00 -08:00
f0e98dcbd3 General convolution_backward function (#69044)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69044

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD, H-Huang

Differential Revision: D32708818

Pulled By: jbschlosser

fbshipit-source-id: e563baa3197811d8d51553fc83718ace2f8d1b7a
2021-12-12 15:53:38 -08:00
a5b5152d7a Fix typo in aten::full in version_map (#69807)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69807

Test Plan: {gif:ursvp75m}

Reviewed By: gmagogsfm

Differential Revision: D33044503

fbshipit-source-id: 14aac66b123d84ca3f35f02c276b15e55015df9e
2021-12-12 14:47:16 -08:00
af7ee9fc01 Forward AD for inplace comparison operators (#69597)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69597

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020600

Pulled By: soulitzer

fbshipit-source-id: 0c9ab210f7dc952a41fbcaa1f5f7921c2fdeb18b
2021-12-12 00:11:14 -08:00
0dcbd73eee Add some forward AD formulas (#69384)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69384

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020602

Pulled By: soulitzer

fbshipit-source-id: a92dd243f2b5b21fe277b0bb17bcd61dfe5a0d67
2021-12-12 00:11:11 -08:00
baf92f9d5a Fix copy_ forward AD to handle broadcasting (#69592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69592

Currently, the forward AD function for `copy_` (in `VariableTypeManual`) does not handle the broadcasting case. ~EDIT: but that is a design decision, not a bug. In this PR, we make that clear as a comment.~

Note: `broadcast_to` does not have a batching rule in core, so the ops that rely on `copy_` to broadcast will still fail batched forward grad computation.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020603

Pulled By: soulitzer

fbshipit-source-id: 09cb702bffc74061964a9c05cfef5121f8164814
2021-12-12 00:11:08 -08:00
db32daf4b2 Do not test batched forward grad for inplace ops (#69558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69558

Currently we skip batched forward grad checks completely for certain views that also have inplace variants. This PR allows us to decouple the check.

Alternative: just skip the batched forward checks for inplace ops entirely. I'm okay with this because it was surprising to me these checks are being run in the first place.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020599

Pulled By: soulitzer

fbshipit-source-id: f8012aadc0e775f80da0ab62b2c11f6645bb1f51
2021-12-12 00:09:45 -08:00
f565167fbd Revert D32606547: torch/monitor: add C++ events and handlers
Test Plan: revert-hammer

Differential Revision:
D32606547 (e61fc1c03b)

Original commit changeset: a00d0364092d

Original Phabricator Diff: D32606547 (e61fc1c03b)

fbshipit-source-id: fbaf2cc06ad4bec606e8a9c6f591d65c04e6fa56
2021-12-11 22:51:03 -08:00
f575179953 [quant][fx][graphmode] Move more patterns to use ModuleReLU fuse handler (#69644)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69644

This PR cleans up the init of ModuleReLUFuseHandler and moves all `module - relu`
fusion patterns to use this handler.

It also temporarily disables the additional_fuser_method argument; we will enable it
again after we bring back the simple pattern format.

Test Plan:
```
python test/test_quantize_fx.py TestFuseFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32974906

fbshipit-source-id: 23483ea4293d569cb3cec6dadfefd4d9f30921a7
2021-12-11 22:00:06 -08:00
e61fc1c03b torch/monitor: add C++ events and handlers (#68783)
Summary:
This adds a C++ event handler corresponding to the Python one mentioned in the RFC.

This changes the counters a bit to all be push driven instead of being polled. The two window types are "fixed count" and "interval". One is based off the number of logged events and the other is based off of time windows. There's currently no active ticker for interval so it needs a regular stream of events to ensure events are produced. A follow up diff can add support for things like HHWheel / simple ticker.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783

Test Plan: buck test //caffe2/test/cpp/monitor:monitor

Reviewed By: kiukchung

Differential Revision: D32606547

fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead
2021-12-11 16:44:46 -08:00
20f7c893c1 Populate runtime with upgrader graph (#68773)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68773

Test Plan: Imported from OSS

Reviewed By: qihqi, gmagogsfm

Differential Revision: D32603258

Pulled By: tugsbayasgalan

fbshipit-source-id: 6fa0b7ee4ebe46c9aa148923c6ef3e1de106ad13
2021-12-11 13:44:24 -08:00
17f3179d60 Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796

(Note: this ignores all push blocking failures!)

Test Plan: External CI + Sandcastle

Reviewed By: zhxchen17

Differential Revision: D33032671

fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef
2021-12-10 21:29:53 -08:00
3906f8247a clear predict_net field from PredictorExporterMeta stored in the exporter to save memory (#68485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68485

In OSS, the only change is that we make the predict_net field of PredictorExporterMeta nullable.

Test Plan: sandcastle, let CI run

Reviewed By: boryiingsu

Differential Revision: D32467138

fbshipit-source-id: 81bd5fca695462f6a186bcfa927073874cc9c26a
2021-12-10 21:25:36 -08:00
19fecc63e4 [PyTorch][kineto] Remove heap-allocated vectors in saveExtraArgs (#69737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69737

We can use stack allocation instead.
ghstack-source-id: 145312454

Test Plan: Ran internal framework overhead benchmark with --stressTestKinto --kinetoAddFlops, but difference was minimal. Still good to fix.

Reviewed By: chowarfb

Differential Revision: D33007329

fbshipit-source-id: e096312fef5b729cf12580be152c9418683745b8
2021-12-10 20:24:17 -08:00
731c8255b7 Fix the TorchBench CI when running with a benchmark branch. (#69795)
Summary:
Fixes TorchBench CI when user is running with their own branch

Supersedes https://github.com/pytorch/pytorch/pull/69770

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69795

Reviewed By: malfet

Differential Revision: D33032886

Pulled By: xuzhao9

fbshipit-source-id: 82baee94df6925bf91bb575143efa058ce98b914
2021-12-10 18:04:43 -08:00
59deee8308 Make c10 tests compilable with -Werror (#69711)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69711

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997005

Pulled By: malfet

fbshipit-source-id: 369194051ece9d213b48584ca84e5d76b3794dae
2021-12-10 16:47:46 -08:00
e305e4d4d8 Suppress common warnings when building by clang (#69710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69710

Namely no range-loop-analysis (that detect when loop variable can not be const reference

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997003

Pulled By: malfet

fbshipit-source-id: dba0e7875e5b667e2cc394c70dd75e2403265918
2021-12-10 16:45:38 -08:00
41c344d460 Revert D32739976: fix CompositeImplicitAutograd ops improperly labeled
Test Plan: revert-hammer

Differential Revision:
D32739976 (195b0d0645)

Original commit changeset: a756dd9e0b87

Original Phabricator Diff: D32739976 (195b0d0645)

fbshipit-source-id: 6e898dd5435f31e604588e6e50be1217fa207a54
2021-12-10 13:04:29 -08:00
77213fa4d3 Fix docker builds for Python-3.6 (#69785)
Summary:
As [conda-4.11](https://anaconda.org/anaconda/conda/files?version=4.11.0) is no longer available for Python-3.6, stick to 4.10 for 3.6 builds

Fixes https://github.com/pytorch/pytorch/issues/69781

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69785

Reviewed By: seemethere, atalman

Differential Revision: D33026217

Pulled By: malfet

fbshipit-source-id: d742a1e79634ed62b3a941ba23a7a74f41c2f4cb
2021-12-10 12:29:15 -08:00
a5a7e30943 [DataPipe] Adding interface for MapDataPipes (#69648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69648

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32989066

Pulled By: NivekT

fbshipit-source-id: ef96bcd4ac4d7a576fdd2a3fb4ef52ae6a902e10
2021-12-10 12:06:08 -08:00
81a60b9813 [DataPipe] Adding output types to DataPipe interface file (#69647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69647

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32989067

Pulled By: NivekT

fbshipit-source-id: 2c2e71e9e514e0d584affaa0b71b7b0d07a2ddbf
2021-12-10 12:04:45 -08:00
d026057bb3 [PyTorch] Update SmallVector from LLVM (#69110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69110

I pasted the current LLVM code, reapplied the modifications listed in the code comments, and caught a few more in the diff/build process. The trivially copyable detection is different now; if gcc builds fail, we will try reverting to C10_IS_TRIVIALLY_COPYABLE or copying what LLVM is doing.

The motivation for this change is that, as noted in an existing comment, C10_IS_TRIVIALLY_COPYABLE did the wrong thing for std::unique_ptr, which caused problems with D32454856 / #68412.

ghstack-source-id: 145327773

Test Plan: CI

Reviewed By: bhosmer, mruberry

Differential Revision: D32733017

fbshipit-source-id: 9452ab90328e3fdf457aad23a26f2f6835b0bd3d
2021-12-10 11:57:19 -08:00
1d269e8c15 [PyTorch] Simple refcount bump fixes in standardizeVectorForUnion & callees (#66695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66695

More extra reference counting in this path.
ghstack-source-id: 145125484

Test Plan: CI

Reviewed By: suo

Differential Revision: D31692197

fbshipit-source-id: 126b6c72efbef9410d4c2e61179b6b67459afc23
2021-12-10 11:43:01 -08:00
5374d5d8c9 [shard] fix with_comms wrapper (#69493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69493

When the `with_comms` decorator took arguments, we added a `with_comms_decorator` inner function; `with_comms()` then referred to a function object, so the added parentheses were necessary in test cases.

This PR fixes the `with_comms` wrapper behavior to allow specifying it both with and without arguments in test cases:
```
@with_comms
def test_case(self):
    ...
```
or
```
@with_comms(backend="gloo")
def test_case(self):
    ...
```
ghstack-source-id: 145327066

Test Plan: test_sharded_tensor

Reviewed By: pritamdamania87

Differential Revision: D32897555

fbshipit-source-id: 2f3504630df4f6ad1ea73b8084fb781f21604110
2021-12-10 10:25:54 -08:00
e1c583a691 [JIT] simplify logic for merging types during profiling (#69096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69096

Instead of storing profiling data in a map and then merging at
the end, perform merging directly during profiling.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32772626

Pulled By: davidberard98

fbshipit-source-id: 22622c916a61908b478dd09433815685ce43682a
2021-12-10 09:29:19 -08:00
3219f6a487 Make vec512 bfloat16 map function clang-Wall clean (#69707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69707

`const` modifier for `__m512` return value doesn't make much sense

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997008

Pulled By: malfet

fbshipit-source-id: fb98659713fe2a23cc702252c0655106687f0dbf
2021-12-10 09:11:42 -08:00
a5ad2cdab5 Cleanup ProcessGroup.cpp (#69706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69706

Mostly code modernization; also, do not capture the unused `this` in the
end_handler functor.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32997009

Pulled By: malfet

fbshipit-source-id: ac907f0c6889ad06d4fb0171964cb05133e5e610
2021-12-10 09:11:39 -08:00
7ea5926130 Make blend operations clang-Wall clean (#69705)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69705

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997007

Pulled By: malfet

fbshipit-source-id: cbadc44e1e7373800e94b7b2fd2711530854978c
2021-12-10 09:10:07 -08:00
195b0d0645 fix CompositeImplicitAutograd ops improperly labeled (#69169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69169

I checked `derivatives.yaml`, and it doesn't look like `logical_not/and/xor` are meant to work with autograd. Those 3 ops are currently set as `CompositeImplicitAutograd` though, implying that they do work with autograd. Updating them to be CompositeExplicitAutograd instead.

This came up because I'm trying to improve the error checking in external backend codegen, and these ops being improperly labeled incorrectly triggers my new error checks for XLA (see https://github.com/pytorch/pytorch/pull/67090)

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32739976

Pulled By: bdhirsh

fbshipit-source-id: a756dd9e0b87276368063c8f4934be59dca371d3
2021-12-10 09:03:51 -08:00
29d759948e use irange for loops 2 (#66746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705361

fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
2021-12-10 04:26:23 -08:00
91d16cb633 [Jit] Fix schema of aten::split int[] version (#69745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69745

Missed in D31935573 (6b44e75f6b).

Reviewed By: d1jang

Differential Revision: D31889867

fbshipit-source-id: 417bd0b15db4891dbd641b35a803553f11d0d756
2021-12-10 02:33:36 -08:00
9962bfb3c9 Remove THTensor (#69040)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69040

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872478

Pulled By: ngimel

fbshipit-source-id: f93e16509d64308d91e374744410a6a811e7f4e3
2021-12-10 02:29:11 -08:00
531b045446 [tensorexpr] Fix the buf size of discontiguous tensors (#69657)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69657

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32974473

Pulled By: huiguoo

fbshipit-source-id: 52dcd13d0ad7f7e4f1beb69dcaabc8ceb386ffca
2021-12-10 01:26:37 -08:00
aab67c6dff Add native masked_softmax (#69268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69268

This diff enables native masked softmax on CUDA and also expands our current warp_softmax to accept masking.
The mask in this masked softmax has to be the same shape as the input, and has to be contiguous.

In a follow-up diff, I will include the encoder mask layout, where the input is BHDD and the mask is BD.
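Conceptually, the fused kernel computes the equivalent of the following unfused sketch (the native op itself is internal, so this is illustration only):
```python
import torch

x = torch.randn(2, 3)
mask = torch.tensor([[False, True, False],
                     [True, False, False]])
# Masked positions are excluded by filling with -inf before normalizing:
out = torch.softmax(x.masked_fill(mask, float("-inf")), dim=-1)
```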

Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax

Reviewed By: ngimel

Differential Revision: D32338419

fbshipit-source-id: 48c3fde793ad4535725d9dae712db42e2bdb8a49
2021-12-09 23:29:45 -08:00
a5996a6857 [SR] Wrap check_for_memory_leak with DCHECK (#69588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69588

Code cleanup

Reviewed By: mikeiovine

Differential Revision: D32938333

fbshipit-source-id: d15dc405b281411c4c3c27a1dabf82f430c3ed08
2021-12-09 22:11:21 -08:00
3bb20ae49f Make c10d tests -Werror clean (#69703)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69703

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D32997001

Pulled By: malfet

fbshipit-source-id: 38b5f195c04f2b3b920e6883a96fe9a36345b9d2
2021-12-09 22:10:04 -08:00
be757addfa Do not use std::labs (#69704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69704

Instead, compute size diff inside the if statement

Test Plan: Imported from OSS

Reviewed By: zou3519, seemethere

Differential Revision: D32997004

Pulled By: malfet

fbshipit-source-id: a23819240bfe8278a11ebc6bae1e856de162f082
2021-12-09 22:05:14 -08:00
3f02ad09ec [ONNX] shapeValueMap: Represent symbolic shape as value (#68203) (#69545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69545

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32994272

Pulled By: malfet

fbshipit-source-id: 77cbdd78d01712faf4f9703549a2833340954509

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-12-09 22:00:46 -08:00
3d32a0c139 Back out "[wip][quant][graphmode] produce reference pattern for binary ops and then rewrite to quantized op" (#69713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69713

Original commit changeset: 456086b308c4

Original Phabricator Diff: D32537714 (bd8a4a9372)

Reviewed By: jerryzh168

Differential Revision: D32976643

fbshipit-source-id: bea6bf6a2718e42c9efa48a0b0c1dc7fe3893065
2021-12-09 21:55:09 -08:00
7dba88dfdb [nnc][quant] Fix quantized concat (#69596)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69596

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32941108

Pulled By: IvanKobzarev

fbshipit-source-id: 727f608b98625648e2e444396d910838c95f58f2
2021-12-09 18:55:32 -08:00
b2e79ed5ec Remove WindowsTorchApiMacro.h in favor of Export.h (#69585)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/68095

This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585

Reviewed By: mrshenli

Differential Revision: D32958594

Pulled By: albanD

fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061
2021-12-09 17:30:09 -08:00
f87f1d08e8 [SR] assignStorageToManagedTensors returns a vector (#69568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69568

Non-empty vectors should never be passed to `assignStorageToManagedTensors` and `assignStorageToManagedOutputTensors`. Presumably, this out-variant convention was adopted to avoid move-assigning the corresponding attributes in `MemoryPlanner`. But the cost of a vector move-assign is not high, and this function type signature is safer.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: donaldong

Differential Revision: D32729289

fbshipit-source-id: 88f19de8eb89d8a4f1dd8bbd4d9e7f686e41888b
2021-12-09 17:01:48 -08:00
9aa1b3e396 [Static Runtime] [Code Cleanup] Encapsulate function objects within ProcessedFunction (#69595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69595

This change encapsulates the `function` object in `ProcessedFunction` objects instead of exposing it unnecessarily just for executing it.

Test Plan: Existing tests

Reviewed By: mikeiovine

Differential Revision: D32908341

fbshipit-source-id: 5ff4951cbe276c5c6292227124d9eec1dd16e364
2021-12-09 15:11:03 -08:00
41e1ab0785 Introduce isTensorSubclassLike; add special cases to backwards formulas (#69534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69534

Something is TensorSubclassLike if it is a Tensor subclass or if it has
the same problems as Tensor subclasses. Today that just includes Tensor
Subclasses and meta tensors but may include other things in the future.

Some of our backwards formulas are incompatible with TensorSubclassLike
objects. For example, calling .data_ptr() is a problem because many
TensorSubclassLike objects don't have storage. Another problem is
in-place operations: performing `regular_tensor.inplace_(tensor_subclass)`
is a problem.
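A small sketch of the `.data_ptr()` problem using a meta tensor, one of the tensor-subclass-like cases named above (exact error type may differ):
```python
import torch

m = torch.empty(2, 2, device="meta")  # no real storage behind it
try:
    m.data_ptr()
except Exception as e:
    print(e)  # backward formulas that touch data_ptr() break on such tensors
```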

This PR adds special cases to the backward formulas for torch.max and
torch.clamp to handle this. The backward formulas for torch.max and
torch.clamp are not dispatcher operations so they cannot be overridden
and we hesitate to make them dispatcher operations for FC/BC concerns
and performance overhead concerns.

Furthermore, the old concept of "is this inplace operation vmap
compatible?" can be subsumed by the general "is this inplace operation
tensor-subclass compatible" question, so I replaced all instances of
isInplaceVmapCompatible and replaced it with the isTensorSubclassLike
checks.

Test Plan
- I tested the changes using functorch.
- It's possible to write a test for these in core (one has to make
a custom tensor subclass and then send it through the operation and then
invoke autograd), but I wanted to push the work to doing some
generic testing for backward formulas
(https://github.com/pytorch/pytorch/issues/69530) instead of doing some
one-off things now.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32967727

Pulled By: zou3519

fbshipit-source-id: 30fda1a7581da4c55179b7a3ca05069150bbe2dc
2021-12-09 15:03:22 -08:00
d3649309e6 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306

Included functions:

save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer
Module object

Test Plan: unittests

Reviewed By: gmagogsfm

Differential Revision: D32806835

fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57
2021-12-09 14:53:31 -08:00
193e3c484e .github: Add fbsync to push triggers (#69718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69718

canary is now pushing to fbsync so we should change our workflows to
reflect that.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D32999967

Pulled By: seemethere

fbshipit-source-id: bc4bc9afd2d73c53f91d3af3b81aca1b31f665a4
2021-12-09 14:30:29 -08:00
3e20a74b55 [SR] Update memory planner docs (#69559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69559

We have a lot of special cases. Document them so they're easy to learn about.
ghstack-source-id: 145226542

Test Plan: Spell check? :)

Reviewed By: d1jang

Differential Revision: D32929416

fbshipit-source-id: 2362410f25a27cdb74a4939903446192cef61978
2021-12-09 14:22:33 -08:00
e963b43691 Extend explanation of torch.cholesky_inverse to consider batched inputs. (#69069)
Summary:
While implementing https://github.com/pytorch/pytorch/issues/68720,
we found out empirically that `torch.cholesky_inverse` supports batched inputs, but this is not explained in the docs: [link](https://github.com/pytorch/pytorch/pull/68720#pullrequestreview-817243697)
`torch.cholesky_inverse` was implemented in https://github.com/pytorch/pytorch/issues/50269 and the doc was updated in https://github.com/pytorch/pytorch/issues/31275 but not merged.
neerajprad
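A minimal sketch of the batched usage the updated docs describe:
```python
import torch

A = torch.randn(3, 4, 4)
spd = A @ A.transpose(-2, -1) + 4 * torch.eye(4)  # a batch of SPD matrices
L = torch.linalg.cholesky(spd)
inv = torch.cholesky_inverse(L)  # batched inputs are supported
```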

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69069

Reviewed By: mrshenli

Differential Revision: D32979362

Pulled By: neerajprad

fbshipit-source-id: 0967c969434ce6e0ab15889c240149c23c0bce44
2021-12-09 14:01:31 -08:00
9ad05f2c3a Upgrade oneDNN to v2.3.3 and package oneDNN Graph API together (#63748)
Summary:
This PR upgrades oneDNN to [v2.3.3](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.3) and includes [Graph API preview release](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.2) in one package.

- oneDNN will be located at `pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN`
- The version of oneDNN will be [v2.3.3](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.3)
  The main changes on CPU:

  - v2.3
    - Extended primitive cache to improve primitive descriptor creation performance.
    - Improved primitive cache performance in multithreaded configurations.
    - Introduced initial optimizations for bfloat16 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids).
    - Improved performance of binary primitive and binary post-op for cases with broadcast and mixed source and destination formats.
    - Improved performance of reduction primitive
    - Improved performance of depthwise convolution primitive with NHWC activations for training cases
  - v2.3.1
    -  Improved int8 GEMM performance for processors with Intel AVX2 and Intel DL Boost support
    - Fixed integer overflow for inner product implementation on CPUs
    - Fixed out of bounds access in GEMM implementation for Intel SSE 4.1
  - v2.3.2
    - Fixed performance regression in fp32 inner product primitive for processors with Intel AVX512 support
  - v2.3.3
    - Reverted check for memory descriptor stride validity for unit dimensions
    - Fixed memory leak in CPU GEMM implementation

  More changes can be found in https://github.com/oneapi-src/oneDNN/releases.
- The Graph API provides flexible API for aggressive fusion, and the preview2 supports fusion for FP32 inference.  See the [Graph API release branch](https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview2) and [spec](https://spec.oneapi.io/onednn-graph/latest/introduction.html) for more details. A separate PR will be submitted to integrate the oneDNN Graph API to Torchscript graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63748

Reviewed By: albanD

Differential Revision: D32153889

Pulled By: malfet

fbshipit-source-id: 536071168ffe312d452f75d54f34c336ca3778c1
2021-12-09 13:42:40 -08:00
17641fed2a Revert D32942007: OpInfo: Convert more sample_input_funcs to generators
Test Plan: revert-hammer

Differential Revision:
D32942007 (d21646c432)

Original commit changeset: bb5b253d6d87

Original Phabricator Diff: D32942007 (d21646c432)

fbshipit-source-id: d37c78174f0acea48e4cd4af3ac67ca4ee7ac54d
2021-12-09 10:54:41 -08:00
0ccb1dcdbb Fix inference_mode decorator (#68617)
Summary:
This fixes the case when `torch.inference_mode` is called with `mode=False` (disabled). When used as a decorator, it ignored the argument and enabled inference mode anyway.

`_DecoratorContextManager` is changed so that a new instance is a copy instead of a new instance with default parameters.

I also added more tests to cover this case.

Current behaviour:

```python
>>> import torch
>>> x = torch.ones(1, 2, 3, requires_grad=True)
>>> @torch.inference_mode(mode=False)
... def func(x):
...     return x * x
...
>>> out = func(x)
>>> out.requires_grad
False
```

New behaviour (fixed):

```python
>>> import torch
>>> x = torch.ones(1, 2, 3, requires_grad=True)
>>> @torch.inference_mode(mode=False)
... def func(x):
...     return x * x
...
>>> out = func(x)
>>> out.requires_grad
True
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68617

Reviewed By: mrshenli

Differential Revision: D32958434

Pulled By: albanD

fbshipit-source-id: 133c69970ef8bffb9fc9ab5142dedcffc4c32945
2021-12-09 10:45:09 -08:00
afb742382a use irange for loops 10 (#69394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69394

Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32837991

fbshipit-source-id: fc7c4f76d2f32a17a0faf329294b3fe7cb81df32
2021-12-09 09:49:34 -08:00
2d5b3101c1 Added ScriptFunction pkl exception for issue #61210 #61381 (#67076)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61381, https://github.com/pytorch/pytorch/issues/61210

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67076

Reviewed By: jbschlosser

Differential Revision: D32908175

Pulled By: suo

fbshipit-source-id: f6e175793243dc96cde5e44022d92f2623b934eb

Co-authored-by: LucaStubbe <stubbeluca@gmail.com>
Co-authored-by: Kanon Tromp <ktromp1@student.cccd.edu>
2021-12-09 09:44:49 -08:00
d21646c432 OpInfo: Convert more sample_input_funcs to generators (#69257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69257

These are sample functions that already use generators internally, this just moves the `yield` into the sample function itself.
Diff is best viewed ignoring whitespace changes https://github.com/pytorch/pytorch/pull/69257/files?diff=unified&w=1
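A hypothetical sketch of the pattern (names invented; the real sample functions yield SampleInput objects):
```python
import torch

def sample_inputs_foo(op_info, device, dtype, requires_grad, **kwargs):
    # The yield now lives in the sample function itself instead of an
    # inner generator that gets collected into a list.
    for shape in ((), (2,), (2, 3)):
        yield torch.randn(shape, device=device, dtype=dtype,
                          requires_grad=requires_grad)
```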

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32942007

Pulled By: mruberry

fbshipit-source-id: bb5b253d6d87b3495b7059924bed35b09d2768a2
2021-12-09 08:38:51 -08:00
6de9f0fc94 OpInfo: Allow sample_inputs_func to be any iterable (#69256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69256

Closes #52486

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32942008

Pulled By: mruberry

fbshipit-source-id: f5b01b0298c0160b0bec6e86e2b6db8cfe746206
2021-12-09 08:37:26 -08:00
d2917f705a Fix errors in common_utils.py (#69578)
Summary:
This fixes the following error:
```python
Traceback (most recent call last):
  File "/home/gaoxiang/pytorch-ucc2/test/distributed/test_distributed_spawn.py", line 40, in <module>
    run_tests()
  File "/home/gaoxiang/.local/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py", line 618, in run_tests
    ['--import-slow-tests'] if IMPORT_SLOW_TESTS else List[str]([]))
  File "/usr/lib/python3.9/typing.py", line 680, in __call__
    raise TypeError(f"Type {self._name} cannot be instantiated; "
TypeError: Type List cannot be instantiated; use list() instead
Traceback (most recent call last):
  File "/home/gaoxiang/pytorch-ucc2/test/run_test.py", line 1058, in <module>
    main()
  File "/home/gaoxiang/pytorch-ucc2/test/run_test.py", line 1036, in main
    raise RuntimeError(err_message)
RuntimeError: distributed/test_distributed_spawn failed!
```
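The error comes from calling a subscripted `typing` generic; a minimal reproduction and the shape of the fix:
```python
from typing import List

try:
    List[str]([])           # typing generics cannot be instantiated
except TypeError as e:
    print(e)                # Type List cannot be instantiated; use list() instead

empty: List[str] = list()  # annotate with List[str], construct with list()
```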

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69578

Reviewed By: mrshenli

Differential Revision: D32963113

Pulled By: malfet

fbshipit-source-id: b064e230c5e572e890b4ac66ebdda2707b8c12d7
2021-12-09 07:33:43 -08:00
07932e2735 [sparsity] Convert function for sparse kernels without a context manager (#66778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66778

This removes the hack of the context manager that would communicate the zeros block shape to the quantization convert.
The conversion will assume that the converted modules have `sparse_params` (which is added by the sparsifier).

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31835721

Pulled By: z-a-f

fbshipit-source-id: c5fd2da3b09a728a2296765c00ca69275dbca3b1
2021-12-09 02:58:57 -08:00
b957b82db7 Replace issue templates with new issue forms - v2 (#69361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69361

This PR introduces the new issue forms that replace issue templates.
(This is exactly the same as https://github.com/pytorch/pytorch/pull/65917 which was reverted due to an issue during the import)

This is similar to what was done in torchvision https://github.com/pytorch/vision/pull/4299 and torchaudio, you can see the end result here: https://github.com/pytorch/vision/issues/new/choose (click e.g. on the [bug report](https://github.com/pytorch/vision/issues/new?assignees=&labels=&template=bug-report.yml))

The main new thing is that we can enforce some of the fields to be filled, especially for bug reports. It's also a much cleaner GUI for users IMHO, and we can provide better examples and instructions.

There is still a "blank" template available.

I removed the "Questions" form: we say we close these issues anyway. I replaced it with a direct link to https://discuss.pytorch.org. Since we still have a "blank" template, I think this covers all previous use-cases properly.

Test Plan: Imported from OSS

Reviewed By: albanD, mrshenli

Differential Revision: D32947189

Pulled By: NicolasHug

fbshipit-source-id: f19abe3e7c9c479b0b227969a207916db5bdb6e3
2021-12-09 02:42:29 -08:00
e948856ce7 [sparsity] Add ability to keep sparsity parameters in modules (#66777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66777

Sometimes one might need to keep the sparsity parameters after the sparsifier is detached.
This saves the parameters in the `sparse_params`.
There are two ways of keeping the sparsifier params:

1. Tuple[str, ...]: A tuple of all the parameters that need to be stored.
2. Dict[str, Tuple[str, ...]]: A dict of layer keys and parameters. In this case only specified layers will have the parameters attached to.

For example:

```
>>> # This will keep params in every module
>>> sparsifier.squash_mask(keep_sparse_params=('sparse_block_shape',))
>>> print(model.submodule.linear1.sparse_params)
{'sparse_block_shape': (1, 4)}
>>> print(model.submodule.linear2.sparse_params)
{'sparse_block_shape': (1, 4)}
```

```
>>> # This will keep params only in specific modules
>>> sparsifier.squash_mask(keep_sparse_params={'submodule.linear1': ('sparse_block_shape',)})
>>> print(model.submodule.linear1.sparse_params)
{'sparse_block_shape': (1, 4)}
>>> print(model.submodule.linear2.sparse_params)
AttributeError: 'Linear' object has no attribute 'sparse_params'
```

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31835722

Pulled By: z-a-f

fbshipit-source-id: 20c2d80207eb7ce7291e7f5f655d3fb2a627190f
2021-12-09 02:36:27 -08:00
13faaff54c [Operator Versioning][Edge] Implement register function for upgrader (#67730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67730

This pr implement the register function for upgrader so it can be used at loading stage
ghstack-source-id: 145170986

Test Plan:
```
buck test //caffe2/test/cpp/jit:jit
```

Reviewed By: iseeyuan

Differential Revision: D32092518

fbshipit-source-id: 779b51eb12b8cb162a93a55c1e66fe0becc4cb36
2021-12-09 02:18:09 -08:00
4f5806dee7 [AO] Clear the contents of the torch/ao/__init__.py (#69415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69415

Adding the imports inside the torch/ao/__init__.py has a high chance of causing circular dependencies, especially if sparsity and quantization use each other's resources.
To avoid the dependency issues, we can just keep the __init__ empty.

Notes:
- This means that the user will have to explicitly import `torch.ao.quantization` or `torch.ao.sparsity` instead of `from torch import ao; ao.quantization.???` (see the sketch after these notes).
- The issue of circular dependencies that are caused by the imports with binding submodules is [fixed in Python 3.7](https://docs.python.org/3/whatsnew/3.7.html#other-language-changes), which means this solution will become obsolete at the [3.6's EoL](https://www.python.org/dev/peps/pep-0494/#and-beyond-schedule), which comes [12/23/2022](https://devguide.python.org/#status-of-python-branches).
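A minimal sketch of the import pattern the first note describes:
```python
import torch.ao.quantization  # explicit submodule import works

# By contrast, `from torch import ao; ao.quantization` would raise an
# AttributeError while torch/ao/__init__.py stays empty.
```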

Future options to resolve the circular dependencies (subject to discussion):
1. Use interfaces for binding submodules. For example, have a torch/ao/_nn with all the source code, and an interface torch/ao/nn with only the __init__.py file. The __init__ files inside the torch/ao/_nn will be empty
2. Completely isolate the common code into a separate submodule, s.a. torch/ao/common. The other submodules will not be referencing each other.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32860168

Pulled By: z-a-f

fbshipit-source-id: e3fe77e285992d34c87d8742e1a5e449ce417c36
2021-12-09 01:21:30 -08:00
015e481a41 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D32975574

fbshipit-source-id: 66856595c7bc29921f24a2c5c00c72892f262aa1
2021-12-09 00:10:33 -08:00
dc87cf5fe1 Fixes mem_get_info when querying on a device other than the current device (#69640)
Summary:
Also fixes the documentation failing to appear and adds a test to validate that op works with multiple devices properly.
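A small usage sketch, assuming the optional device argument of `torch.cuda.mem_get_info`:
```python
import torch

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    # Query device 1 without making it the current device first:
    free_bytes, total_bytes = torch.cuda.mem_get_info(1)
    print(free_bytes, total_bytes)
```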

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69640

Reviewed By: ngimel

Differential Revision: D32965391

Pulled By: mruberry

fbshipit-source-id: 4fe502809b353464da8edf62d92ca9863804f08e
2021-12-08 23:04:30 -08:00
24d885f5f8 [Vulkan] Thread-safe Vulkan backend for OSS (#69576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69576

Vulkan backend for OSS is also thread-safe by default:
* Removed the `MAKE_VULKAN_THREADSAFE` preprocessor macro and its if-conditions

Test Plan:
Test build on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
Test build on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```

Test result on Google Pixel 5:
```
//xplat/caffe2:pt_vulkan_perf_test_binAndroid#android-arm64 buck-out/gen/fe3a39b8/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64
buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64: 1 file pushed, 0 skipped. 145.4 MB/s (826929592 bytes in 5.426s)
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       39.3 ms         10.1 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       27.1 ms         5.86 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       58.5 ms         11.8 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        5.98 ms        0.803 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        9.14 ms        0.857 ms         5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3       32.1 ms         31.3 ms         3000
```

Test result on MacOS:
```
Running ./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac#macosx-x86_64
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 18.89, 29.61, 24.95
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       53.3 ms         39.6 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       28.0 ms         20.7 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       51.8 ms         38.7 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        2.76 ms         1.31 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        2.29 ms         1.11 ms         5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3       49.2 ms         41.8 ms         3000
```

Reviewed By: SS-JIA

Differential Revision: D32933891

fbshipit-source-id: d8ebd5394771e1d79230c1f3aa8fbec4472b3197
2021-12-08 21:04:52 -08:00
ecf9c82f24 Reduce binary size of TensorCompare.cu (#68835)
Summary:
This PR does several things
1) eliminates `where` instantiations for deprecated `byte` condition dtype, and casts `condition` to `bool` in this case. This is a perf penalty for people using deprecated calls
2) Makes `clamp_{min/max}.Tensor` overload reuse `clamp_{min/max}.Scalar` kernels if limit argument is cpu scalar, instead of instantiating `gpu_kernel_with_scalars`
3) Unifies all clamp_scalar kernels to use a single kernel with lambda picking the correct operation. I've verified that it doesn't degrade kernel performance.
4) Eliminates redundant TensorIterator construction that `clamp` structured kernel was doing when only `min` or `max` was specified

This reduces the cubin size for TensorCompare.cu on V100 from 15751920 bytes to 7691120 bytes, with corresponding reduction in compile time.
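A sketch of point (1): the deprecated byte mask is now handled by casting to bool rather than by a dedicated kernel instantiation (illustrated here with an explicit cast):
```python
import torch

cond = torch.tensor([1, 0, 1], dtype=torch.uint8)  # deprecated condition dtype
x = torch.ones(3)
y = torch.zeros(3)
out = torch.where(cond.bool(), x, y)  # equivalent of what the cast path does
```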

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68835

Reviewed By: mruberry

Differential Revision: D32839241

Pulled By: ngimel

fbshipit-source-id: 0acde5af10a767264afbdb24684b137c5544b8d9
2021-12-08 20:08:53 -08:00
3e560239e2 [Vulkan] Implement clone operator (#69551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69551

Implemented `clone` operator in the Vulkan backend:
* Supports only <= 4D tensors.
* Internal name is `aten::clone`.
* Vulkan `clone` operator accepts only `c10::MemoryFormat::Preserve` and `c10::MemoryFormat::Contiguous` for the argument `c10::optional<c10::MemoryFormat> optional_memory_format`.
* Throws an exception if the `optional_memory_format` argument is neither `MemoryFormat::Preserve` nor `MemoryFormat::Contiguous`
* CPU implementation: [/aten/src/ATen/native/TensorFactories.cpp::clone()](3e45739543/aten/src/ATen/native/TensorFactories.cpp (L1415))
* MKL-DNN implementation: [/aten/src/ATen/native/mkldnn/TensorShape.cpp::mkldnn_clone()](3e45739543/aten/src/ATen/native/mkldnn/TensorShape.cpp (L58))
* `self.copy_(src)` calls `copy_()` for Vulkan to Vulkan copy operation
```
vTensor::copy_()
vTensor::copy_() X -> Vulkan
vTensor::copy_() CPU -> Vulkan
vTensor::clone()
vTensor::clone() -> MemoryFormat::Preserve
vTensor::clone() -> MemoryFormat::Preserve -> self = at::empty_like(src)
vTensor::clone() self.copy_(src); -> BEFORE
vTensor::copy_()
vTensor::copy_() X -> Vulkan
vTensor::copy_() Vulkan -> Vulkan
vTensor::clone() self.copy_(src); -> AFTER
vTensor::copy_()
vTensor::copy_() Vulkan -> X
vTensor::copy_() Vulkan -> CPU
```
* References:
  * Function `torch.clone` in PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.clone.html
  * Pytorch preferred way to copy a tensor: https://stackoverflow.com/questions/55266154/pytorch-preferred-way-to-copy-a-tensor
  * `torch.memory_format`: https://pytorch.org/docs/stable/tensor_attributes.html?highlight=memory_format#torch.torch.memory_format
  * `c10::MemoryFormat` definition in [/c10/core/MemoryFormat.h](3e45739543/c10/core/MemoryFormat.h (L28))
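A Python-level sketch of the memory-format contract described above (run here on CPU; the Vulkan op enforces the same restriction):
```python
import torch

x = torch.randn(2, 3)
y = x.clone(memory_format=torch.preserve_format)    # accepted
z = x.clone(memory_format=torch.contiguous_format)  # accepted
# Any other memory format raises on the Vulkan backend.
```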

Test Plan:
Build & test on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Build & test on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
Test result on Android (Google Pixel 5):
```
[ RUN      ] VulkanAPITest.clone_success
[       OK ] VulkanAPITest.clone_success (5 ms)
[ RUN      ] VulkanAPITest.clone_invalidinputs_exceptions
[       OK ] VulkanAPITest.clone_invalidinputs_exceptions (1 ms)
```
Test result on MacOS:
```
[ RUN      ] VulkanAPITest.clone_success
[       OK ] VulkanAPITest.clone_success (19 ms)
[ RUN      ] VulkanAPITest.clone_invalidinputs_exceptions
[       OK ] VulkanAPITest.clone_invalidinputs_exceptions (2 ms)
```

Reviewed By: SS-JIA

Differential Revision: D32923535

fbshipit-source-id: ea29792e1b0080cbbc1c8c7e8bf2beffad9b5c0d
2021-12-08 18:46:56 -08:00
eb2a803406 Run test_embedding_bag_with_no_grad_tensors only for TensorPipe (#69626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69626

Sparse tensors are only supported by the TensorPipe RPC backend. As a
result, moving test_embedding_bag_with_no_grad_tensors to be a TensorPipe
specific test.
ghstack-source-id: 145134888

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D32959952

fbshipit-source-id: d65f2edbb6dad7705475690a8c6293a322299dde
2021-12-08 18:29:38 -08:00
b61c532f96 Make make_dual redispatch (#68630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68630

Constraints:
1) (functorch) if all the inputs to an op have requires_grad=False and don't have tangents, then their VariableType
    kernel should be a no-op, i.e., behave like a redispatch. This is due to functorch's DynamicLayerStack
   having the autograd key by default (which is so that transformations like vmap still work with autograd)
2) (inference mode) inference tensors in inference mode will call straight into the kernel, we should still do something sensible
    inside even if we normally wouldn't redispatch into it.
3) ~Should support potential application of interposition below autograd: `nn.Parameter` is a example of subclassing where the subclass
    is not preserved when an operation is performed. There is an exception though: we want calling `make_dual` on a
    `nn.Parameter` to preserve its parameterness.~
4) Should avoid calls to shallow_copy_and_detach to avoid spurious calls into `__python_dispatch__`.

This PR:
- does not redispatch to `make_dual` from its `ADInplaceOrView` kernel to satisfy (1)
- calls into `alias` from the kernel in the native namespace so that behavior is consistent with other views in inference mode to satisfy (2)
- discussion of (3). We still wouldn't be able to directly override `make_dual` below autograd. In this PR, instead of not redispatching at all, we choose to redispatch into `at::alias` so that one can override `make_dual`. The side effect is that one would not be able to distinguish calls between the two, which can be problematic (though a straightforward but hacky solution would be to create a new `at::alias_for_make_dual` that would allow users to distinguish the two). This isn't ideal but seems to be the simplest way to satisfy (3). We don't pursue that hacky solution here.
- (4) is satisfied because we remove calls to `shallow_copy_and_detach`
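
For context, a minimal usage sketch of the user-facing API this kernel backs, assuming the public `torch.autograd.forward_ad` module (this is illustration, not code from the PR):

```
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(3)
tangent = torch.randn(3)

with fwAD.dual_level():
    # make_dual returns a view of primal that carries the tangent
    dual = fwAD.make_dual(primal, tangent)
    out = dual.sin()
    # unpack_dual returns a namedtuple of (primal, tangent)
    p, t = fwAD.unpack_dual(out)
```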

<details>
<summary> A potentially less hacky but more involved solution? (WIP) </summary>

Realizing that make_dual is more like requires_grad, perhaps it shouldn't be autograd explicit? Make make_dual a composite or python-only construct. i.e., it would be a view on the primal followed by something to the effect of primal.set_fw_grad(tangent).

Additional constraints:
5) make_dual needs to be backward-differentiable (I can't think of any applications yet because
   technically, as a higher-order function, jvp's input is the tangent only; "detach" is not applied on
   the tangent, so one would still be able to propagate gradients through it).
6) set_fw_grad needs to raise an error if there is a layout mismatch and base is a forward-differentiable view

Possible plan
- (6) implies that a plain view would not suffice. We need a `detach`-like operation to ensure that set_fw_grad
  knows the view is not forward differentiable.
- (5) implies that is this (new) `detach` would need to be backward differentiable (API TBD).
- (3) is no longer relevant because make_dual is no longer autograd explicit, but perhaps this new detach should behave like the current one? There is a lot of logic to replicate for detach, so this may be hard.
- (1) is satisfied if we use the current detach logic, and (4) is trivial.

I'm not convinced that this is the right solution either, because in the end does (3) still work?

 </details>

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32899679

Pulled By: soulitzer

fbshipit-source-id: 98e13ae954e14e1e68dbd03eb5ab3300d5ed2c5e
2021-12-08 17:56:03 -08:00
7956a405ef Make make_dual also return namedtuple when level less than zero (#68628)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68628

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32899681

Pulled By: soulitzer

fbshipit-source-id: 61ed09f4038e19817978a521e9571fdc482b424b
2021-12-08 17:54:40 -08:00
1c43b1602c [SR] Scope exit guard for memory planner deallocation (#68795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68795

This change improves static runtime exception safety. Added a scope exit guard that invokes `MemoryPlanner::deallocate` in its destructor.

Caveat: we have to be really careful with the exception behavior of `MemoryPlanner::deallocate` and `MemoryPlanner`'s constructor, because they're now both potentially called in the destructor of the scope exit guard. Letting exceptions potentially escape destructors is playing with fire since 1) the destructor of `Deallocator` is (implicitly) `noexcept`, 2) even if it wasn't, `std::terminate` will be called if an exception escapes and the stack is already unwinding. To get around this, we wrap the deallocation stuff in a try/catch. If deallocation throws, then we simply reset all of the memory planner stuff and carry on.
There's a catch: the code path that we take when handling the deallocation exception can't throw. However, this code path is much simpler than memory planner construction/deallocation, so it's much easier to manually audit the correctness here.
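
A minimal C++ sketch of the scope-exit-guard pattern described above; the type and method names are illustrative stand-ins, not the actual Static Runtime classes:

```
// Illustrative stand-in for the real memory planner; names are hypothetical.
struct MemoryPlanner {
  void deallocate() { /* may throw */ }
  void resetState() noexcept { /* simpler, audited non-throwing recovery */ }
};

// Scope exit guard: deallocation runs even when op execution throws.
class Deallocator {
 public:
  explicit Deallocator(MemoryPlanner& planner) : planner_(planner) {}
  ~Deallocator() {  // implicitly noexcept
    try {
      planner_.deallocate();
    } catch (...) {
      // An exception escaping a destructor would call std::terminate,
      // so fall back to the non-throwing recovery path instead.
      planner_.resetState();
    }
  }

 private:
  MemoryPlanner& planner_;
};

int main() {
  MemoryPlanner planner;
  Deallocator guard{planner};  // deallocate() runs when guard leaves scope
  return 0;
}
```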

Test Plan:
**New unit tests**

`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32609915

fbshipit-source-id: 71fbe6994fd573ca6b7dd859b2e6fbd7eeabcd9e
2021-12-08 16:41:52 -08:00
3b27304d20 Fix typos in ATen README (#69170)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69170

Reviewed By: mrshenli

Differential Revision: D32957504

Pulled By: H-Huang

fbshipit-source-id: d8e613b67a864f95e45b2d45398ee71efde0c567
2021-12-08 14:02:26 -08:00
b10381f42d Port smooth_l1_loss to structured kernels (#67404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67404

Port smooth_l1_loss to structured kernels.

Brian Hirsh authored the part of adding build_borrowing_binary_op_coerce_to_scalar to TensorIterator.

Test Plan: This commit shouldn't change the behavior. So, CI.

Reviewed By: bdhirsh, ngimel

Differential Revision: D31981147

Pulled By: alanwaketan

fbshipit-source-id: a779bb76c848eed8b725dc0e1d56b97a3bd9c158
2021-12-08 12:56:24 -08:00
497ec9d9b8 Getting NS to work with Ferraris (#68908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68908

see description in github

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32928449

fbshipit-source-id: ba7085b823a0ebcd0d9e40f4ac19ca0a2cac1169
2021-12-08 12:26:00 -08:00
51b6981c36 [PyTorch Tests] Split out skip logic, make changes for plugins (#67256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67256

To change what tests can be run in various cases, the check logic should be moved to functions and variables that can be changed.

One challenge here is that decorators aren't dynamic: if a value is read at import time and then changed afterwards, the decorator will not actually pick up the change. This means we need to separate out the variables that need to be changed for our use case.

Those are put into common_distributed.py and can be changed before importing the distributed_test.py code.

The use case is to add new backends to the tests and split them into tests that can be run on demand as a separate instance. To do so, you would change DistTestSkipCases after importing it into a launcher or a setup script and then load distributed_test.

Test Plan: Check the signals

Reviewed By: mrshenli

Differential Revision: D31906947

fbshipit-source-id: 45e3258c55f4dc34e12a468bed65280f4c25748f
2021-12-08 12:23:15 -08:00
e279963eef Remove remaining THC code (#69039)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69039

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872476

Pulled By: ngimel

fbshipit-source-id: 7972aacc24aef9450fb59b707ed6396c501bcb31
2021-12-08 12:18:08 -08:00
7407e3d6fd [fix] cross_entropy : fix weight with ignore_index and label_smoothing (#69511)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69339
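
A minimal sketch of the argument combination this fixes (values are illustrative):

```
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, -100])   # -100 is the default ignore_index
weight = torch.tensor([1.0, 2.0, 0.5])   # per-class weights

# The combination exercised by the fix: weight + ignore_index + label_smoothing.
loss = F.cross_entropy(
    logits, target, weight=weight, ignore_index=-100, label_smoothing=0.1
)
```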

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69511

Reviewed By: mrshenli

Differential Revision: D32951935

Pulled By: jbschlosser

fbshipit-source-id: 482eae851861a32f96bd6231dd3448fb6d44a015
2021-12-08 12:08:33 -08:00
d44d59aa70 [BE] Enable C++ stacktraces for MultiProcessTestCase (#69175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69175

Shows C++ stacktraces for python distributed tests that inherit from
MultiProcessTestCase. Closes https://github.com/pytorch/pytorch/issues/69168
ghstack-source-id: 145085858

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32736872

fbshipit-source-id: 743e870eefa7a9e77c5791d0936e2ebd5c9b1016
2021-12-08 11:57:51 -08:00
adb619a193 Adding hardswish, opinfo tests to custom rules (#69399)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69399

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32937576

Pulled By: Gamrix

fbshipit-source-id: 0e53d9e6669e70abcc744399f022a902214ef213
2021-12-08 11:56:34 -08:00
a0efa48c7b [Operator Versioning][Edge] Have operator version number available at the loading stage (#67729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67729

1. The operator version is needed to decide whether or not to apply an upgrader. This PR makes it available at the loading stage.
2. Swap the order of parsing instructions and operators, because an instruction needs to know its operator first when deciding whether to apply an upgrader (changing `OP` to `CALL` or not).
ghstack-source-id: 145082390

Test Plan:
```
buck test //caffe2/test/cpp/jit:jit
```

Reviewed By: iseeyuan

Differential Revision: D32092516

fbshipit-source-id: 853a68effaf95dca86ae46b7f7f4ee0d8e8767da
2021-12-08 11:50:46 -08:00
2808563e69 Forward fix for failing master (#69625)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69625

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32959635

Pulled By: anjali411

fbshipit-source-id: 4d811c6a05deb991cb2886dd65b3f6059555b395
2021-12-08 11:30:38 -08:00
3e6164449f Add efficient zero tensors (#64837)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64837

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D32834987

Pulled By: anjali411

fbshipit-source-id: 20ea08ade0db0044ca633d9c1a117a6a2e65d1fd
2021-12-08 10:37:39 -08:00
30bb4e0071 Add nvidia-smi memory and utilization as native Python API (#69104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69104

Add nvidia-smi memory and utilization as native Python API
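
A minimal usage sketch, assuming the wrapper names this commit introduces (`torch.cuda.utilization` and `torch.cuda.memory_usage`, backed by NVML via pynvml):

```
import torch

if torch.cuda.is_available():
    # The same counters nvidia-smi reports, exposed natively:
    print(torch.cuda.utilization())   # GPU utilization, in percent
    print(torch.cuda.memory_usage())  # memory read/write utilization, in percent
```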

Test Plan:
Tested that the functions return the appropriate values.
Unit tests to come.

Reviewed By: malfet

Differential Revision: D32711562

fbshipit-source-id: 01e676203299f8fde4f3ed4065f68b497e62a789
2021-12-08 10:33:23 -08:00
ee60b5ddf3 Improve efficiency of shape hash by not using tostring (#69496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69496

tostring is expensive, and this is equivalent and faster

Test Plan: covered by lazy tensor unit tests

Reviewed By: desertfire, alanwaketan

Differential Revision: D32901050

fbshipit-source-id: 34080f415db5fd5d3817f7f2533f062a6ec07d21
2021-12-08 09:16:00 -08:00
2cb385dd6e OpInfo for nn.functional.dropout2d, revise sample inputs for dropout (#67891)
Summary:
Earlier, we were only testing inputs with shape `(5,)` for `nn.functional.dropout`, but since it's used a lot, I feel it's a good idea to test a few more shapes, including scalars. This PR:

1. Revises sample inputs for `nn.functional.dropout`
2. Adds an OpInfo for `nn.functional.dropout2d`.

A note regarding the documentation:

Looks like `nn.functional.dropout2d` also supports inputs of shape `(H, W)` apart from `(N, C, H, W)` / `(C, H, W)`, but the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.Dropout2d.html#torch.nn.Dropout2d) doesn't mention the `(H, W)` case. Should that be revised, or am I missing something here? (Filed an issue here: https://github.com/pytorch/pytorch/issues/67892)

```python
# A 2D tensor is a valid input for Dropout2d
In [11]: tensor = torch.randn((3, 4), device='cpu', dtype=torch.float32)
In [12]: dropout2d = torch.nn.Dropout2d(p=0.5)

In [13]: dropout2d(tensor)
Out[13]:
tensor([[-0.1026, -0.0000, -0.0000, -0.0000],
        [-1.5647,  0.0000, -0.0000, -0.5820],
        [-0.0000, -3.2080,  0.1164, -3.6780]])
```

Issue Tracker: https://github.com/pytorch/pytorch/issues/54261

cc: mruberry zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67891

Reviewed By: mrshenli

Differential Revision: D32628527

Pulled By: mruberry

fbshipit-source-id: 4c9b89550f1d49526e294378ce107eba9f29cabb
2021-12-08 08:54:16 -08:00
f54745a6ff add OpInfo for torch.diagflat (#65680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65680

cc mruberry

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31730001

Pulled By: mruberry

fbshipit-source-id: 487e41da4b043944cc5b26d6081209fb0875f4de
2021-12-08 08:49:45 -08:00
7e49f4638c add OpInfo for torch.nn.functional.kl_div (#65469)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65469

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31111698

Pulled By: mruberry

fbshipit-source-id: 0af41a2ef2b199db3d8c63050277e72213f04565
2021-12-08 08:48:18 -08:00
8b20dde932 add python dispatch test back to CI and fix typo in test (#69565)
Summary:
The error message was changed following a PR comment. And since the test doesn't run on CI, I forgot to update the test to catch the new error message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69565

Reviewed By: mrshenli

Differential Revision: D32932982

Pulled By: albanD

fbshipit-source-id: a1da72b0ca735e72b481bc944039233094f1c422
2021-12-08 08:44:49 -08:00
afaa184b44 [Static Runtime] Avoid evaluating expressions of Node* for interpreter fallback op (#69489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69489

This change avoids pulling the `Node*` out of `ProcessedNode*` to evaluate `Node*`-related expressions at op execution time.

A perf gain is expected but not measurable; the purpose of this change is to make SR's code more self-contained (calling more code from SR, not JIT) at execution time.

Test Plan: Existing tests

Reviewed By: mikeiovine

Differential Revision: D32893265

fbshipit-source-id: f0f397666b3556f985d45112af8fe0b08de22139
2021-12-08 08:40:30 -08:00
fc2614537b Updating quantization documentation (#68907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68907

Added information about symmetric
qschemes and corrected an error in reference to https://github.com/pytorch/pytorch/issues/68540

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32662033

fbshipit-source-id: 9052c597f61991934b86850fea8b6eab78397450
2021-12-08 08:32:33 -08:00
39fb855d91 [DataLoader] Implementing communication processes for Map-style DataPipes (#68549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68549

cc SsnL VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32922676

Pulled By: NivekT

fbshipit-source-id: fd918a342214d617a489ac5acffff15b55e9b255
2021-12-08 07:27:01 -08:00
f3983f9c47 [quant][embdding qat] Re-land Add FX support for QAT EmbeddingBag (#69334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69334

The original PR #68121 broke with an incompatible qengine on Mac OS; this PR re-introduces the changes with a fix.

Add FX support for the QAT EmbeddingBag operator; previously there was only eager mode support.

Test Plan:
pytest test/quantization/fx/test_quantize_fx.py  -v -k "test_qat_embeddingbag_linear"

Imported from OSS

Reviewed By: jingsh

Differential Revision: D32815153

fbshipit-source-id: 33654ce29de6e81920bf3277a75027fe403a1eb2
2021-12-08 05:57:20 -08:00
93aa3603ee [quant][embedding qat] Re-Land Support Embedding QAT via FX API (#69333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69333

The original PR was reverted due to a break with an incompatible qengine on Mac OS; this diff fixes that.

Support the QAT workflow by using the torch.fx QAT API, e.g. `prepare_qat_fx` and `convert_fx`.

Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embedding_linear"`

Imported from OSS

Reviewed By: jingsh

Differential Revision: D32814827

fbshipit-source-id: f7a69d2b596f1276dc5860b397c5d5d07e5b9e16
2021-12-08 05:28:07 -08:00
fc8404b5bc histc: Avoid dispatch in parallel region (#68520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68520

Ref #56794

This changes the code from allocating 1 tensor per thread inside the
parallel region, to allocating one larger tensor outside the parallel
region and manually viewing each thread's slice of the histogram.
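
A simplified C++/OpenMP sketch of the allocation pattern (illustrative, not the actual ATen kernel):

```
#include <omp.h>

#include <cstdint>
#include <vector>

// One flat buffer allocated up front; each thread accumulates into its own
// slice (bin indices must lie in [0, nbins)).
std::vector<int64_t> histogram(const std::vector<int64_t>& bin_of_sample,
                               int64_t nbins, int nthreads) {
  std::vector<int64_t> buffer(static_cast<size_t>(nbins) * nthreads, 0);
#pragma omp parallel num_threads(nthreads)
  {
    int64_t* local = buffer.data() +
        static_cast<int64_t>(omp_get_thread_num()) * nbins;
#pragma omp for
    for (int64_t i = 0; i < static_cast<int64_t>(bin_of_sample.size()); ++i) {
      ++local[bin_of_sample[i]];  // no allocation or dispatch in here
    }
  }
  std::vector<int64_t> hist(nbins, 0);
  for (int t = 0; t < nthreads; ++t) {  // reduce the per-thread slices
    for (int64_t b = 0; b < nbins; ++b) {
      hist[b] += buffer[t * nbins + b];
    }
  }
  return hist;
}
```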

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32929365

Pulled By: ngimel

fbshipit-source-id: e28da2736e849a0282b70f34d11526d3355d5bd5
2021-12-08 02:42:43 -08:00
2a38e1a76a Fix TSAN issue in TCPStore (#69590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69590

The variable `callbackRegisteredData_` was written to without
synchronization.
ghstack-source-id: 145066862

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D32938979

fbshipit-source-id: bc9a11a70680db45ece95880ae19ce2026e8a88e
2021-12-07 23:29:08 -08:00
0ce49000db Release GIL during RPC shutdown. (#69586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69586

In certain scenarios during shutdown the following assert failed:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/rpc/rpc_agent.cpp#L39.
This was due to _reset_current_rpc_agent not releasing the GIL.

Fixed this issue by releasing the GIL.
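
A minimal pybind11 sketch of the fix pattern (illustrative; not the exact function body):

```
#include <pybind11/pybind11.h>

// Drop the GIL before touching RPC-agent state that other threads may
// assert on during shutdown; holding it here caused the failing assert.
void reset_current_rpc_agent_example() {
  pybind11::gil_scoped_release no_gil;
  // ... e.g. RpcAgent::setCurrentRpcAgent(nullptr) runs without the GIL ...
}
```
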
ghstack-source-id: 145062265

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D32937687

fbshipit-source-id: 980adbcc1e3799b40206f7bca6e7695ca67f0fc2
2021-12-07 23:24:57 -08:00
c236247826 OpInfo tests for (svd|pca)_lowrank (#69107)
Summary:
As per title.

While working on this I discovered several issues with these methods related to gradient instabilities. I will file them and link them here later. It was quite painful to get these to pass all the tests given the discovered issues; sorry for the delay, mruberry!

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69107

Reviewed By: zou3519

Differential Revision: D32920341

Pulled By: mruberry

fbshipit-source-id: 15b33e2b46acdcbff8a37d8e43e381eb55d1a296
2021-12-07 19:50:12 -08:00
e06af79136 Fix sign op converter (#69580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69580

Fix bug in sign converter

Reviewed By: 842974287

Differential Revision: D32934661

fbshipit-source-id: f21d7c65b07ab2f0a0027939d660e56dacd9cdef
2021-12-07 19:04:51 -08:00
6b950eea27 Remove finput and fgrad_input from slow3d transpose signatures (#68899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68899

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D32655872

Pulled By: jbschlosser

fbshipit-source-id: 963b391a489c639f98d9f634d4f4c668353c799a
2021-12-07 18:24:40 -08:00
05946051f8 [quant][graphmode] initial support for fusion pattern in backend_config_dict (#69335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69335

This PR added support for configuring fusion with:
"pattern", "fuser_method"

This only works for a simple sequence of two-op patterns currently; we will extend this in future PRs.
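
A hypothetical sketch of what such a fusion entry could look like; only the "pattern" and "fuser_method" keys come from this summary, the surrounding structure is an assumption:

```
import torch.nn as nn
import torch.nn.intrinsic as nni

def fuse_linear_relu(linear, relu):
    # "fuser_method": how the matched modules are combined into one module
    return nni.LinearReLU(linear, relu)

backend_config_dict = {
    "configs": [
        {
            "pattern": (nn.ReLU, nn.Linear),  # FX quant matches patterns in reverse
            "fuser_method": fuse_linear_relu,
        },
    ],
}
```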

Test Plan:
regression test on linear-relu fusion:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32816164

fbshipit-source-id: f300b7b96b36908cb94a50a8a17e0e15032509eb
2021-12-07 16:54:42 -08:00
2d38d37f5f use irange for loops (#69533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69533

Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.
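
For instance, a concrete sketch of the transformation using `c10::irange`:

```
#include <c10/util/irange.h>

#include <cstdint>

void example(int64_t n) {
  // before: for (int64_t i = 0; i < n; i++) { ... }
  for (const auto i : c10::irange(n)) {
    (void)i;  // i is const and deduces its type from n
  }
}
```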

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32837942

fbshipit-source-id: 8663037a38ade8f81bd5e983a614d197ea11f0d1
2021-12-07 16:53:27 -08:00
8a975c0106 [LT] Sync with the lazy_tensor_staging branch (#69527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69527

- Add missing TORCH_API in class/struct declarations;
- Fix internal op declarations in ltc_ops;
- Update lazy_ts_lowering.py

Test Plan: Imported from OSS

Reviewed By: alanwaketan

Differential Revision: D32918929

Pulled By: desertfire

fbshipit-source-id: e956d51aff5ef593fdf4cd5ad2a38e38788913d8
2021-12-07 16:47:35 -08:00
049debd97d [Reland][Autograd/Checkpoint] Checkpoint implementation without reentrant autograd (#69508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69508

Original Phabricator Diff: D32704467 (e032dae329)

Reland; the fix is to not test traditional checkpointing when the input does not require grad, as that is unsupported (as documented).

Original PR body:

Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.

Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (we still need to
add thorough distributed testing).
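
A minimal usage sketch of the new flag (shapes and the checkpointed function are arbitrary):

```
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.relu(x).matmul(x.t())

x = torch.randn(8, 8, requires_grad=True)
# Non-reentrant checkpointing: recomputation is driven by saved-variable
# hooks, so it composes with autograd.grad.
y = checkpoint(block, x, use_reentrant=False)
(grad,) = torch.autograd.grad(y.sum(), x)
```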

As discussed in https://github.com/pytorch/pytorch/issues/65537, the tests that we need to add are:

- [x] Gradient hooks are called once
- [x] works when input does require grads but Tensor that require grads are captures (like first layer in a nn)
- [x] works for functions with arbitrary input/output objects
- [x] distributed tests (next PR)

Note that this is only for `torch.utils.checkpoint`, if this approach overall looks good, we will do something similar for `checkpoint_sequential`.
ghstack-source-id: 144948501

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32902634

fbshipit-source-id: 2ee87006e5045e5471ff80c36a07fbecc2bea3fe
2021-12-07 16:31:23 -08:00
3456c2cbc8 Allow build_android.sh to forward Vulkan args (#69332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69332

 ---

## Context

The `build_android.sh` script currently does not forward Vulkan configuration options, which makes it impossible to control them when running `build_pytorch_android.sh`.

## Changes

Slightly change the script to allow Vulkan configuration options to propagate from `build_pytorch_android.sh` to `build_android.sh`

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D32840908

Pulled By: SS-JIA

fbshipit-source-id: e55d89c93c996b92b743cf047f5a285bb516bbc4
2021-12-07 16:24:35 -08:00
fa39754e11 [vulkan] Disable shader optimization to avoid Validation Errors (#69331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69331

 ---

## Context

When the optimization flag was turned on, some SPIR-V modules produced from the Vulkan compute shaders were invalid. The Vulkan validation layer raises the following error for these modules:

```
[ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object: VK_NULL_HANDLE (Type = 0) | SPIR-V module not valid: Header block 52[%52] is contained in the loop construct headed by 44[%44], but it's merge block 47[%47] is not
%52 = OpLabel
```

With the optimization flag turned off, the SPIR-V modules produced no longer report these errors in the Validation layer.

## Changes

Turns off optimization when generating SPIR-V modules to ensure correctness of the modules.

**Note that disabling SPIR-V optimization did not regress inference latency for the several models I tested**.

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D32840910

Pulled By: SS-JIA

fbshipit-source-id: 7ccb5691fd0e2d11b9c8c28ad7b83906e8163699
2021-12-07 16:24:32 -08:00
bede33e3f5 [vulkan] Add image format qualifier to glsl files (#69330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69330

 ---

## Context

Previously, our shader files did not declare any [image format qualifiers](https://www.khronos.org/opengl/wiki/Layout_Qualifier_(GLSL)#Image_formats) for image layouts. This causes the SPIR-V modules produced to declare the [StorageImageWriteWithoutFormat](https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_a_id_capability_a_capability) capability, which requires `shaderStorageImageWriteWithoutFormat` to be enabled in [VkPhysicalDeviceFeatures](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceFeatures.html). `shaderStorageImageWriteWithoutFormat` is not available on some devices, causing errors to be reported by the Vulkan validation layer.

## Changes

Vulkan shaders now declare the image format explicitly so that the SPIR-V modules produced are compatible with devices that do not have `shaderStorageImageWriteWithoutFormat` enabled.

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D32840909

Pulled By: SS-JIA

fbshipit-source-id: 76e0a0da68b423ebc74ae7e839b9cfaf57d2cd39
2021-12-07 16:23:09 -08:00
e5a1ee0e5a [quant][graphmode] Refactor fusion to use the new Pattern format (#68770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68770

The previous fusion only works for a sequence of ops, which is not general enough for fusion patterns
defined by a subgraph; this PR refactors it to make it more general

Test Plan:
```
python test/test_quantization.py TestFuseFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32602637

fbshipit-source-id: a7897c62081b9d71c67fb56e78484cf68deaacf6
2021-12-07 16:12:40 -08:00
1433160a36 use irange for loops 6 (#66742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66742

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705366

fbshipit-source-id: be58222426c192406a7f93c21582c3f6f2082401
2021-12-07 16:07:50 -08:00
9a7732e852 CMake: Support dynamic codegen outputs (#68246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68246

Currently the codegen produces a list of output files at CMake
configuration time and the build system has no way of knowing if the
outputs change. So if that happens, you basically need to delete the
build folder and re-run from scratch.

Instead, this generates the output list every time the code generation
is run and changes the output to be a `.cmake` file that gets included
in the main cmake configuration step. That means the build system
knows to re-run cmake automatically if a new output is added. So, for
example you could change the number of shards that `Operators.cpp` is
split into and it all just works transparently to the user.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32596268

Pulled By: albanD

fbshipit-source-id: 15e0896aeaead90aed64b9c8fda70cf28fef13a2
2021-12-07 15:58:06 -08:00
cd9da3267c Rationalize API exports in torch_python (#68095)
Summary:
This renames `WindowsTorchApiMacro.h` to `Export.h` to mirror the c10 header `c10/macros/Export.h` and also updates it to use `C10_EXPORT`/`C10_IMPORT`. This also removes the `THP_API` macro from `THP_export.h` which appears to serve the same purpose.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68095

Reviewed By: jbschlosser

Differential Revision: D32810881

Pulled By: albanD

fbshipit-source-id: d6949ccd0d80d6c3e5ec1264207611fcfe2503e3
2021-12-07 15:24:37 -08:00
829b49b867 Output UnionType str rep with () instead of [] (#69502)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69502

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32902781

Pulled By: tugsbayasgalan

fbshipit-source-id: 67a73b209575437477cdbd3eb8f685019709e99c
2021-12-07 14:17:06 -08:00
a8232ee1bc Sparse CSR CUDA: Add block torch.addmv when mat is sparse (#68708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68708

This PR adds block CSR matrix times dense vector multiplication.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32647694

Pulled By: cpuhrsch

fbshipit-source-id: a1c120691c4350284b156fe4259eda684b734b66
2021-12-07 14:02:59 -08:00
6df7b75186 skip ORT tensor in TensorIterator because it doesn't have storage (#68705)
Summary:
ORT tensors are similar to XLA tensors in that they don't have storage, so extend the condition to cover ORT tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68705

Reviewed By: zou3519

Differential Revision: D32921378

Pulled By: albanD

fbshipit-source-id: 3bda9bba2ddd95cb561a4d1cff463de652256708
2021-12-07 13:33:54 -08:00
008469c5e2 [SR] Simplify memory re-use algorithm (#68302)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68302

Implement the new memory re-use algorithm. It’s roughly based on the c2 one, but after going through many iterations it may not be a 1:1 port anymore. Also deleted the old liveness analysis.

Test Plan:
## **Re-use metrics**

`inline_cvr` (294738512_58)
**Before**
* `local`
```
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 4601984 bytes
Total number of reused tensors: 1183
```
* `local_ro`
```
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 29696 bytes
Total number of reused tensors: 959
```

**After**
* `local`
```
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 4520000 bytes
Total number of reused tensors: 1198
```
* `local_ro`
```
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 29120 bytes
Total number of reused tensors: 963
```

Reviewed By: hlu1

Differential Revision: D32370424

fbshipit-source-id: 06a8e0a295ed7a2b4d14071349c1f1e975f746bf
2021-12-07 13:25:42 -08:00
c309637923 Making cuda 11.5 workflows periodic (#69323)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69323

Reviewed By: gchanan, malfet

Differential Revision: D32812346

Pulled By: atalman

fbshipit-source-id: 081f40802997cfb986742f1621eee4b4565660f0
2021-12-07 13:14:07 -08:00
baac51ff4a Add conda-forge dependency for cuda-11.5 (#69541)
Summary:
[NVIDIA's cudatoolkit=11.5](https://anaconda.org/nvidia/cudatoolkit/files?version=11.5.0) at the time of writing depends on libstdcxx-ng >=9.4.0, but the latest available from the official anaconda channel is [9.3.0](https://anaconda.org/anaconda/libstdcxx-ng/files?version=9.3.0), so add `-c conda-forge` as an extra dependency to resolve the problem

Should resolve problems such as https://app.circleci.com/pipelines/github/pytorch/pytorch/420750/workflows/19d6e3ce-a305-49c6-bac8-11ed43ed2b1e/jobs/16829102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69541

Reviewed By: atalman

Differential Revision: D32921300

Pulled By: malfet

fbshipit-source-id: 09dd3575f968679f545aec739a2791dde85d37c1
2021-12-07 12:58:41 -08:00
358e908162 Add Union type to TorchScript Language Ref (#69514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69514

Reviewed By: tugsbayasgalan

Differential Revision: D32909371

Pulled By: gmagogsfm

fbshipit-source-id: af1c3040cd59ee913dc576cf8a8c759313f1e07f
2021-12-07 12:53:54 -08:00
c21169ea41 [JIT] optimize_for_inference on methods other than forward (#69367)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69367

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D32835529

Pulled By: davidberard98

fbshipit-source-id: d3066c23d071bc2a3bee59b8ab03b6ab0e43efcf
2021-12-07 12:36:47 -08:00
60ca6776e2 [JIT] run frozen optimizations on methods other than forward (#68668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68668

This updates run_frozen_optimizations so that it can run on methods other than forward (see the sketch below)
ghstack-source-id: 143871758
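
A minimal sketch combining this with the `optimize_for_inference` change above, assuming the `other_methods` argument these PRs add:

```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 1)
        self.bn = torch.nn.BatchNorm2d(3)

    @torch.jit.export
    def encode(self, x):
        return self.bn(self.conv(x))

    def forward(self, x):
        return self.encode(x)

scripted = torch.jit.script(M().eval())
# other_methods names the extra methods to freeze/optimize besides forward
frozen = torch.jit.optimize_for_inference(scripted, other_methods=["encode"])
```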

Test Plan:
Added test in test_freezing.py
```
python3 test/test_jit.py -- test_conv_bn_folding_not_forward
```

Reviewed By: eellison

Differential Revision: D32567857

fbshipit-source-id: 75e56efad576404dc8d6897861d249573f5ccd7a
2021-12-07 12:35:30 -08:00
63470f9449 Sparse CSR: Implement unary ufuncs (with 0->0 correspondence) (#69292)
Summary:
This PR attempts to add support for unary ufuncs (with 0->0 correspondence) for Sparse CSR Layout.

Ops supported: `['abs', 'asin', 'asinh', 'atan', 'atanh', 'ceil', 'conj_physical', 'floor', 'log1p', 'neg', 'round', 'sin', 'sinh', 'sign', 'sgn', 'signbit', 'tan', 'tanh', 'trunc', 'expm1', 'sqrt', 'angle', 'isinf', 'isposinf', 'isneginf', 'isnan', 'erf', 'erfinv']`
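
A minimal sketch of what this enables (matrix values are arbitrary):

```
import torch

crow_indices = torch.tensor([0, 2, 4])
col_indices = torch.tensor([0, 1, 0, 1])
values = torch.tensor([1.0, -2.0, 3.0, -4.0])
csr = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(2, 2))

# A 0 -> 0 unary ufunc acts on the values only, so the sparsity pattern
# (and hence the CSR structure) is preserved.
out = csr.abs()
```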

cc nikitaved pearu cpuhrsch IvanYashchuk peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69292

Reviewed By: pbelevich

Differential Revision: D32805514

Pulled By: cpuhrsch

fbshipit-source-id: 9ae20817e77a36d3aa6c5afa532b9dc3b8cf1dd3
2021-12-07 12:07:41 -08:00
1a202b0c39 Docs: Fix broken code syntax in autograd.rst (#69362)
Summary:
The backticks around `nn.Parameters` were not rendered correctly because the word was enclosed in an italics block.
Spotted the issue on https://pytorch.org/docs/stable/notes/autograd.html#locally-disable-grad-doc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69362

Reviewed By: zou3519

Differential Revision: D32924093

Pulled By: albanD

fbshipit-source-id: 5a310ac3f3d13a5116f7aa911817b9452eee711d
2021-12-07 12:03:15 -08:00
10229e156b trt engine inspector demo (#66683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66683

Starting from TensorRT 8.2, we have this nice engine inspector, which gives you much more detail about each TRT layer.
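
A minimal sketch of pulling this information out of an already-built engine, assuming the TensorRT 8.2 inspector API names (`create_engine_inspector`, `LayerInformationFormat.JSON`):

```
import tensorrt as trt

def dump_engine_info(engine: "trt.ICudaEngine") -> str:
    # returns a JSON description of every layer in the built engine
    inspector = engine.create_engine_inspector()
    return inspector.get_engine_information(trt.LayerInformationFormat.JSON)
```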

Test Plan:
```
buck run  mode/opt -c python.package_style=inplace scripts/yinghai/test:trt_engine_inspector
```
And you will see something like
```
{"Layers": [{
  "Name": "PWN(PWN(relu_1), add_1)",
  "LayerType": "PointWiseV2",
  "Inputs": [
  {
    "Name": "x",
    "Dimensions": [10,2],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "(Unnamed Layer* 1) [ElementWise]_output",
    "Dimensions": [10,2],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "PointWise",
  "ParameterSubType": "PointWiseExpression",
  "NbInputArgs": 1,
  "InputArgs": ["arg0"],
  "NbOutputVars": 1,
  "OutputVars": ["var1"],
  "NbParams": 0,
  "Params": [],
  "NbLiterals": 4,
  "Literals": ["0.000000e+00f", "1.000000e+00f", "0.000000e+00f", "0.000000e+00f"],
  "NbOperations": 2,
  "Operations": ["const auto var0 = pwgen::iMax(arg0, literal0);", "const auto var1 = pwgen::iPlus(arg0, var0);"],
  "TacticValue": "0x0"
},{
  "Name": "matmul_1",
  "LayerType": "MatrixMultiply",
  "Inputs": [
  {
    "Name": "(Unnamed Layer* 1) [ElementWise]_output",
    "Dimensions": [10,2],
    "Format/Datatype": "Row major linear FP16 format"
  },
  {
    "Name": "y",
    "Dimensions": [10,2],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "output0",
    "Dimensions": [10],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "MatrixMultiply",
  "MatrixOpA": "VECTOR",
  "MatrixOpB": "VECTOR",
  "Alpha": 1,
  "Beta": 0,
  "TacticValue": "0x1"
}],
"Bindings": ["x"
,"y"
,"output0"
]}
```

Reviewed By: RoshanPAN, wushirong

Differential Revision: D31681405

fbshipit-source-id: 31f912c37812ac17c6421073e0c35e512463ba6e
2021-12-07 11:50:09 -08:00
aa9fbb9ae9 [JIT] check stack size after calling operator (#68788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68788

In debug mode, this should throw errors for ops that return the wrong number of outputs (i.e. the number of values left on the stack differs from the number shown in the schema)

Test Plan:
Run this in debug mode and verify that it doesn't throw an assert
```
import torch

class Thing(torch.nn.Module):
    @torch.jit.export
    def en(self, x: torch.Tensor):
        return torch.add(x, 2.0)

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        a = torch.mm(x, y)
        b = torch.nn.functional.gelu(a)
        c = self.en(b)
        return c.std_mean()

if __name__ == '__main__':
    unsc = Thing()
    thing = torch.jit.script(unsc)
    x = torch.randn(4, 4)
    y = torch.randn(4, 4)
    std, mean = thing.forward(x, y)
    print(std, mean)
    print(str(thing.forward.graph))
```

Reviewed By: gchanan

Differential Revision: D32625256

Pulled By: davidberard98

fbshipit-source-id: 61d5ec0c5a9f8b43706257119f4f524bb9dbe6f5
2021-12-07 11:43:50 -08:00
bd8d4195a6 [DataPipe] Small change to generation script and update to DataPipe .pyi file (#69392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69392

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32849463

Pulled By: NivekT

fbshipit-source-id: b6d419fbe0e4cc9d718f21fb3fe886f721f618d3
2021-12-07 11:40:53 -08:00
fdfdafd1e6 [DataPipe] Removing usage of unbatch_level from .batch interface and DataFrame (#69393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69393

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32849461

Pulled By: NivekT

fbshipit-source-id: 16abbe289ad2092faaa029fd78f3d6924e7b2ff4
2021-12-07 11:40:50 -08:00
357160e68e [DataPipe] Unifying API - removing nesting_level argument from FilterIterDataPipe (#69391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69391

As part of the efforts to unify the APIs across different data backends (e.g. TorchData, TorchArrow), we are making changes to different DataPipes' APIs. In this PR, we are removing the input argument `nesting_level` from `FilterIterDataPipe`.

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32849462

Pulled By: NivekT

fbshipit-source-id: 91cf1dc03dd3d3cbd7a9c6ccbd791ade91355f30
2021-12-07 11:40:46 -08:00
4478b14e4c [DataPipe] Unifying API - removing nesting_level argument from MapperIterDataPipe (#69390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69390

As part of the efforts to unify the APIs across different data backends (e.g. TorchData, TorchArrow), we are making changes to different DataPipes' APIs. In this PR, we are removing the input argument `nesting_level` from `MapperIterDataPipe`.
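
A minimal sketch of the resulting API shape (the pipeline and lambdas are arbitrary):

```
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper(range(10))
# With nesting_level gone, map/filter apply to stream elements directly;
# flatten any nested structure explicitly before mapping if needed.
dp = dp.map(lambda x: x * 2).filter(lambda x: x % 3 == 0)
print(list(dp))  # [0, 6, 12, 18]
```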

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32849465

Pulled By: NivekT

fbshipit-source-id: 963ce70b84a7658331d126e5ed9fdb12273c8e1f
2021-12-07 11:39:08 -08:00
9cb52327a8 [quant][refactor] Move pattern type definition to ao/quantization/utils.py (#68769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68769

As the title says: we want to use this type in fuser_method_mapping in later PRs

Test Plan:
no change to logic, just regression test on ci
```
python test/test_quantization.py
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32602636

fbshipit-source-id: 15b95241431dfca9b1088d0920bf75705b37aa9a
2021-12-07 11:00:22 -08:00
976b076715 [iOS] Add LibTorch nightly build (#69341)
Summary:
Add a LibTorch nightly build for use in LibTorchvision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69341

Test Plan:
CI jobs: https://fburl.com/lbyjzpxz
1. Validate lib is uploaded to link https://ossci-ios-build.s3.amazonaws.com/libtorch_ios_nightly_build.zip
2. Download lib from the link and validate `version.txt` is correct
3. Test the lib in HelloWorld demo
Imported from OSS

Reviewed By: xta0

Differential Revision: D32901836

Pulled By: hanton

fbshipit-source-id: 8622c3e6052cec2039bc15dea0d495ec1a8186cb
2021-12-07 10:07:28 -08:00
3edf1b6cee [PyTorch] Avoid no-op shared_ptr dtor when constructing tuple (#69337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69337

See note in code.
ghstack-source-id: 144657751

Test Plan:
Ran PyTorchFeatureConversionBenchmark 5x before/after:

```
swolchok@devbig032 ~/f/fbcode> for x in (seq 5); sudo scripts/bertrand/noise/denoise.sh /tmp/pytorch_feature_conversion_benchmark.Dec2CacheTupleTypes ; end                                                                                                                                                                                              (pytorch-ort-bert)
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.39us  418.75K
PyTorchFeatureConversionIdListBenchmark                      3.59us  278.91K
PyTorchFeatureConversionIdScoreListBenchmark                 5.01us  199.51K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.42us  413.80K
PyTorchFeatureConversionIdListBenchmark                      3.56us  280.60K
PyTorchFeatureConversionIdScoreListBenchmark                 5.05us  198.15K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.41us  414.25K
PyTorchFeatureConversionIdListBenchmark                      3.55us  281.59K
PyTorchFeatureConversionIdScoreListBenchmark                 5.02us  199.09K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.39us  417.68K
PyTorchFeatureConversionIdListBenchmark                      3.55us  281.65K
PyTorchFeatureConversionIdScoreListBenchmark                 5.05us  198.06K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.39us  417.54K
PyTorchFeatureConversionIdListBenchmark                      3.56us  281.03K
PyTorchFeatureConversionIdScoreListBenchmark                 5.05us  198.13K
============================================================================
swolchok@devbig032 ~/f/fbcode> for x in (seq 5); sudo scripts/bertrand/noise/denoise.sh /tmp/pytorch_feature_conversion_benchmark.Dec2TupleConstruction ; end                                                                                                                                                                                            (pytorch-ort-bert)
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.38us  420.38K
PyTorchFeatureConversionIdListBenchmark                      3.53us  282.90K
PyTorchFeatureConversionIdScoreListBenchmark                 4.99us  200.41K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.37us  421.54K
PyTorchFeatureConversionIdListBenchmark                      3.54us  282.27K
PyTorchFeatureConversionIdScoreListBenchmark                 4.99us  200.28K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.38us  420.99K
PyTorchFeatureConversionIdListBenchmark                      3.56us  280.56K
PyTorchFeatureConversionIdScoreListBenchmark                 5.08us  196.91K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.37us  421.48K
PyTorchFeatureConversionIdListBenchmark                      3.54us  282.87K
PyTorchFeatureConversionIdScoreListBenchmark                 5.00us  199.88K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.38us  419.69K
PyTorchFeatureConversionIdListBenchmark                      3.56us  280.68K
PyTorchFeatureConversionIdScoreListBenchmark                 4.97us  201.23K
============================================================================
```

Looks like maybe around 1% faster?

Reviewed By: hlu1

Differential Revision: D32817592

fbshipit-source-id: 4b015dc993b26a92e45a3673e14fde32105a34fa
2021-12-07 09:39:15 -08:00
617a3bd944 GHA: Re enable mac json uploads (#69387)
Summary:
Removed JSON uploading to S3 for Mac GHA workflows as the AWS credentials were not working.

This PR tries uploading them to GitHub instead, which works https://github.com/pytorch/pytorch/runs/4413940318?check_suite_focus=true

They should show up on the HUD page: hud.pytorch.org/pr/69387 with the name test-jsons after the CI is completed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69387

Reviewed By: seemethere

Differential Revision: D32885204

Pulled By: janeyx99

fbshipit-source-id: 3d25ead6d464144a228fdf8ead5172de3ed8430e
2021-12-07 08:25:51 -08:00
945d2e380c [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D32910817

fbshipit-source-id: 60d0cb10412e1a37a0249bb223b75855c5596dbd
2021-12-07 08:11:09 -08:00
4670f0f2c5 Set non-default backend names to lower case (#69400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69400

Hopefully this makes naming more consistent. Without this change, some tests will fail for plugins since values can be set to upper case in some cases. This should prevent that and make lookup and comparison consistent.

Test Plan: Check the signals. There is no specific test for this, but all tests should pass.

Reviewed By: mrshenli

Differential Revision: D32836529

fbshipit-source-id: 1b7d2b64e04fe0391b710aa6ed6d1e47df9027a3
2021-12-07 07:58:46 -08:00
2dd46d3aa9 FX: ensure node stack trace survives copying (#69368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69368

Before this PR, copying a node would lose the stack trace. This PR
ensures that the stack trace is preserved across copies.

This is useful because quantization passes would like to start
allowing the user to preserve stack traces, and we use the copy
behavior.
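
A minimal sketch of the behavior this enables; the stack-trace string is an illustrative value, not real trace output:

```
import torch
import torch.fx

def f(x):
    return x.relu() + 1

gm = torch.fx.symbolic_trace(f)
node = next(n for n in gm.graph.nodes if n.op == "call_method")
node.stack_trace = 'File "example.py", line 2, in f'  # illustrative

# graph_copy-based copies now carry node.stack_trace along
new_graph = torch.fx.Graph()
new_graph.graph_copy(gm.graph, val_map={})
copied = next(n for n in new_graph.nodes if n.op == "call_method")
assert copied.stack_trace == node.stack_trace
```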

Test Plan:
```
python test/test_fx.py TestFX.test_stack_traces
```

Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D32835248

fbshipit-source-id: 91610fd8d05f5683cfa5e11fb6f9f3feacb8e241
2021-12-07 06:18:38 -08:00
ca945d989a [quant][graphmode][fx] Add default_replay_qconfig for ops like reshape (#69249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69249

This PR added default_replay_qconfig and default_replay_observer, which are used
when we want to configure an operator to reuse the observer from its input: if the input
Tensor for the operator is not observed, we will not observe the output of this operator either;
if the input Tensor is observed, we will observe the output of the operator with the same observer.

e.g.

```
x1 = x0.reshape()
```
if reshape is configured with default_replay_qconfig:
1. if x0 is observed with observer_0, we'll observe x1 with the same observer instance
2. if x0 is not observed, we won't observe x1 either

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_replay_qconfig
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32774723

fbshipit-source-id: 26862b2bc181d0433e2243daeb3b8f7ec3dd33b2
2021-12-06 22:56:14 -08:00
8b1e49635a [JIT] Separate GPU implementation of frozen_conv_add_relu_fusion.cpp (#68149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68149

JIT optimization passes are part of the CPU-only build (i.e. necessary GPU flags are not passed in). This separates the implementation of frozen_conv_add_relu_fusion so that the GPU-enabled implementation is registered at runtime (if it is available)

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32773666

Pulled By: davidberard98

fbshipit-source-id: c83dbb88804bdef23dc60a6299acbfa76d5c1495
2021-12-06 21:06:25 -08:00
e55b939732 Enable build-split for all CUDA-11.x version (#69494)
Summary:
Should fix cu115 wheel binary builds, see https://hud.pytorch.org/ci/pytorch/pytorch/nightly?name_filter=cu115

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69494

Reviewed By: atalman

Differential Revision: D32899994

Pulled By: malfet

fbshipit-source-id: bb0e05a30c9360c75d2cfd9d4e0d40ed9a3b2830
2021-12-06 20:39:06 -08:00
bd8a4a9372 [wip][quant][graphmode] produce reference pattern for binary ops and then rewrite to quantized op (#68229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68229

This PR makes BinaryOpQuantizeHandler always produce reference patterns, and we rely on the
subgraph_rewriter to rewrite the reference quantized patterns to quantized ops

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537714

fbshipit-source-id: 456086b308c4446840d8d37997daa6f8f8068479
2021-12-06 20:20:15 -08:00
bcd0303834 [fx2trt][easy] add sparse flag to TRTInterpreter (#69495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69495

As the title. Separated from D30589161.

Test Plan: Tested in D30589161.

Reviewed By: maratsubkhankulov, wushirong

Differential Revision: D32898927

fbshipit-source-id: 89e18d2eb19b43fbab92b4988d0a21d21cff2d1f
2021-12-06 18:57:08 -08:00
3211588308 [fx2trt] Separate sign from trunc_div and use it for acc_ops.sign (#69486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69486

As the title. Migrate from sign plugin to native trt layers. All the layers are fused into one single PWN kernel in TRT.

```
[TensorRT] VERBOSE: Engine Layer Information:
Layer(PointWiseV2): PWN(sign_1_sign_rhs + sign_1_sign_rhs_broadcast, PWN(PWN(sign_1_floor_div*2_rhs + sign_1_floor_div*2_rhs_broadcast, PWN(PWN(PWN([UNARY]-[acc_ops.sign]-[sign_1_prod_abs], [UNARY]-[acc_ops.sign]-[sign_1_prod_abs_exp]), PWN([UNARY]-[acc_ops.sign]-[sign_1_prod_exp], [ELEMENTWISE]-[acc_ops.sign]-[sign_1_exp_floor_div])), [ELEMENTWISE]-[acc_ops.sign]-[sign_1_floor_div*2])), [ELEMENTWISE]-[acc_ops.sign]-[sign_1_sign])), Tactic: 0, x[Float(2,2,3)] -> output0[Float(2,2,3)]
```

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D32887537

fbshipit-source-id: ac250b5197e340319de29653a27f879a0e1ea9cd
2021-12-06 16:54:44 -08:00
e23827e6d6 [fx2trt] [Prep for release] Add type hints to converters and separate main files (#69458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69458

1. Added type hints to acc ops converters.
2. Moved some of the classes/logic in fx2trt.py into separate files (input_tensor_spec.py, trt_module.py, converter_registry.py).
3. Added imports in `__init__.py` so that users can just call `from torch.fx.experimental.fx2trt import xxx` instead of `experimental.fx2trt.fx2trt`.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D32884637

fbshipit-source-id: e3e1e597edb9a08b47b4595bd371f570f2f3c9b6
2021-12-06 16:54:41 -08:00
a2d1cadfdb [fx2trt] Add a helper function to generate specs for dynamic batch size (#69405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69405

Add a helper function that will generate input tensor specs with dynamic batch size.

Note that this function currently requires the batch dimension of all these tensors to be the first dimension.

Also add more doc strings.
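
A minimal sketch (assumed names, not the fx2trt API itself) of what such a helper does:
```
from typing import List, NamedTuple, Tuple

import torch

class InputTensorSpec(NamedTuple):
    shape: Tuple[int, ...]  # -1 marks the dynamic batch dimension
    dtype: torch.dtype

def specs_with_dynamic_batch(tensors: List[torch.Tensor]) -> List[InputTensorSpec]:
    # Constraint noted above: batch must be dim 0 of every tensor.
    return [InputTensorSpec((-1, *t.shape[1:]), t.dtype) for t in tensors]

specs = specs_with_dynamic_batch([torch.randn(8, 3, 224, 224)])
print(specs)  # [InputTensorSpec(shape=(-1, 3, 224, 224), dtype=torch.float32)]
```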

Test Plan:
Added unit tests.
```
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7881299413036896
    ✓ ListingSuccess: caffe2/test/fx2trt/core:test_input_tensor_spec - main (7.455)
    ✓ Pass: caffe2/test/fx2trt/core:test_input_tensor_spec - test_from_tensor (caffe2.test.fx2trt.core.test_input_tensor_spec.TestTRTModule) (7.047)
    ✓ Pass: caffe2/test/fx2trt/core:test_input_tensor_spec - test_from_tensors_with_dynamic_batch_size (caffe2.test.fx2trt.core.test_input_tensor_spec.TestTRTModule) (7.066)
    ✓ Pass: caffe2/test/fx2trt/core:test_input_tensor_spec - test_from_tensors (caffe2.test.fx2trt.core.test_input_tensor_spec.TestTRTModule) (7.181)
Summary
  Pass: 3
  ListingSuccess: 1
```

Wait for CI to verify if this unit test can run without RE.

Reviewed By: yinghai, kflu

Differential Revision: D32853947

fbshipit-source-id: 19713e8ad5478c945385c7013f7a1b9894151fea
2021-12-06 16:54:39 -08:00
cfe3cbb392 [fx2trt] Use weights shape as normalize shape in layer norm (#69401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69401

As the title says. In PyTorch, these two shapes are the same. The normalized shape might be retrieved from tensor.size(), and in explicit-batch-dim mode that won't work right now.
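
A quick check of the claim that the two shapes agree in PyTorch, so the weight shape is a safe source for the normalized shape:
```
import torch

ln = torch.nn.LayerNorm([10, 20])
assert tuple(ln.weight.shape) == (10, 20)  # weight shape == normalized_shape
```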

Test Plan:
```
    ✓ ListingSuccess: caffe2/test/fx2trt/converters:test_layernorm - main (7.018)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_with_dynamic_shape_0_1d_normalized_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (22.945)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_0_1d_normalized_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (23.203)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_with_dynamic_shape_1_2d_normalized_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (42.549)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_1_2d_normalized_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (43.544)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_with_dynamic_shape_2_4d_input_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (45.958)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_2_4d_input_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (47.027)
Summary
  Pass: 6
  ListingSuccess: 1
```

Reviewed By: yinghai

Differential Revision: D32853359

fbshipit-source-id: 8a122fe3348a1d9ad07b48647ec6166d171d113a
2021-12-06 16:53:29 -08:00
59e98b66ac Revert D32704467: [Autograd/Checkpoint] Checkpoint implementation without reentrant autograd
Test Plan: revert-hammer

Differential Revision:
D32704467 (e032dae329)

Original commit changeset: 6eea1cce6b93

fbshipit-source-id: 1a788c1fd57cee46bba82e216e6162d078359cc2
2021-12-06 16:33:32 -08:00
bc89528931 Initialize upgrader and operator version files (#68772)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68772

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D32603257

Pulled By: tugsbayasgalan

fbshipit-source-id: 5a3d9ba4d0a01ddff4ff6ebdf7bb88ec125765b0
2021-12-06 16:27:52 -08:00
9e678446a2 [Pytorch Edge] Add new_empty_strided to tracer (#69492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69492

We already added empty, and this is another weird variation that sometimes pops up. How to trigger it is unclear, so we are just adding it for now.
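
For reference, a quick illustration of the op now covered by the tracer:
```
import torch

base = torch.ones(2, 3, dtype=torch.float64)
t = base.new_empty_strided((2, 3), (3, 1))  # uninitialized; inherits dtype/device
print(t.dtype, t.stride())                  # torch.float64 (3, 1)
```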

Test Plan: ran tracer

Differential Revision: D32896522

fbshipit-source-id: 38627d8efc48ef240100ccdbd94c0e7208b0b466
2021-12-06 15:28:13 -08:00
65b0f389d2 [PyTorch][Distributed] Use auto-grad enabled collections for the shared linear op to enable backward grad calculation (#68096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68096

We replace all c10d APIs with autograd-enabled collectives in the sharded linear op, so that we can enable backward propagation (grad calculation for the sharded linear).
ghstack-source-id: 144882914

Test Plan: Unit test + CI

Reviewed By: pritamdamania87

Differential Revision: D32177341

fbshipit-source-id: 1919e8ca877bdc79f4cdb0dc2a82ddaf6881b9f1
2021-12-06 15:17:08 -08:00
7c2489bdae [PyTorch][Distributed] Enable Reduce Scatter and modify all_to_all for sharded linear with more test cases. (#68786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68786

To enable autograd for the sharded linear, we found we needed to make some changes to the current NN function API (the c10d API with autograd enabled). We made the following changes:

1. Added a new API, `reduce_scatter`, since we need it for rowwise sharding (a usage sketch follows the list).
2. Modified the `all_to_all` API to make it consistent with the one in distributed_c10d.py.
3. Found that the C++ signature of `reduce_scatter` was missing an input param; added more unit tests to cover these cases.
4. Synced the NN tests from gloo to nccl.
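
A minimal usage sketch (assumes an initialized process group; the functional signature is assumed to mirror torch.distributed.reduce_scatter):
```
import torch
import torch.distributed as dist
import torch.distributed.nn.functional as dist_nn

def rowwise_combine(partials: list) -> torch.Tensor:
    # Each rank holds partial results; reduce_scatter sums them and leaves each
    # rank with its own shard. Unlike dist.reduce_scatter, this variant records
    # the op in the autograd graph, so gradients flow back through it.
    out = torch.empty_like(partials[0])
    return dist_nn.reduce_scatter(out, partials, op=dist.ReduceOp.SUM)
```
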
ghstack-source-id: 144860208

Test Plan: CI + Unit Test

Reviewed By: pritamdamania87

Differential Revision: D32569674

fbshipit-source-id: 9bd613f91bbf7a39eede0af32a5a5db0f2ade43b
2021-12-06 13:38:58 -08:00
e032dae329 [Autograd/Checkpoint] Checkpoint implementation without reentrant autograd (#69027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69027

Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.

Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (we still need to
add thorough distributed testing).
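
A short usage sketch of the new flag; the non-reentrant path composes with `autograd.grad`:
```
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.nn.functional.relu(x @ x.t())

x = torch.randn(4, 4, requires_grad=True)
out = checkpoint(block, x, use_reentrant=False)
(grad,) = torch.autograd.grad(out.sum(), x)  # works without .backward()
```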

As discussed in https://github.com/pytorch/pytorch/issues/65537, we have added
the following tests:

- [ ] Gradient hooks are called once
ghstack-source-id: 144644859

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D32704467

fbshipit-source-id: 6eea1cce6b935ef5a0f90b769e395120900e4412
2021-12-06 13:29:37 -08:00
4d81175a07 add VSX dispatch for fft_fill_with_conjugate_symmetry_stub (#68914)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68057.

As discussed in https://github.com/pytorch/pytorch/issues/68057, this adds a change to provide the missing dispatch for VSX.
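
For context, a quick numerical check of the Hermitian symmetry this stub exploits (for real input, X[k] == conj(X[N-k]), so half the spectrum can be filled from the other half):
```
import torch

x = torch.randn(8)
X = torch.fft.fft(x)
assert torch.allclose(X[1:], X[1:].flip(0).conj(), atol=1e-6)
```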

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68914

Reviewed By: seemethere

Differential Revision: D32696773

Pulled By: malfet

fbshipit-source-id: f1b70ab85bf9fb1c0119cc70d6125b8801d95669
2021-12-06 13:04:59 -08:00
f87faf3c29 .github: Volume mount local netrc for docs push (#69472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69472

Neglected the fact that the actual push for these variables is happening
inside of a Docker container; this should help resolve that issue.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32889583

Pulled By: seemethere

fbshipit-source-id: d0ef213787694ab1a7e9fb508c58d2f53ff218c3
2021-12-06 12:11:23 -08:00
1859e5f000 [FSDP] Enforce wrapper_cls as a mandatory kwarg in enable_wrap. (#69358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69358

Enforces wrapper_cls as a mandatory kwarg, raising an error earlier if it is
not provided to the enable_wrap() function. Also improves the documentation.
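
A usage sketch (module paths as in recent PyTorch; assumes an initialized distributed process group) of the now-mandatory kwarg:
```
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import enable_wrap, wrap

with enable_wrap(wrapper_cls=FSDP):  # omitting wrapper_cls now raises early
    layer = wrap(nn.Linear(5, 5))    # wrapped as FSDP(nn.Linear(5, 5))
```
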
ghstack-source-id: 144807950

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32826963

fbshipit-source-id: d1b98df021e86d3d87a626e82facf6230b571a55
2021-12-06 12:11:20 -08:00
00245fed96 [FSDP] Kill config_auto_wrap_policy, remove policy from enable_wrap, (#69357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69357

Since we only want to support the enable_wrap() and wrap() manual wrapping
APIs, without them accepting auto_wrap_policy, remove all this now-unneeded code.
ghstack-source-id: 144807951

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32826318

fbshipit-source-id: 6526e700ebdf132cbb10439698f5c97ce083cd3d
2021-12-06 12:11:17 -08:00
c95277e92a [FSDP] Remove auto_wrap() (#69356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69356

Per title
ghstack-source-id: 144807949

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32816150

fbshipit-source-id: 6b4eacc63edd267bc1eb8a1c1d6c753bc581d63a
2021-12-06 12:11:14 -08:00
f333cde14e [FSDP] Make recursive_wrap, wrap APIs independent of ConfigAutoWrap. (#68776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68776

Makes these APIs independent of ConfigAutoWrap so that they can be
used by the FSDP ctor without it knowing about ConfigAutoWrap.

Also gets us one step closer to killing ConfigAutoWrap.recursive_wrap and
auto_wrap(), as we will only support enable_wrap() and wrap() moving forward.

Will test via unittests and FSDP benchmarks to ensure the wrapping still works.
ghstack-source-id: 144807948

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32604021

fbshipit-source-id: 54defc0cd90b16b5185a8c1294b39f75c06ffd21
2021-12-06 12:09:49 -08:00
456139d0ae FX pass: fuse_sparse_matmul_add (#69340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69340

- An FX pass to fuse the ops resulting from addmm(a, b.t()) (a sketch of such a rewrite follows)
- Used to enable structured sparsity using TRT
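
A hedged sketch (illustrative, not the PR's actual pass) of such a rewrite: fold addmm(bias, a, b.t()) into a single linear node that a TRT lowering can map onto a structured-sparse kernel.
```
import torch
import torch.fx

def fuse_sparse_matmul_add(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target is torch.addmm:
            bias, mat1, mat2 = node.args
            # Only match when the second operand is an explicit transpose.
            if isinstance(mat2, torch.fx.Node) and mat2.op == "call_method" \
                    and mat2.target == "t":
                with gm.graph.inserting_after(node):
                    # addmm(bias, a, b.t()) == F.linear(a, b, bias)
                    fused = gm.graph.call_function(
                        torch.nn.functional.linear,
                        (mat1, mat2.args[0], bias))
                node.replace_all_uses_with(fused)
                gm.graph.erase_node(node)
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```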

Reviewed By: 842974287

Differential Revision: D32456684

fbshipit-source-id: 601826af216cea314ee85ed522d5c54a5151d720
2021-12-06 12:07:02 -08:00
68b5c86e65 [Vulkan] Implement slice operator (#69382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69382

Implemented `slice` operator on the Vulkan backend:
* Supports only <= 4D tensors.
* `aten::slice.Tensor` is executed internally when indexing a Tensor.
* Slicing means selecting elements from the tensor using the `:` slice operator; we select elements by their indices.
* Indexing starts at 0, and `end` is exclusive. In the example below, we get the elements from the very start up to end index 4 (exclusive) of the tensor.
```
tensor = torch.tensor([2, 4, 1, 7, 0, 9])
print(tensor[ : 4])
# Outputs- tensor([2, 4, 1, 7])
```
* Generalized input tensors to 4D ones to simplify input/output texture handling. For example, {2, 3} is treated as {1,1,2,3} internally.
* Negative `start` and `end` inputs are allowed.
* CPU implementation: [/aten/src/ATen/native/TensorShape.cpp::slice()](3e45739543/aten/src/ATen/native/TensorShape.cpp (L1262))
* For **width** dimension, use `vkCmdCopyImage` API,
  * input texture size = `{x,y,z}`
  * if `step` is 1, copy a region from the input texture to the output texture once where
    * source offset = `{start,0,0}`
    * destination offset = `{0,0,0}`
    * copy extents = `{end-start,y,z}`
    * call `vkCmdCopyImage` API
  * if `step` is not 1, do for-loop from x=`start` to `end-1` by `step` (also from x_new=`0` to `end-start-1`) where
    * x_max = x
    * copy extents = `{1,y,z}`
    * if (x >= x_max) continue; // out of range
    * source offset = `{x,0,0}`
    * destination offset = `{x_new,0,0}`
    * call `vkCmdCopyImage` API
* For **height** dimension, use `vkCmdCopyImage` API,
  * input texture size = `{x,y,z}`
  * if `step` is 1, copy a region from the input texture to the output texture once where
    * source offset = `{0,start,0}`
    * destination offset = `{0,0,0}`
    * copy extents = `{x,end-start,z}`
    * call `vkCmdCopyImage` API
  * if `step` is not 1, do for-loop from y=`start` to `end-1` by `step` (also from y_new=`0` to `end-start-1`) where
    * y_max = y
    * copy extents = `{x,1,z}`
    * if (y >= y_max) continue; // out of range
    * source offset = `{0,y,0}`
    * destination offset = `{0,y_new,0}`
    * call `vkCmdCopyImage` API
* For **batch** and **feature**(channel) dimensions, we build up shader operations from the output texture point of view to avoid the nondeterministic order of GPU shader operations between texels. See [incoherent memory access](https://www.khronos.org/opengl/wiki/Memory_Model#Incoherent_memory_access)
  * `b,c,h,w` = input tensor dims (NCHW)
  * `b1,c1,h1,w1` = output tensor dims (NCHW)
  * `posIn` = position (x,y,z) for input texture
  * `posOut` = position (x,y,z) for output texture
  * `inval` = input texel value
  * `outval` = output texel value
  * `max_dst_index` = batch size of output tensor * channel size of output tensor
  * `n` = end - start
  * `i` = index of input texel (0...3) and `j` = index of output texel (0..3)
  * Pseudo code:
```
for (uint j = 0; j < 4; ++j) {
  dst_index = posOut.z * 4 + j;
  if (dst_index >= max_dst_index) {
    save outval to output texture at posOut
    break; // out of range
  }

  b1 = int(dst_index / channel size of output tensor);
  c1 = dst_index % channel size of output tensor;
  h1 = posOut.y;
  w1 = posOut.x;

  b=b1
  c=c1
  h=h1
  w=w1

  if (dim==0) { // batch
    b=start+step*b1;
  } else { // feature(channel)
    c=start+step*c1
  }

  src_index = b * channel size of input tensor + c;
  posIn.x = int(w);
  posIn.y = int(h);
  posIn.z = int(src_index / 4);
  i = (src_index % 4);
  read inval from input texture at posIn
  outval[j] = inval[i]
  if (j == 3) {
    save outval to output texture at posOut
  }
}
```
* Error/edge cases:
  * Vulkan backend doesn't support zero-sized slice. It throws an exception when allocating a Vulkan buffer if any dim size is zero.
  * The slice step should be positive.
* Generalized test cases with different dim size tensors for batch, feature, height and width. For example, a 4D tensor slicing by dim=width:
```
tensor {2, 3, 40, 50} slicing with dim=3, start=10, end=30, step=1 <-> tensor indexing by [:,:,:,10:30:1]
tensor {2, 3, 40, 50} slicing with dim=3, start=10, end=30, step=7 <-> tensor indexing by [:,:,:,10:30:7]
tensor {2, 3, 40, 50} slicing with dim=3, start=10, end=50, step=2 <-> tensor indexing by [:,:,:,10:50:2] with end=out of range
tensor {2, 3, 40, 50} slicing with dim=3, start=-60, end=60, step=2 <-> tensor indexing by [:,:,:,-60:60:2] with start/end=out of range
tensor {2, 3, 40, 50} slicing with dim=3, start=-30, end=-10, step=2 <-> tensor indexing by [:,:,:,-30:-10:2] with negative start/end
tensor {2, 3, 40, 50} slicing with dim=3, start=0, end=INT64_MAX, step=2 <-> tensor indexing by [:,:,:,0:9223372036854775807:2] with end=INT64_MAX
tensor {2, 3, 40, 50} slicing with dim=3, start=-10, end=INT64_MAX, step=2 <-> tensor indexing by [:,:,:,-10:9223372036854775807:2] with negative start and end=INT64_MAX
tensor {2, 3, 40, 50} slicing with dim=3, start=INT64_MIN, end=INT64_MAX, step=2 <-> tensor indexing by [:,:,:,-9223372036854775808:9223372036854775807:2] with start=INT64_MIN and end=INT64_MAX
tensor {2, 3, 40, 50} slicing with dim=3, start=empty, end=empty, step=2 <-> tensor indexing by [:,:,:,::2] with empty start/end
```
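
The out-of-range cases above mirror eager PyTorch's clamping semantics, for example:
```
import torch

t = torch.arange(6)
print(t[4:100])   # tensor([4, 5]); an end beyond the size is clamped
print(t[-60:60])  # tensor([0, 1, 2, 3, 4, 5]); out-of-range start/end clamped
```
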
* References:
  * [Slicing PyTorch Datasets](https://lewtun.github.io/blog/til/nlp/pytorch/2021/01/24/til-slicing-torch-datasets.html)
  * [How to Slice a 3D Tensor in Pytorch?](https://www.geeksforgeeks.org/how-to-slice-a-3d-tensor-in-pytorch/)
  * [PyTorch Tensor Indexing API](https://pytorch.org/cppdocs/notes/tensor_indexing.html#translating-between-python-c-index-types)
  * [PyTorch Tensor Indexing](https://deeplearninguniversity.com/pytorch/pytorch-tensor-indexing/)
  * [Slicing and Striding](https://mlverse.github.io/torch/articles/indexing.html#slicing-and-striding)
* Vulkan `slice` operator tensor conversion:
{F684363708}

Test Plan:
Build & test on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Build & test on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
Test result on Android (Google Pixel 5):
```
[ RUN      ] VulkanAPITest.slice_width_success
[       OK ] VulkanAPITest.slice_width_success (17 ms)
[ RUN      ] VulkanAPITest.slice_height_success
[       OK ] VulkanAPITest.slice_height_success (13 ms)
[ RUN      ] VulkanAPITest.slice_feature_success
[       OK ] VulkanAPITest.slice_feature_success (20 ms)
[ RUN      ] VulkanAPITest.slice_batch_success
[       OK ] VulkanAPITest.slice_batch_success (9 ms)
[ RUN      ] VulkanAPITest.slice_invalidinputs_exceptions
[       OK ] VulkanAPITest.slice_invalidinputs_exceptions (0 ms)
```
Test result on MacOS:
```
[ RUN      ] VulkanAPITest.slice_width_success
[       OK ] VulkanAPITest.slice_width_success (81 ms)
[ RUN      ] VulkanAPITest.slice_height_success
[       OK ] VulkanAPITest.slice_height_success (56 ms)
[ RUN      ] VulkanAPITest.slice_feature_success
[       OK ] VulkanAPITest.slice_feature_success (132 ms)
[ RUN      ] VulkanAPITest.slice_batch_success
[       OK ] VulkanAPITest.slice_batch_success (33 ms)
[ RUN      ] VulkanAPITest.slice_invalidinputs_exceptions
[       OK ] VulkanAPITest.slice_invalidinputs_exceptions (1 ms)
```

Reviewed By: SS-JIA

Differential Revision: D32482638

fbshipit-source-id: 65841fb2d3489ee407f2b4f38619b700787d41b0
2021-12-06 12:05:37 -08:00
a84ed8be6d unify compare kernels (#69111)
Summary:
This unifies 6 compare ops (NE, EQ, LT, LE, GE, GT) into 2 kernels, reducing context size. Performance is ~5% worse for low-width broadcast cases and on par for non-broadcast ones.
With this PR, benchmarks for the contiguous, 1M-MM, 1M-M1, and op-with-scalar cases (size in MB and bandwidth in GB/s):
```
    5.0,   795.9
   10.0,   650.5
   15.0,   706.2
   20.0,   731.6
   25.0,   744.9
   30.0,   758.1
   35.0,   762.6
   40.0,   768.8
   45.0,   775.7
   50.0,   780.7
   55.0,   781.7
   60.0,   783.0
   65.0,   784.8
   70.0,   790.7
   75.0,   789.2
   80.0,   794.4
   85.0,   794.2
   90.0,   797.4
   95.0,   796.3
  100.0,   798.0
    3.0,   363.7     1.0,   122.2     3.0,   385.5
    6.0,   420.4     2.0,   142.9     6.0,   755.5
    9.0,   438.3     3.0,   151.6     9.0,   684.5
   12.0,   449.5     4.0,   156.4    12.0,   702.9
   15.0,   463.7     5.0,   159.6    15.0,   716.8
   18.0,   472.7     6.0,   161.4    18.0,   737.0
   21.0,   477.6     7.0,   162.4    21.0,   745.6
   24.0,   480.9     8.0,   164.1    24.0,   755.4
   27.0,   483.7     9.0,   163.7    27.0,   760.7
   30.0,   487.3    10.0,   165.9    30.0,   770.4
   33.0,   491.4    11.0,   166.3    33.0,   774.3
   36.0,   492.9    12.0,   166.2    36.0,   779.0
   39.0,   494.7    13.0,   166.7    39.0,   782.5
   42.0,   491.3    14.0,   166.7    42.0,   789.0
   45.0,   495.1    15.0,   167.5    45.0,   790.0
   48.0,   499.7    16.0,   167.7    48.0,   791.8
   51.0,   496.2    17.0,   166.9    51.0,   794.0
   54.0,   497.6    18.0,   167.7    54.0,   797.4
   57.0,   497.1    19.0,   167.5    57.0,   798.6
   60.0,   498.8    20.0,   168.8    60.0,   802.1

```
Master
```
    5.0,   743.4
   10.0,   665.7
   15.0,   702.3
   20.0,   727.5
   25.0,   740.7
   30.0,   757.5
   35.0,   760.3
   40.0,   768.5
   45.0,   775.7
   50.0,   776.8
   55.0,   781.1
   60.0,   786.5
   65.0,   786.8
   70.0,   790.1
   75.0,   789.7
   80.0,   789.1
   85.0,   793.2
   90.0,   793.8
   95.0,   795.9
  100.0,   796.0
    3.0,   383.1     1.0,   129.0     3.0,   337.0
    6.0,   445.0     2.0,   149.6     6.0,   670.6
    9.0,   445.3     3.0,   159.6     9.0,   678.6
   12.0,   474.9     4.0,   164.1    12.0,   705.5
   15.0,   480.8     5.0,   167.2    15.0,   718.3
   18.0,   490.3     6.0,   169.1    18.0,   733.3
   21.0,   493.9     7.0,   168.5    21.0,   742.5
   24.0,   503.8     8.0,   171.9    24.0,   756.4
   27.0,   506.7     9.0,   171.3    27.0,   759.8
   30.0,   508.7    10.0,   172.4    30.0,   767.1
   33.0,   515.7    11.0,   174.2    33.0,   773.7
   36.0,   516.7    12.0,   170.4    36.0,   781.7
   39.0,   519.1    13.0,   174.4    39.0,   782.1
   42.0,   515.7    14.0,   174.1    42.0,   787.0
   45.0,   519.2    15.0,   172.7    45.0,   788.1
   48.0,   522.2    16.0,   175.4    48.0,   791.7
   51.0,   519.6    17.0,   175.1    51.0,   795.7
   54.0,   518.5    18.0,   174.8    54.0,   795.8
   57.0,   519.1    19.0,   174.4    57.0,   796.6
   60.0,   521.5    20.0,   175.6    60.0,   800.1
```
<details>
<summary>Benchmarking script </summary>

```
import torch
from matplotlib import pyplot as plt
from torch.utils.benchmark import Timer, Compare
import math
import click
print(torch.cuda.get_device_capability()) # check that we are on Volta (compute capability 7,0)
#torch.cuda.set_device(1)
# don't benchmark on anything too small, you'll see only overhead
@click.command()
@click.option('--op_str', default="torch.gt")
@click.option('--dtype_str', default="float", type=click.Choice(['float', 'half']))
def bench(op_str, dtype_str):
    if dtype_str == "float":
        dtype = torch.float
    elif dtype_str == "half":
        dtype = torch.half

    MB = 1024 * 1024
    size = MB
    results = []
    sizes = []
    for _ in range(20):
        torch.cuda.memory.empty_cache()
        a=torch.randn(int(size), device="cuda", dtype=dtype)
        b=torch.randn(int(size), device="cuda", dtype=dtype)
        t = Timer(stmt=f"{op_str}(a,b)", label = op_str, sub_label=f"{size/MB} MB", description="contiguous", globals = {"a":a, "b":b})
        res = t.blocked_autorange()
        results.append(res)
        sizes.append(size)
        size +=  MB
        del a #to save memory for next iterations
        del b
    c=Compare(results)
    #print(c)
    bw=[]
    bytes=[]
    element_size = torch.tensor([], dtype=dtype).element_size()
    output_element_size = 1
    for res, size in zip(results,sizes):
        bytes_io = 2*size*element_size + output_element_size * size
        bytes.append(bytes_io/MB)
        # we'll report bandwidth in GB/s
        bw.append(bytes_io/res.median * 1e-9)
        print(f"{bytes_io/MB:7.1f}, {bw[-1]:7.1f}")

    sizes = []
    results = [[],[],[]]

    size = MB
    for _ in range(20):
        torch.cuda.memory.empty_cache()
        M = math.floor(math.sqrt(size))
        a=torch.randn(1, M, device="cuda", dtype=dtype)
        b=torch.randn(M, M, device="cuda", dtype=dtype)
        b1 = torch.randn(M, 1, device="cuda", dtype=dtype)
        tb = Timer(stmt=f"{op_str}(a,b)", label = op_str, sub_label=f"{M*M/MB} MB", description="MMM1", globals = {"a":a, "b":b})
        t1 = Timer(stmt=f"{op_str}(a,b1)", label = op_str, sub_label=f"{M*M/MB} MB", description="M11M", globals = {"a":a, "b1":b1})
        ts = Timer(stmt=f"{op_str}(b,1.)", label = op_str, sub_label=f"{M*M/MB} MB", description="scalar", globals = {"a":a, "b":b})

        res = [t.blocked_autorange() for t in (tb, t1, ts)]
        for (rl, r) in zip(results, res):
            rl.append(r)
        sizes.append(M)
        size += MB
        del a #to save memory for next iterations
        del b
    comps = [Compare(r) for r in results]
    #[print(c) for c in comps]
    bw=[[],[],[]]

    for res, res1, ress, size in zip(results[0],results[1],results[2], sizes):
        bytes_io = (size+size*size)*element_size + output_element_size * size*size #(size+size+size*size)*4
        bytes_io1 = (size+size)*element_size + output_element_size * size*size #(size+size+size*size)*4
        bytes_ios = (size*size)*element_size + output_element_size * size * size
        bytes_iol = (bytes_io, bytes_io1, bytes_ios)
        for (bw_elem, bytes_elem, res_elem) in zip(bw, bytes_iol, (res, res1, ress)):
            bw_elem.append(bytes_elem/res_elem.median * 1e-9)
        print(f"{bytes_iol[0]/MB:7.1f}, {bw[0][-1]:7.1f}", f"{bytes_iol[1]/MB:7.1f}, {bw[1][-1]:7.1f}",
        f"{bytes_iol[2]/MB:7.1f}, {bw[2][-1]:7.1f}")

if __name__ == '__main__':
    bench()
```
</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69111

Reviewed By: mruberry

Differential Revision: D32851098

Pulled By: ngimel

fbshipit-source-id: cfb83922b2e8eb6a0ad0621ff07c2dada9c8e626
2021-12-06 11:00:53 -08:00
38c576cfef Clean up CODEOWNERS for .github/ (#69395)
Summary:
Cleans up the CODEOWNERS file to reflect current team

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69395

Test Plan: yeah_sandcastle

Reviewed By: anjali411

Differential Revision: D32885237

Pulled By: seemethere

fbshipit-source-id: a465f2cd0e27d5e53f5af5769d1cad47ec5348e7
2021-12-06 10:50:29 -08:00
bf01cd5228 Move THC_sleep to ATen (#69038)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69038

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872479

Pulled By: ngimel

fbshipit-source-id: 97c7592b16eee2ecc66c42507c358aa92cc8ee50
2021-12-06 10:20:43 -08:00
a974699633 Skips failing ROCm test (#69456)
Summary:
ROCm and CUDA type promotion are slightly divergent and need to be updated.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69456

Reviewed By: anjali411, janeyx99

Differential Revision: D32883895

Pulled By: mruberry

fbshipit-source-id: 3b0ba8a9d092c2d7ff20d78da42d4a147b1db12d
2021-12-06 09:12:31 -08:00
b737e09f60 expose return_types in Python (#66614)
Summary:
https://github.com/facebookresearch/functorch/issues/87

TODO:
* [x] Add comments
* [x] Add test
* [x] Fix XLA
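
A short usage sketch of what this exposes: the generated structseq classes (see below) become reachable as torch.return_types.<name> from Python.
```
import torch

res = torch.randn(3, 4).max(dim=0)
print(type(res))                                # <class 'torch.return_types.max'>
print(isinstance(res, torch.return_types.max))  # True
print(res.values.shape, res.indices.shape)      # named-field access
```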

<details>

<summary>Generated python_return_types.cpp</summary>

```cpp
#include <Python.h>

#include <vector>
#include <map>
#include <string>

#include "torch/csrc/autograd/python_return_types.h"
#include "torch/csrc/utils/structseq.h"
#include "torch/csrc/Exceptions.h"

namespace {
PyTypeObject* get__det_lu_based_helper_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"det", ""}, {"lu", ""}, {"pivs", ""},  {nullptr} };
    static PyTypeObject _det_lu_based_helperNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._det_lu_based_helper", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_det_lu_based_helperNamedTuple, &desc);
        _det_lu_based_helperNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_det_lu_based_helperNamedTuple;
}
PyTypeObject* get__fake_quantize_per_tensor_affine_cachemask_tensor_qparams_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"output", ""}, {"mask", ""},  {nullptr} };
    static PyTypeObject _fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._fake_quantize_per_tensor_affine_cachemask_tensor_qparams", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple, &desc);
        _fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple;
}
PyTypeObject* get__fused_moving_avg_obs_fq_helper_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"output", ""}, {"mask", ""},  {nullptr} };
    static PyTypeObject _fused_moving_avg_obs_fq_helperNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._fused_moving_avg_obs_fq_helper", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_fused_moving_avg_obs_fq_helperNamedTuple, &desc);
        _fused_moving_avg_obs_fq_helperNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_fused_moving_avg_obs_fq_helperNamedTuple;
}
PyTypeObject* get__lu_with_info_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"LU", ""}, {"pivots", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject _lu_with_infoNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._lu_with_info", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_lu_with_infoNamedTuple, &desc);
        _lu_with_infoNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_lu_with_infoNamedTuple;
}
PyTypeObject* get__unpack_dual_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"primal", ""}, {"tangent", ""},  {nullptr} };
    static PyTypeObject _unpack_dualNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._unpack_dual", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_unpack_dualNamedTuple, &desc);
        _unpack_dualNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_unpack_dualNamedTuple;
}
PyTypeObject* get_aminmax_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"min", ""}, {"max", ""},  {nullptr} };
    static PyTypeObject aminmaxNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.aminmax", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&aminmaxNamedTuple, &desc);
        aminmaxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &aminmaxNamedTuple;
}

PyTypeObject* get_aminmax_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"min", ""}, {"max", ""},  {nullptr} };
    static PyTypeObject aminmax_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.aminmax_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&aminmax_outNamedTuple1, &desc);
        aminmax_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &aminmax_outNamedTuple1;
}
PyTypeObject* get_cummax_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cummaxNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummax", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cummaxNamedTuple, &desc);
        cummaxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cummaxNamedTuple;
}

PyTypeObject* get_cummax_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cummax_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummax_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cummax_outNamedTuple1, &desc);
        cummax_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cummax_outNamedTuple1;
}
PyTypeObject* get_cummin_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cumminNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummin", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cumminNamedTuple, &desc);
        cumminNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cumminNamedTuple;
}

PyTypeObject* get_cummin_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cummin_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummin_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cummin_outNamedTuple1, &desc);
        cummin_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cummin_outNamedTuple1;
}
PyTypeObject* get_eig_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject eig_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.eig_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&eig_outNamedTuple, &desc);
        eig_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &eig_outNamedTuple;
}

PyTypeObject* get_eig_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject eigNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.eig", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&eigNamedTuple1, &desc);
        eigNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &eigNamedTuple1;
}
PyTypeObject* get_frexp_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"mantissa", ""}, {"exponent", ""},  {nullptr} };
    static PyTypeObject frexpNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.frexp", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&frexpNamedTuple, &desc);
        frexpNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &frexpNamedTuple;
}

PyTypeObject* get_frexp_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"mantissa", ""}, {"exponent", ""},  {nullptr} };
    static PyTypeObject frexp_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.frexp_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&frexp_outNamedTuple1, &desc);
        frexp_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &frexp_outNamedTuple1;
}
PyTypeObject* get_geqrf_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"a", ""}, {"tau", ""},  {nullptr} };
    static PyTypeObject geqrf_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.geqrf_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&geqrf_outNamedTuple, &desc);
        geqrf_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &geqrf_outNamedTuple;
}

PyTypeObject* get_geqrf_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"a", ""}, {"tau", ""},  {nullptr} };
    static PyTypeObject geqrfNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.geqrf", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&geqrfNamedTuple1, &desc);
        geqrfNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &geqrfNamedTuple1;
}
PyTypeObject* get_histogram_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"hist", ""}, {"bin_edges", ""},  {nullptr} };
    static PyTypeObject histogram_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.histogram_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&histogram_outNamedTuple, &desc);
        histogram_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &histogram_outNamedTuple;
}

PyTypeObject* get_histogram_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"hist", ""}, {"bin_edges", ""},  {nullptr} };
    static PyTypeObject histogramNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.histogram", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&histogramNamedTuple1, &desc);
        histogramNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &histogramNamedTuple1;
}
PyTypeObject* get_kthvalue_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject kthvalueNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.kthvalue", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&kthvalueNamedTuple, &desc);
        kthvalueNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &kthvalueNamedTuple;
}

PyTypeObject* get_kthvalue_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject kthvalue_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.kthvalue_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&kthvalue_outNamedTuple1, &desc);
        kthvalue_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &kthvalue_outNamedTuple1;
}
PyTypeObject* get_linalg_cholesky_ex_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"L", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_cholesky_exNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_cholesky_ex", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_cholesky_exNamedTuple, &desc);
        linalg_cholesky_exNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_cholesky_exNamedTuple;
}

PyTypeObject* get_linalg_cholesky_ex_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"L", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_cholesky_ex_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_cholesky_ex_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_cholesky_ex_outNamedTuple1, &desc);
        linalg_cholesky_ex_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_cholesky_ex_outNamedTuple1;
}
PyTypeObject* get_linalg_eig_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eigNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eig", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eigNamedTuple, &desc);
        linalg_eigNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eigNamedTuple;
}

PyTypeObject* get_linalg_eig_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eig_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eig_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eig_outNamedTuple1, &desc);
        linalg_eig_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eig_outNamedTuple1;
}
PyTypeObject* get_linalg_eigh_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eighNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eigh", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eighNamedTuple, &desc);
        linalg_eighNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eighNamedTuple;
}

PyTypeObject* get_linalg_eigh_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eigh_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eigh_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eigh_outNamedTuple1, &desc);
        linalg_eigh_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eigh_outNamedTuple1;
}
PyTypeObject* get_linalg_inv_ex_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"inverse", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_inv_exNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_inv_ex", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_inv_exNamedTuple, &desc);
        linalg_inv_exNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_inv_exNamedTuple;
}

PyTypeObject* get_linalg_inv_ex_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"inverse", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_inv_ex_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_inv_ex_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_inv_ex_outNamedTuple1, &desc);
        linalg_inv_ex_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_inv_ex_outNamedTuple1;
}
PyTypeObject* get_linalg_lstsq_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"residuals", ""}, {"rank", ""}, {"singular_values", ""},  {nullptr} };
    static PyTypeObject linalg_lstsqNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_lstsq", nullptr, NamedTuple_fields, 4 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_lstsqNamedTuple, &desc);
        linalg_lstsqNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_lstsqNamedTuple;
}

PyTypeObject* get_linalg_lstsq_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"residuals", ""}, {"rank", ""}, {"singular_values", ""},  {nullptr} };
    static PyTypeObject linalg_lstsq_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_lstsq_out", nullptr, NamedTuple_fields, 4 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_lstsq_outNamedTuple1, &desc);
        linalg_lstsq_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_lstsq_outNamedTuple1;
}
PyTypeObject* get_linalg_qr_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject linalg_qrNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_qr", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_qrNamedTuple, &desc);
        linalg_qrNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_qrNamedTuple;
}

PyTypeObject* get_linalg_qr_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject linalg_qr_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_qr_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_qr_outNamedTuple1, &desc);
        linalg_qr_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_qr_outNamedTuple1;
}
PyTypeObject* get_linalg_slogdet_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""},  {nullptr} };
    static PyTypeObject linalg_slogdetNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_slogdet", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_slogdetNamedTuple, &desc);
        linalg_slogdetNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_slogdetNamedTuple;
}

PyTypeObject* get_linalg_slogdet_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""},  {nullptr} };
    static PyTypeObject linalg_slogdet_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_slogdet_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_slogdet_outNamedTuple1, &desc);
        linalg_slogdet_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_slogdet_outNamedTuple1;
}
PyTypeObject* get_linalg_svd_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"Vh", ""},  {nullptr} };
    static PyTypeObject linalg_svd_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_svd_out", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_svd_outNamedTuple, &desc);
        linalg_svd_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_svd_outNamedTuple;
}

PyTypeObject* get_linalg_svd_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"Vh", ""},  {nullptr} };
    static PyTypeObject linalg_svdNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_svd", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_svdNamedTuple1, &desc);
        linalg_svdNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_svdNamedTuple1;
}
PyTypeObject* get_lstsq_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"QR", ""},  {nullptr} };
    static PyTypeObject lstsq_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lstsq_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lstsq_outNamedTuple, &desc);
        lstsq_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lstsq_outNamedTuple;
}

PyTypeObject* get_lstsq_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"QR", ""},  {nullptr} };
    static PyTypeObject lstsqNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lstsq", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lstsqNamedTuple1, &desc);
        lstsqNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lstsqNamedTuple1;
}
PyTypeObject* get_lu_unpack_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"P", ""}, {"L", ""}, {"U", ""},  {nullptr} };
    static PyTypeObject lu_unpackNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lu_unpack", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lu_unpackNamedTuple, &desc);
        lu_unpackNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lu_unpackNamedTuple;
}

PyTypeObject* get_lu_unpack_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"P", ""}, {"L", ""}, {"U", ""},  {nullptr} };
    static PyTypeObject lu_unpack_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lu_unpack_out", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lu_unpack_outNamedTuple1, &desc);
        lu_unpack_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lu_unpack_outNamedTuple1;
}
PyTypeObject* get_max_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject maxNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.max", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&maxNamedTuple, &desc);
        maxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &maxNamedTuple;
}

PyTypeObject* get_max_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject max_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.max_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&max_outNamedTuple1, &desc);
        max_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &max_outNamedTuple1;
}
PyTypeObject* get_median_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject medianNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.median", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&medianNamedTuple, &desc);
        medianNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &medianNamedTuple;
}

PyTypeObject* get_median_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject median_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.median_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&median_outNamedTuple1, &desc);
        median_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &median_outNamedTuple1;
}
PyTypeObject* get_min_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject minNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.min", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&minNamedTuple, &desc);
        minNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &minNamedTuple;
}

PyTypeObject* get_min_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject min_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.min_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&min_outNamedTuple1, &desc);
        min_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &min_outNamedTuple1;
}
PyTypeObject* get_mode_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject modeNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.mode", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&modeNamedTuple, &desc);
        modeNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &modeNamedTuple;
}

PyTypeObject* get_mode_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject mode_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.mode_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&mode_outNamedTuple1, &desc);
        mode_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &mode_outNamedTuple1;
}
PyTypeObject* get_nanmedian_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject nanmedianNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.nanmedian", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&nanmedianNamedTuple, &desc);
        nanmedianNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &nanmedianNamedTuple;
}

PyTypeObject* get_nanmedian_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject nanmedian_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.nanmedian_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&nanmedian_outNamedTuple1, &desc);
        nanmedian_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &nanmedian_outNamedTuple1;
}
PyTypeObject* get_qr_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject qr_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.qr_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&qr_outNamedTuple, &desc);
        qr_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &qr_outNamedTuple;
}

PyTypeObject* get_qr_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject qrNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.qr", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&qrNamedTuple1, &desc);
        qrNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &qrNamedTuple1;
}
PyTypeObject* get_slogdet_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""},  {nullptr} };
    static PyTypeObject slogdetNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.slogdet", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&slogdetNamedTuple, &desc);
        slogdetNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &slogdetNamedTuple;
}
PyTypeObject* get_solve_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"LU", ""},  {nullptr} };
    static PyTypeObject solveNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.solve", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&solveNamedTuple, &desc);
        solveNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &solveNamedTuple;
}

PyTypeObject* get_solve_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"LU", ""},  {nullptr} };
    static PyTypeObject solve_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.solve_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&solve_outNamedTuple1, &desc);
        solve_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &solve_outNamedTuple1;
}
PyTypeObject* get_sort_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject sort_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.sort_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&sort_outNamedTuple, &desc);
        sort_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &sort_outNamedTuple;
}

PyTypeObject* get_sort_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject sortNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.sort", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&sortNamedTuple1, &desc);
        sortNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &sortNamedTuple1;
}
PyTypeObject* get_svd_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"V", ""},  {nullptr} };
    static PyTypeObject svd_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.svd_out", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&svd_outNamedTuple, &desc);
        svd_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &svd_outNamedTuple;
}

PyTypeObject* get_svd_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"V", ""},  {nullptr} };
    static PyTypeObject svdNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.svd", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&svdNamedTuple1, &desc);
        svdNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &svdNamedTuple1;
}
PyTypeObject* get_symeig_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject symeig_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.symeig_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&symeig_outNamedTuple, &desc);
        symeig_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &symeig_outNamedTuple;
}

PyTypeObject* get_symeig_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject symeigNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.symeig", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&symeigNamedTuple1, &desc);
        symeigNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &symeigNamedTuple1;
}
PyTypeObject* get_topk_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject topk_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.topk_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&topk_outNamedTuple, &desc);
        topk_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &topk_outNamedTuple;
}

PyTypeObject* get_topk_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject topkNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.topk", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&topkNamedTuple1, &desc);
        topkNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &topkNamedTuple1;
}
PyTypeObject* get_triangular_solve_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"cloned_coefficient", ""},  {nullptr} };
    static PyTypeObject triangular_solve_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.triangular_solve_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&triangular_solve_outNamedTuple, &desc);
        triangular_solve_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &triangular_solve_outNamedTuple;
}

PyTypeObject* get_triangular_solve_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"cloned_coefficient", ""},  {nullptr} };
    static PyTypeObject triangular_solveNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.triangular_solve", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&triangular_solveNamedTuple1, &desc);
        triangular_solveNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &triangular_solveNamedTuple1;
}
}

namespace torch {
namespace autograd {

std::map<std::string, PyTypeObject*>& get_namedtuple_types_map() {
  // [NOTE] Non-global map
  // This map calls Python functions during its initialization.
  // If it is a global static variable and in case it is loaded
  // before Python interpreter is ready, then the calls it makes during
  // initialization will SEGFAULT.
  // To avoid this we make it function static variable so that it is
  // initialized only after the Python interpreter is ready.
  static std::map<std::string, PyTypeObject*> namedtuple_types_map = {
    {"_det_lu_based_helper", get__det_lu_based_helper_namedtuple()},
    {"_fake_quantize_per_tensor_affine_cachemask_tensor_qparams", get__fake_quantize_per_tensor_affine_cachemask_tensor_qparams_namedtuple()},
    {"_fused_moving_avg_obs_fq_helper", get__fused_moving_avg_obs_fq_helper_namedtuple()},
    {"_lu_with_info", get__lu_with_info_namedtuple()},
    {"_unpack_dual", get__unpack_dual_namedtuple()},
    {"aminmax", get_aminmax_namedtuple()},
    {"aminmax_out", get_aminmax_out_namedtuple()},
    {"cummax", get_cummax_namedtuple()},
    {"cummax_out", get_cummax_out_namedtuple()},
    {"cummin", get_cummin_namedtuple()},
    {"cummin_out", get_cummin_out_namedtuple()},
    {"eig_out", get_eig_out_namedtuple()},
    {"eig", get_eig_namedtuple()},
    {"frexp", get_frexp_namedtuple()},
    {"frexp_out", get_frexp_out_namedtuple()},
    {"geqrf_out", get_geqrf_out_namedtuple()},
    {"geqrf", get_geqrf_namedtuple()},
    {"histogram_out", get_histogram_out_namedtuple()},
    {"histogram", get_histogram_namedtuple()},
    {"kthvalue", get_kthvalue_namedtuple()},
    {"kthvalue_out", get_kthvalue_out_namedtuple()},
    {"linalg_cholesky_ex", get_linalg_cholesky_ex_namedtuple()},
    {"linalg_cholesky_ex_out", get_linalg_cholesky_ex_out_namedtuple()},
    {"linalg_eig", get_linalg_eig_namedtuple()},
    {"linalg_eig_out", get_linalg_eig_out_namedtuple()},
    {"linalg_eigh", get_linalg_eigh_namedtuple()},
    {"linalg_eigh_out", get_linalg_eigh_out_namedtuple()},
    {"linalg_inv_ex", get_linalg_inv_ex_namedtuple()},
    {"linalg_inv_ex_out", get_linalg_inv_ex_out_namedtuple()},
    {"linalg_lstsq", get_linalg_lstsq_namedtuple()},
    {"linalg_lstsq_out", get_linalg_lstsq_out_namedtuple()},
    {"linalg_qr", get_linalg_qr_namedtuple()},
    {"linalg_qr_out", get_linalg_qr_out_namedtuple()},
    {"linalg_slogdet", get_linalg_slogdet_namedtuple()},
    {"linalg_slogdet_out", get_linalg_slogdet_out_namedtuple()},
    {"linalg_svd_out", get_linalg_svd_out_namedtuple()},
    {"linalg_svd", get_linalg_svd_namedtuple()},
    {"lstsq_out", get_lstsq_out_namedtuple()},
    {"lstsq", get_lstsq_namedtuple()},
    {"lu_unpack", get_lu_unpack_namedtuple()},
    {"lu_unpack_out", get_lu_unpack_out_namedtuple()},
    {"max", get_max_namedtuple()},
    {"max_out", get_max_out_namedtuple()},
    {"median", get_median_namedtuple()},
    {"median_out", get_median_out_namedtuple()},
    {"min", get_min_namedtuple()},
    {"min_out", get_min_out_namedtuple()},
    {"mode", get_mode_namedtuple()},
    {"mode_out", get_mode_out_namedtuple()},
    {"nanmedian", get_nanmedian_namedtuple()},
    {"nanmedian_out", get_nanmedian_out_namedtuple()},
    {"qr_out", get_qr_out_namedtuple()},
    {"qr", get_qr_namedtuple()},
    {"slogdet", get_slogdet_namedtuple()},
    {"solve", get_solve_namedtuple()},
    {"solve_out", get_solve_out_namedtuple()},
    {"sort_out", get_sort_out_namedtuple()},
    {"sort", get_sort_namedtuple()},
    {"svd_out", get_svd_out_namedtuple()},
    {"svd", get_svd_namedtuple()},
    {"symeig_out", get_symeig_out_namedtuple()},
    {"symeig", get_symeig_namedtuple()},
    {"topk_out", get_topk_out_namedtuple()},
    {"topk", get_topk_namedtuple()},
    {"triangular_solve_out", get_triangular_solve_out_namedtuple()},
    {"triangular_solve", get_triangular_solve_namedtuple()},
  };
  return namedtuple_types_map;
}

PyTypeObject* get_namedtuple(std::string name) {
  static auto& namedtuple_types_map = get_namedtuple_types_map();
  return namedtuple_types_map[name];
}

void initReturnTypes(PyObject* module) {
  static struct PyModuleDef def = {
      PyModuleDef_HEAD_INIT, "torch._C._return_types", nullptr, -1, {}};
  PyObject* return_types_module = PyModule_Create(&def);
  if (!return_types_module) {
    throw python_error();
  }

  for (const auto& return_type_pair : get_namedtuple_types_map()) {
    // hold onto the TypeObject for the unlikely case of user
    // deleting or overriding it.
    Py_INCREF(return_type_pair.second);
    if (PyModule_AddObject(
            return_types_module,
            return_type_pair.first.c_str(),
            (PyObject*)return_type_pair.second) != 0) {
      Py_DECREF((PyObject*)return_type_pair.second);
      throw python_error();
    }
  }

  // steals a reference to return_types on success
  if (PyModule_AddObject(module, "_return_types", return_types_module) != 0) {
    Py_DECREF(return_types_module);
    throw python_error();
  }
}

} // namespace autograd
} // namespace torch

```

</details>

<details>

<summary>Eg. updated call in other python_*_functions</summary>

```cpp
// linalg_cholesky_ex
static PyObject * THPVariable_linalg_cholesky_ex(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PyTypeObject* NamedTuple = get_namedtuple("linalg_cholesky_ex");
  static PyTypeObject* NamedTuple1 = get_namedtuple("linalg_cholesky_ex_out");
  static PythonArgParser parser({
    "linalg_cholesky_ex(Tensor input, *, bool upper=False, bool check_errors=False, TensorList[2] out=None)",
  }, /*traceable=*/true);

  ParsedArgs<4> parsed_args;
  auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
  if(_r.has_torch_function()) {
    return handle_torch_function(_r, nullptr, args, kwargs, THPLinalgVariableFunctionsModule, "torch.linalg");
  }
  if (_r.isNone(3)) {
    // aten::linalg_cholesky_ex(Tensor self, *, bool upper=False, bool check_errors=False) -> (Tensor L, Tensor info)

    auto dispatch_linalg_cholesky_ex = [](const at::Tensor & self, bool upper, bool check_errors) -> ::std::tuple<at::Tensor,at::Tensor> {
      pybind11::gil_scoped_release no_gil;
      return at::linalg_cholesky_ex(self, upper, check_errors);
    };
    return wrap(NamedTuple, dispatch_linalg_cholesky_ex(_r.tensor(0), _r.toBool(1), _r.toBool(2)));
  } else {
    // aten::linalg_cholesky_ex.L(Tensor self, *, bool upper=False, bool check_errors=False, Tensor(a!) L, Tensor(b!) info) -> (Tensor(a!) L, Tensor(b!) info)
    auto out = _r.tensorlist_n<2>(3);
    auto dispatch_linalg_cholesky_ex_out = [](at::Tensor & L, at::Tensor & info, const at::Tensor & self, bool upper, bool check_errors) -> ::std::tuple<at::Tensor,at::Tensor> {
      pybind11::gil_scoped_release no_gil;
      return at::linalg_cholesky_ex_out(L, info, self, upper, check_errors);
    };
    return wrap(NamedTuple1, dispatch_linalg_cholesky_ex_out(out[0], out[1], _r.tensor(0), _r.toBool(1), _r.toBool(2)));
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66614

Reviewed By: H-Huang

Differential Revision: D32741134

Pulled By: zou3519

fbshipit-source-id: 27bada30d20e66333ca1be1775608d9f0cbf9f59
2021-12-06 09:05:29 -08:00
78b7a419b2 Enable native_dropout/backward for lazy (#69374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69374

Enables the existing native_dropout operator for use with lazy tensors. Also adds aten interned strings so lazy tensor codegen can refer to the symbols in generated IR classes.

Test Plan: CI for regressions of existing use cases, and manual tests of new Lazy Tensor functionality

Reviewed By: ngimel

Differential Revision: D32837301

fbshipit-source-id: a372a24ec65367fb84ad2e97c7e38cae4ec703a6
2021-12-06 08:14:10 -08:00
b6f41bb848 The Jiterator (#69439)
Summary:
This PR:

- creates the "jiterator" pattern, allowing elementwise unary and binary kernels that don't accept scalars to be jit compiled when called
- ports the gcd and i1 CUDA kernels to use the jiterator
- extends elementwise binary systemic testing to be comparable to elementwise unary systemic testing
- separates one test case from test_out in test_ops.py
- updates more OpInfos to use expected failures instead of skips

The jiterator currently does not support half, bfloat16 or complex dtypes. It also (as mentioned above) doesn't support scalar inputs. In the future we expect to add support for those datatypes and scalars.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69439

Reviewed By: ngimel

Differential Revision: D32874968

Pulled By: mruberry

fbshipit-source-id: d44bb9cde4f602703e75400ec5a0b209f085e9b3
2021-12-06 07:32:48 -08:00
3202028ed1 [Core ML] Avoid recompiling models when the OS version is not changed (#69438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69438

We don't need to recompile the model if the OS version is not changed. This could save hundreds of ms when loading the model.

{F683788183}
ghstack-source-id: 144784720
ghstack-source-id: 144821734

Test Plan:
1. Test in the playground app
2. Test in the ig

Reviewed By: hanton

Differential Revision: D32866326

fbshipit-source-id: ae2174f68dda4d2ab89ee328cb710c08d45c4d9a
2021-12-06 00:49:51 -08:00
c97dc9286d Revert D32780415: [Static Runtime] Move implementation details from impl.h into internal.h
Test Plan: revert-hammer

Differential Revision:
D32780415 (999e93e6a8)

Original commit changeset: 119b7aedbf56

fbshipit-source-id: 1aa777e8c1854ab27e86bc625188f7170097fac8
2021-12-04 19:44:07 -08:00
29a45f0009 Revert D32743881: [Core ML] Avoid recompiling models when the OS version is not changed
Test Plan: revert-hammer

Differential Revision:
D32743881 (b97903abb8)

Original commit changeset: 2e94c6035520

fbshipit-source-id: 6cb05c414a23e15604b095c333a92ed8980092bd
2021-12-04 15:57:58 -08:00
999e93e6a8 [Static Runtime] Move implementation details from impl.h into internal.h (#69274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69274

`impl.h` is the main header file that defines the interface of Static Runtime to its clients.

However, it is currently filled with implementation details that should not be leaked to our clients: 1) this unnecessarily exposes our internals, which can make them hard to change later; 2) it causes unnecessary merge conflicts when multiple people are touching this enormous impl.cpp file.

To alleviate the situation, this change moves the implementation details from impl.h into a new file, internal.h, that's internally kept without leaking the details to our clients.

This change will be followed by another change to rename `impl.h` into `runtime.h` or anything better since `impl.h` is currently not about implementation but SR's interface.

Note that this change is NOT complete, since the remaining declarations in impl.h still contain a lot of implementation details. Therefore, we should keep working on minimizing the interface to prevent our API from being bloated unnecessarily. We also need to work on modularizing our implementation into separate pieces organized into separate files in the near future.

Test Plan: Existing unittests

Reviewed By: donaldong

Differential Revision: D32780415

fbshipit-source-id: 119b7aedbf563b195641c5674572a9348732145f
2021-12-04 14:48:28 -08:00
b97903abb8 [Core ML] Avoid recompiling models when the OS version is not changed (#69234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69234

We don't need to recompile the model if the OS version is not changed. This could save hundreds of ms when loading the model.

{F683788183}
ghstack-source-id: 144784720

Test Plan:
1. Test in the playground app
2. Test in the ig

Reviewed By: hanton

Differential Revision: D32743881

fbshipit-source-id: 2e94c6035520de3eeaf0b61f7cf9082228c8a955
2021-12-04 13:38:27 -08:00
e8f4c9cc40 [LT] Upstream LazyView and view ops IR Nodes (#69277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69277

LazyView is the main class for tracking aliases caused by view
ops. The corresponding IR classes for view ops are hand-written for now,
and we can switch to code-generating them in the future. Certain view
ops also have a reverse IR class to perform in-place updates in the
backward direction along a chain of alias ops.

As part of the future work, we will simplify the logic for LazyView once
the functionalization pass in core is ready to use.

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32820014

Pulled By: desertfire

fbshipit-source-id: d9eb526cb23885f667e4815dc9dd291a7b7e4256
2021-12-04 08:44:54 -08:00
0bbe21b172 [LT] Upstream more util functions (#69098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69098

Add the following utils: helpers, ir_dump_util, and
tensor_util. Some of the util functions might be better organized by
grouping them into different files, but we can leave that for later.

Test Plan: Imported from OSS

Reviewed By: alanwaketan

Differential Revision: D32758480

Pulled By: desertfire

fbshipit-source-id: 2a0707879f0c49573380b4c8227a3c916c99bf9a
2021-12-04 08:42:35 -08:00
bfe5ad28e6 [Linalg] Add a runtime switch to let pytorch prefer a backend impl in linalg functions on GPU (#67980)
Summary:
Per title.

This PR introduces a global flag that lets PyTorch prefer one of the many backend implementations when calling linear algebra functions on GPU.

Usage:
```python
torch.backends.cuda.preferred_linalg_library('cusolver')
```

Available options (str): `'default'`, `'cusolver'`, `'magma'`.

Issue https://github.com/pytorch/pytorch/issues/63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime.

Performance of linear algebra operators after this PR should be no worse than before. The flag is set to **`'default'`** by default, which makes everything the same as before this PR.

The implementation of this PR is basically following that of https://github.com/pytorch/pytorch/pull/67790.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67980

Reviewed By: mruberry

Differential Revision: D32849457

Pulled By: ngimel

fbshipit-source-id: 679fee7744a03af057995aef06316306073010a6
2021-12-03 19:06:30 -08:00
9663e08674 [Static Runtime] Fix a bug where aten::embedding_bag cannot handle resized input tensors (#69219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69219

This change fixes a bug where the `aten::embedding_bag` implementation does not adjust the size of a managed output tensor to match a given input after memory planning starts.
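
For illustration, a minimal sketch of the out-variant contract involved, with a hypothetical `my_op_out` (the real `embedding_bag` out variant is far more involved): a managed output tensor that is reused across iterations must be resized for each new input.

```cpp
// Hedged sketch, not the actual embedding_bag kernel: a reused managed
// output must be resized for the current input before being written.
#include <ATen/ATen.h>

void my_op_out(const at::Tensor& input, at::Tensor& out) {
  out.resize_({input.size(0)}); // without this, a cached `out` keeps its old size
  out.copy_(input.select(/*dim=*/1, /*index=*/0));
}

int main() {
  at::Tensor out = at::empty({0});
  my_op_out(at::randn({4, 8}), out); // out becomes shape [4]
  my_op_out(at::randn({7, 8}), out); // resized input: out must become shape [7]
}
```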

Test Plan: Enhanced `StaticRuntime.EmbeddingBag` to trigger the existing bug that's fixed by this change.

Reviewed By: mikeiovine

Differential Revision: D32544399

fbshipit-source-id: 0a9f1d453e96f0cfa8443c8d0b28bbc520e38b29
2021-12-03 19:01:45 -08:00
6a4fa86026 Add OpInfos for misc nn.functional operators (#68922)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68922

Reviewed By: Chillee

Differential Revision: D32842301

Pulled By: saketh-are

fbshipit-source-id: b7166faefb64668fc76cca6c528501b0d360c43b
2021-12-03 17:03:02 -08:00
da023611d7 [CUDA graphs] Fixes make_graphed_callables example typos (#69379)
Summary:
cc mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69379

Reviewed By: mruberry

Differential Revision: D32841260

Pulled By: ngimel

fbshipit-source-id: a7d0b9db0578526907547b201eddd55827812b63
2021-12-03 16:51:14 -08:00
e92b14bf1f Update CUDA version to 11.3 and setup proper environment variables. (#69383)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69383

Test Plan:
TorchBench CI

RUN_TORCHBENCH: hf_Bert

Reviewed By: janeyx99

Differential Revision: D32845001

Pulled By: xuzhao9

fbshipit-source-id: 50dff742ad4786e4b4995bd9aa82629b2fc050c5
2021-12-03 16:12:29 -08:00
a3ca4c83a6 [PyTorch] Add torch::jit::toString(const Type&) (#66689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66689

Let's not take an extra refcount bump to stringify types.
ghstack-source-id: 144374720

Test Plan: CI

Reviewed By: suo

Differential Revision: D31691526

fbshipit-source-id: 673d632a83e6179c063530fdbc346c22d5f47d7c
2021-12-03 15:16:08 -08:00
855365e9c4 Clean up dead code (#69296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69296

remove a commented block of code that was accidentally checked in

Test Plan: no testable changes

Reviewed By: alanwaketan

Differential Revision: D32799197

fbshipit-source-id: d3eb05cbafb0f5a4a3f41c17f66ca6d0c2fc60b7
2021-12-03 15:11:38 -08:00
a813ddf5ec CUDACachingAllocator: make an error message more accurate. (#69174)
Summary:
The `TORCH_CHECK` asserts for strictly-greater-than `kLargeBuffer`,
but the exception claims `>=`. Fix the error message to match the
code.
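
For illustration, a minimal self-contained sketch of the mismatch (the surrounding allocator code is elided; `kLargeBuffer` and the message text here are placeholders):

```cpp
#include <cstddef>
#include <stdexcept>

constexpr size_t kLargeBuffer = 1 << 20;

void check_block_size(size_t size) {
  // The predicate is strictly-greater-than...
  if (!(size > kLargeBuffer)) {
    // ...so the message must not claim ">=".
    // Before: "expected size to be >= kLargeBuffer"
    // After: the message matches the predicate:
    throw std::runtime_error("expected size to be greater than kLargeBuffer");
  }
}

int main() { check_block_size(kLargeBuffer + 1); }
```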

Happy to open an issue if it's helpful; I'm hoping the trivial fix doesn't need a separate issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69174

Reviewed By: zou3519

Differential Revision: D32760055

Pulled By: H-Huang

fbshipit-source-id: 1a8ab68f36b326ed62d78afdcb198f4d6572d017
2021-12-03 15:04:59 -08:00
088a4feb41 Update the documentation for AMP with DataParallel (#69218)
Summary:
Following https://github.com/pytorch/pytorch/issues/60540 and pull request https://github.com/pytorch/pytorch/issues/43102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69218

Reviewed By: gchanan

Differential Revision: D32803814

Pulled By: ngimel

fbshipit-source-id: 06fdbbee2c7734153271be70ec4bc24263c8c367
2021-12-03 14:58:47 -08:00
80a67cd33c Limit uploading JSONs to trunk (#69385)
Summary:
Mac workflows on forked PRs don't have the right permissions to upload artifacts :/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69385

Reviewed By: malfet, atalman

Differential Revision: D32843252

Pulled By: janeyx99

fbshipit-source-id: e137a6707fe46559771b9d77fbfe44b0a21c914a
2021-12-03 13:20:37 -08:00
a20b9f8d5c add HPU case for backend_to_string function (#69225)
Summary:
Change-Id: If8ed7f1161343a2e494d8b964576e1ee193007f7

Fixes https://github.com/pytorch/pytorch/issues/65609

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69225

Reviewed By: gchanan

Differential Revision: D32804545

Pulled By: wconstab

fbshipit-source-id: bdf359bd779113153ebdecc515edba94e47e0ae4
2021-12-03 12:54:15 -08:00
6f7a5ddffc [SR] Use std::vector::reserve in GetLivenessMap (#68884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68884

This diff uses std::vector::reserve in GetLivenessMap to set the capacity of all local containers up front and avoid runtime resizing.

The changes should theoretically improve performance a little.
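
As a minimal sketch of the technique (illustrative; not the actual `GetLivenessMap` code), reserving capacity up front replaces the repeated reallocate-and-move cycles of a growing vector with a single allocation:

```cpp
#include <cstddef>
#include <vector>

int main() {
  const std::size_t num_values = 10000; // known ahead of time
  std::vector<int> liveness;
  liveness.reserve(num_values); // one allocation up front
  for (std::size_t i = 0; i < num_values; ++i) {
    liveness.push_back(static_cast<int>(i)); // never triggers a regrowth
  }
}
```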

Test Plan:
- [x] `buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1`
- [x]
```
seq 1 10 | xargs -I{} ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/data/users/dxd/302008423_0.predictor.disagg.local \
--method_name=local_request_only.forward --pt_cleanup_activations=1 \
--pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=0 --warmup_iters=0 \
--num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 \
--input_type="recordio" --pt_inputs=/data/users/dxd/302008423_0.local_ro.inputs.recordio \
--recordio_use_ivalue_format=1
```

### Before
```
I1201 12:04:46.753311 2874563 PyTorchPredictorBenchLib.cpp:336] Took 10.9826 sec to initialize a predictor.
I1201 12:05:00.617139 2875780 PyTorchPredictorBenchLib.cpp:336] Took 11.1078 sec to initialize a predictor.
I1201 12:05:15.279667 2876813 PyTorchPredictorBenchLib.cpp:336] Took 11.7979 sec to initialize a predictor.
I1201 12:05:30.201207 2877554 PyTorchPredictorBenchLib.cpp:336] Took 11.8901 sec to initialize a predictor.
I1201 12:05:44.386926 2879713 PyTorchPredictorBenchLib.cpp:336] Took 11.2722 sec to initialize a predictor.
I1201 12:05:58.003582 2881426 PyTorchPredictorBenchLib.cpp:336] Took 10.8046 sec to initialize a predictor.
I1201 12:06:12.004778 2882604 PyTorchPredictorBenchLib.cpp:336] Took 11.2754 sec to initialize a predictor.
I1201 12:06:26.101241 2884888 PyTorchPredictorBenchLib.cpp:336] Took 11.3355 sec to initialize a predictor.
I1201 12:06:40.364817 2886572 PyTorchPredictorBenchLib.cpp:336] Took 11.401 sec to initialize a predictor.
I1201 12:06:54.483794 2888614 PyTorchPredictorBenchLib.cpp:336] Took 11.3498 sec to initialize a predictor.
```

### After
```
I1201 11:51:53.775239 2818391 PyTorchPredictorBenchLib.cpp:336] Took 10.9113 sec to initialize a predictor.
I1201 11:52:07.412720 2819530 PyTorchPredictorBenchLib.cpp:336] Took 10.8413 sec to initialize a predictor.
I1201 11:52:21.202816 2820359 PyTorchPredictorBenchLib.cpp:336] Took 11.0216 sec to initialize a predictor.
I1201 11:52:35.513288 2821029 PyTorchPredictorBenchLib.cpp:336] Took 11.4216 sec to initialize a predictor.
I1201 11:52:49.145979 2821930 PyTorchPredictorBenchLib.cpp:336] Took 10.8272 sec to initialize a predictor.
I1201 11:53:02.908790 2822859 PyTorchPredictorBenchLib.cpp:336] Took 11.0262 sec to initialize a predictor.
I1201 11:53:16.276015 2823657 PyTorchPredictorBenchLib.cpp:336] Took 10.6893 sec to initialize a predictor.
I1201 11:53:30.103283 2824382 PyTorchPredictorBenchLib.cpp:336] Took 11.1854 sec to initialize a predictor.
I1201 11:53:44.298514 2825365 PyTorchPredictorBenchLib.cpp:336] Took 11.4796 sec to initialize a predictor.
I1201 11:53:58.258708 2826128 PyTorchPredictorBenchLib.cpp:336] Took 11.2652 sec to initialize a predictor.
```

Reviewed By: swolchok

Differential Revision: D32649252

fbshipit-source-id: 5cd296d12b12e5b15e85e4f1a8a236e293f37f9c
2021-12-03 12:18:06 -08:00
ae11264583 Fixed type checking errors in node.py (#68124)
Summary:
Fixes [issue#67](https://github.com/MLH-Fellowship/pyre-check/issues/67)
This PR fixes the type checking errors in PyTorch's torch/fx/node.py.
The variables at 363:20 and 364:20 were declared with type `List[str]` but were assigned a value of `None`, causing an incompatible variable type error. Changing the type from `List[str]` to `Optional[List[str]]` fixes the error.

Signed-off-by: Onyemowo  Agbo
onionymous
0xedward

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68124

Reviewed By: gmagogsfm

Differential Revision: D32322414

Pulled By: onionymous

fbshipit-source-id: be11bbbd463715ddf28a5ba78fb4adbf62878c80
2021-12-03 12:03:49 -08:00
6baaec30cd [DataPipe] Adding ShufflerMapDataPipe (#68606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68606

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32813290

Pulled By: NivekT

fbshipit-source-id: 8d1ebd5bc776563c23250f76a2efc1d395f1af9c
2021-12-03 11:36:33 -08:00
3e45739543 [PyTorch][JIT] Use stack.pop_back() instead of pop(stack) for DROP (#69326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69326

Looks like this really is slightly cheaper (see assembly diff screenshot in internal test plan). The problem is that `pop()` returns the value, so we have to spend instructions moving it out of the stack and then destroying it via a local.
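
A minimal sketch of the difference (with a stand-in `Value` type; the real interpreter uses `IValue`): `pop()` must materialize the popped value in a local that is destroyed separately, while `pop_back()` destroys the element in place.

```cpp
#include <utility>
#include <vector>

struct Value { /* refcounted payload elided */ };

Value pop(std::vector<Value>& stack) {
  Value v = std::move(stack.back()); // move into a local...
  stack.pop_back();
  return v; // ...which the caller then has to destroy
}

void drop_via_pop(std::vector<Value>& stack) {
  pop(stack); // extra move plus a separate destruction of the temporary
}

void drop_via_pop_back(std::vector<Value>& stack) {
  stack.pop_back(); // destroys the element in place
}

int main() {
  std::vector<Value> stack(4);
  drop_via_pop(stack);
  drop_via_pop_back(stack);
}
```
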
ghstack-source-id: 144641680

Test Plan:
{F684148304}

CI

Reviewed By: zhxchen17

Differential Revision: D32812841

fbshipit-source-id: e9e43458d3364842f67edd43e43575a1f72e3cb0
2021-12-03 11:09:05 -08:00
2c84b010e6 [PyTorch] Use toObjectRef in JIT interpreter (#69324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69324

This slightly shrinks runImpl.

Before:
- Move pointer out of IValue
- Clear the IValue to none
- Do our thing with the Object
- destroy the intrusive_ptr on the C stack
- destroy the IValue on the C stack (even though it was cleared to None, the destructor has to run anyway)

After:
- Grab the pointer out of IValue
- Do our thing with the Object
- Decref the pointer in the IValue on the JIT stack as we assign over it

We should be saving at least the memory traffic from clearing the IValue and possibly the dtor code as well.
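
A hedged sketch of the two access patterns, using a toy refcounted `IValue` (the real one is far more elaborate): moving the object out clears the IValue and leaves a local to destroy, while borrowing a reference touches no refcounts.

```cpp
#include <memory>
#include <utility>

struct Object { int x = 0; };

struct IValue {
  std::shared_ptr<Object> obj = std::make_shared<Object>();
  std::shared_ptr<Object> toObject() && { return std::move(obj); } // clears *this
  Object& toObjectRef() { return *obj; } // borrows; no refcount traffic
};

void before(IValue& iv) {
  auto o = std::move(iv).toObject(); // iv now holds nullptr
  o->x += 1;
} // `o` destroyed here: an extra decref

void after(IValue& iv) {
  Object& o = iv.toObjectRef(); // iv still owns the object
  o.x += 1;
}

int main() {
  IValue a, b;
  before(a);
  after(b);
}
```
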
ghstack-source-id: 144638920

Test Plan:
Inspected assembly to verify shorter runImpl

Tried to microbenchmark (D32809454) but can't show a difference.

Reviewed By: gchanan

Differential Revision: D32812252

fbshipit-source-id: a3689f061ee51ef01e4696bd4c6ffcbc41c30af5
2021-12-03 11:07:16 -08:00
5a480831e6 .github: Propagate WITH_PUSH to docs jobs (#69372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69372

Docs weren't getting push since this variable wasn't getting propagated
to the docker container

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32837012

Pulled By: seemethere

fbshipit-source-id: 5074d5266a567df2230981186cabffb53c01c634
2021-12-03 11:00:38 -08:00
8f8524a447 Expose is_metal_available in header (#68942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68942

Currently, `at::native::is_metal_available()` is implemented, but it's not exposed in the header, so nobody can use it. It's a useful function and I want to use it, so this change exposes it in the header.

Test Plan: CI

Reviewed By: sodastsai, xta0

Differential Revision: D32675236

fbshipit-source-id: b4e692db7d171dfb872d5c2233cc808d7131f2e9
2021-12-03 10:31:03 -08:00
77ca153d3e Remove columns and ones from slow2d transpose signatures (#68898)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68898

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32655873

Pulled By: jbschlosser

fbshipit-source-id: 810035a745e3851bd5326459b563e4796a074a65
2021-12-03 09:56:18 -08:00
7ca2da14e9 Remove finput and fgrad_input from slow3d signatures (#68897)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68897

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32655875

Pulled By: jbschlosser

fbshipit-source-id: 8d04968b2df47e11da1eceb1612d55d00768eeb4
2021-12-03 09:55:02 -08:00
73d2ca20e0 .github: Add credentials for macos test jobs (#69371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69371

macOS jobs need credentials to upload their test stats

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32836893

Pulled By: seemethere

fbshipit-source-id: 0f5a8f1b35f4240d57b08a2120a97a13ba3b3de5
2021-12-03 09:43:41 -08:00
6ed7354435 [SR][Code cleanup] Typedef/default for kwargs (#69164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69164

We have lots of methods that take `std::unordered_map<std::string, c10::IValue>` now. That's kind of ugly and cumbersome to type, so add a `KWargs` typedef.

Also made the `operator()` default `kwargs` to empty. Note that we could have another overload that doesn't take `kwargs` at all, but the perf gain is so minuscule it's probably not worth it.
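
A minimal sketch of the typedef and the defaulted parameter described above (the surrounding runtime class is a stand-in):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

struct IValue {}; // stand-in for c10::IValue

using KWargs = std::unordered_map<std::string, IValue>;

struct Runtime {
  std::vector<IValue> operator()(
      const std::vector<IValue>& args,
      const KWargs& kwargs = KWargs()) {
    (void)kwargs; // real dispatch elided
    return args;
  }
};

int main() {
  Runtime run;
  run({IValue{}});           // kwargs defaults to empty
  run({IValue{}}, KWargs{}); // explicit empty kwargs
}
```
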
ghstack-source-id: 144691899

Test Plan: CI

Reviewed By: d1jang

Differential Revision: D32734677

fbshipit-source-id: 8d6496a6d1ec2dc71253151d2f6408f1387966cf
2021-12-03 09:27:37 -08:00
b761172406 Revert D32786909: [C10D] [Easy] Use pinned memory for HtoD copies in Reducer:: sync_bucket_indices
Test Plan: revert-hammer

Differential Revision:
D32786909 (dbc8d9c947)

Original commit changeset: a53f96f57e67

fbshipit-source-id: 19599c3a489804bfdbb3062f4256dceb680c143b
2021-12-03 08:31:45 -08:00
e0fb228e03 Revert of adding windows CUDA 11.5 workflow (#69365)
Summary:
This is a partial revert of bb522c9d7a, removing the failing CUDA 11.5 Windows workflows it added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69365

Reviewed By: suo

Differential Revision: D32831418

Pulled By: atalman

fbshipit-source-id: 184346d22623f88594312a4ce2e4d29cc67e8338
2021-12-03 08:00:16 -08:00
21919be96b CMake: Update precompiled header and fix support (#67851)
Summary:
This fixes the `USE_PRECOMPILED_HEADERS` cmake version check which was accidentally inverted, so it was always disabled.

I've also made the precompiled header so it only includes headers used in 95% or more of code, weighted by compile time. This limits it to the standard library, `c10` and a limited subset of `ATen/core`. Crucially, the new pch doesn't depend on `native_functions.yaml` so won't cause as much unnecessary rebuilding.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67851

Reviewed By: zou3519

Differential Revision: D32290902

Pulled By: dagitses

fbshipit-source-id: dfc33330028c99b02ff40963926c1f1260d00d00
2021-12-03 06:51:56 -08:00
cc46dc45e1 [SR] Factor logic that determines managed tensors out of MemoryPlanner (#68295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68295

There's no reason we can't figure out what tensors we need to manage at model load time. It's also useful to have the set of ranges available at load time for integrating the ranges algorithm introduced in the previous diff.

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D32400593

fbshipit-source-id: 0466b2641166ddc9c14f72774f4ba151407be400
2021-12-03 04:45:27 -08:00
276cb8f501 [Pytorch Edge] Make Tracer support xirp metal segmentation (#69328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69328

Aten_metal_prepack is cpp based and can be safely included here.

Test Plan: "Traced" the xirp model with the script.

Reviewed By: xta0

Differential Revision: D32813686

fbshipit-source-id: 7a428151348dc9d3f576531701926d6b3413de3d
2021-12-02 22:16:19 -08:00
a07ffe8d0e Add OpInfos for combinations, cartesian_prod, sum_to_size, ldexp, as_strided (#68853)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68853

Reviewed By: davidberard98

Differential Revision: D32811147

Pulled By: saketh-are

fbshipit-source-id: 941dcf949072f8d10faf4d5a0fa0ef409ac6a7db
2021-12-02 21:22:56 -08:00
834bd3134e Back out "Add efficient zero tensors" (#69327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69327

Original commit changeset: d44096d88265

Original Phabricator Diff: D32144240 (668574af4a)

Test Plan:
CI

original diff failed 175 builds in CI

Reviewed By: airboyang, anjali411

Differential Revision: D32809407

fbshipit-source-id: c7c8e69bcee0274992e2d5da901f035332e60071
2021-12-02 19:11:41 -08:00
c572a603a6 fix for python 3.10 for gradient opinfo (#68113)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/67612 by creating a tensor first and then converting the dtype explicitly with a `.to(dtype)` call.

Looking forward to your feedback and suggestions on this.

cc: kshitij12345 mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68113

Reviewed By: zou3519

Differential Revision: D32797329

Pulled By: saketh-are

fbshipit-source-id: 5c34709ab277c82cda316a3ea1cf01e853e4c38b
2021-12-02 19:01:01 -08:00
572c3e3118 Fix some usages of CUDA_VERSION (#69092)
Summary:
See https://pytorch.slack.com/archives/G4Z791LL8/p1638229956006300

I grepped c10, aten, and torch for CUDA_VERSION and checked the usages I saw.
I can't guarantee I made a clean sweep, but this improves the status quo.

cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69092

Reviewed By: zou3519

Differential Revision: D32786919

Pulled By: ngimel

fbshipit-source-id: 1d29827dca246f33118d81e136252ddb5bf3830f
2021-12-02 18:32:47 -08:00
dbc8d9c947 [C10D] [Easy] Use pinned memory for HtoD copies in Reducer:: sync_bucket_indices (#69298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69298

I was exploring adding an invariant that we actually use properly-tracked pinned memory when doing non-blocking copies (to plug various correctness holes), and found this case where we allocate a tensor without pinned memory and then copy it with non_blocking=True.
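
A hedged LibTorch sketch of the pattern being fixed (the actual Reducer code paths differ): a `non_blocking` host-to-device copy only behaves asynchronously, and is only properly trackable, when the host buffer is pinned.

```cpp
#include <torch/torch.h>

int main() {
  if (!torch::cuda::is_available()) return 0;
  auto indices = torch::arange(1024); // pageable CPU memory
  // Problem pattern: a pageable source defeats the point of
  // non_blocking=true and is harder to track for correctness.
  auto d_bad = indices.to(torch::kCUDA, /*non_blocking=*/true);
  // Fixed pattern: stage through pinned host memory first.
  auto pinned = indices.pin_memory();
  auto d_ok = pinned.to(torch::kCUDA, /*non_blocking=*/true);
}
```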

Test Plan: Unit tests cover this code.

Reviewed By: rohan-varma

Differential Revision: D32786909

fbshipit-source-id: a53f96f57e6727238e4cd2164c1a0f04cf270413
2021-12-02 17:34:34 -08:00
e2c7bd08b9 Add operator div (#68528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68528

Add the div operator converter. torch.floor_div is announced to be deprecated by PyTorch; consider removing this after PyTorch completes the deprecation.

Reviewed By: 842974287

Differential Revision: D32497573

fbshipit-source-id: d06c864077f745c295c33fb25639b7116f85ca20
2021-12-02 17:25:40 -08:00
bede18b061 Add support for C++ frontend wrapper on Linux (#69094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69094

Partially addresses https://github.com/pytorch/pytorch/issues/68768

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D32730079

Pulled By: malfet

fbshipit-source-id: 854e4215ff66e087bdf354fed7a17e87f2649c87
2021-12-02 16:47:00 -08:00
33c3c539b6 THPStorage: Prefer intrusive_ptr over owning raw pointers (#69248)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69248

Reviewed By: zou3519

Differential Revision: D32771035

Pulled By: ngimel

fbshipit-source-id: cf9bbcc5563ae9715ecf13631ba56c32240e59e3
2021-12-02 16:33:03 -08:00
9f39a2de0a [fix] add range check for k kthvalue_cpu (#68863)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68813

Long-term, it might make more sense to port it to structured.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68863

Reviewed By: H-Huang

Differential Revision: D32749372

Pulled By: mruberry

fbshipit-source-id: 85a1b2a31e922ff1df0d0f3f576ad219e652aa49
2021-12-02 15:33:06 -08:00
cc85b68984 .github: Fix ci workflows generation (#69329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69329

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32814709

Pulled By: seemethere

fbshipit-source-id: ea83aa0319bebb65623856ca9e34689581dd517b
2021-12-02 15:28:59 -08:00
f786b03f98 ci: Migrate docs push to GHA (#69172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69172

Migrates the docs push jobs to GitHub Actions by implementing a simple
WITH_PUSH switch to do the actual push.

Adds 2 new workflows for GHA:
* linux-docs (on trunk)
* linux-docs-push (on schedule)

linux-docs-push is the only workflow that actually gets access to
credentials so it should be relatively safe.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32767239

Pulled By: seemethere

fbshipit-source-id: 5b100f986cf4023c323f4f96f0fe7942fec49ad2
2021-12-02 15:06:57 -08:00
db5425bcd2 re-enable layer_norm in autodiff (#69007)
Summary:
Turn on layer_norm in autodiff

https://github.com/pytorch/pytorch/issues/67732 should have fixed the issue previously exposed by enabling layer_norm in autodiff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69007

Reviewed By: soulitzer

Differential Revision: D32699108

Pulled By: eellison

fbshipit-source-id: 6951668c0e74e056d3776294f4e1fd3123c763e5
2021-12-02 14:55:00 -08:00
5b2586fe09 [testing] Ignore expected_regex in assertRaisesRegex for non-native device (#68723)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29719

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68723

Reviewed By: zou3519

Differential Revision: D32797061

Pulled By: mruberry

fbshipit-source-id: 3bcae6d3d62d180059dbe39be520b0e7f9aea19f
2021-12-02 14:52:27 -08:00
36ba1b6b3a Remove unused _convolution_nogroup op (#68829)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68829

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D32627578

Pulled By: jbschlosser

fbshipit-source-id: 8a4c0ac58aae184a465b1fd40cce880a60d67339
2021-12-02 14:42:08 -08:00
791d5087ed [TensorExpr] Add lowerings for quantized ops: cat, mul, conv1d, relu. (#69055)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69055

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32710325

Pulled By: ZolotukhinM

fbshipit-source-id: 4a7f0ac059ea238463317b6a45a822b8d05610dd
2021-12-02 14:34:21 -08:00
83c4451f60 [TensorExpr] Add a pass to symbolize an input dimension. (#68857)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68857

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32632908

Pulled By: ZolotukhinM

fbshipit-source-id: bcee95d83731fcea07ec2f55ed78418ee52f51b6
2021-12-02 14:34:18 -08:00
1e9dcdd2a0 [TensorExpr] TensorExprKernel: support custom-class constants. (#68856)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68856

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32632907

Pulled By: ZolotukhinM

fbshipit-source-id: e4180f8d791ba0cdf82bcb3bd11b61405c2faadd
2021-12-02 14:34:15 -08:00
48d7d585c8 [TensorExpr] IR Eval: add more logging. (#68855)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68855

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32632905

Pulled By: ZolotukhinM

fbshipit-source-id: fef9b019d8d5b8a3ffd4075bfac069d1c81f647d
2021-12-02 14:34:12 -08:00
b6bcf5a0f1 [TensorExpr] Un-const TEK::kernel_func_name to allow recompilation. (#68854)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68854

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32632904

Pulled By: ZolotukhinM

fbshipit-source-id: 154e3802ba844e738f09dbc239cf3656b9f8d5fd
2021-12-02 14:33:02 -08:00
a0367f8980 Revert D32404517: [quant][embedding qat] Support Embedding QAT via FX API
Test Plan: revert-hammer

Differential Revision:
D32404517 (abda069ce2)

Original commit changeset: 0484df8c826b

fbshipit-source-id: 4e7d62b9ccdb84eb4d184cd0b3c9506013fd8336
2021-12-02 14:28:35 -08:00
ec4c749024 Revert D32318435: [quant][embdding qat] Add FX support for QAT EmbeddingBag
Test Plan: revert-hammer

Differential Revision:
D32318435 (4484c04513)

Original commit changeset: 8b5d1a5d5422

fbshipit-source-id: e46d431f92a5c3f86c757695164d1eb5b0041298
2021-12-02 14:27:17 -08:00
8dafe6e147 Forward fix merge conflict (#69319)
Summary:
Forward fixes a merge conflict between two commits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69319

Reviewed By: seemethere

Differential Revision: D32810884

Pulled By: janeyx99

fbshipit-source-id: 6e2f9fc89d00da979de1430a172673e82c51ba14
2021-12-02 14:05:54 -08:00
52219b1017 Fix ChainedScheduler.get_last_lr() (#69112)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68820

cc vincentqb jbschlosser albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69112

Reviewed By: zou3519

Differential Revision: D32796626

Pulled By: albanD

fbshipit-source-id: bde9d4e473527be4c0a7f21cb57f795a67a99eaa
2021-12-02 13:44:12 -08:00
db30696be8 [pytorch][PR] bug fix for D32374003 (#69278)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69278

Test Plan:
```
fbpkg build -E smart.inference_platform_sp.sigrid_predictor.persistent.bolt --yes
```

Reviewed By: kimishpatel, HDCharles

Differential Revision: D32773910

fbshipit-source-id: a2181fea354f310cf9f6f57b802dc4a148627acc
2021-12-02 13:31:19 -08:00
915c26f588 GHA: preserve downloaded JSONs as artifacts (#69258)
Summary:
Preserves the .json files in the test folder for every test job as an artifact.

Going to hud.pytorch.org/pr/69258 and downloading/unzipping any of the `test-jsons-*.zip` shows that .pytorch-slow-tests.json and .pytorch-disabled-tests.json exist. (Though you won't see them in your file manager as they are hidden files.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69258

Reviewed By: seemethere

Differential Revision: D32807102

Pulled By: janeyx99

fbshipit-source-id: ed1b227cdd32160ed045dd79a7edc55216dcfe53
2021-12-02 13:26:14 -08:00
cafcf599d0 Deprecate torch.triangular_solve (#63570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63570

There is a use of `at::triangular_solve_out` in the file
`torch/csrc/jit/tensorexpr/external_functions.cpp` that I have not dared
to move to `at::linalg_solve_triangular_out`.

**Deprecation note:**

This PR deprecates the `torch.triangular_solve` function in favor of
`torch.linalg.solve_triangular`. An upgrade guide is added to the
documentation for `torch.triangular_solve`.

Note that it DOES NOT remove `torch.triangular_solve`, but
`torch.triangular_solve` will be removed in a future PyTorch release.
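
A hedged migration sketch at the ATen level, based on the entry points named above (the exact C++ signatures are an assumption; note the swapped argument order, and that the new function returns only the solution rather than a `(solution, cloned_coefficient)` pair):

```cpp
#include <ATen/ATen.h>
#include <tuple>

int main() {
  auto A = at::randn({3, 3}).triu(); // upper-triangular coefficient matrix
  auto B = at::randn({3, 2});
  // Deprecated: triangular_solve(B, A) -> (solution, cloned_coefficient)
  auto old_solution = std::get<0>(at::triangular_solve(B, A, /*upper=*/true));
  // Replacement: linalg_solve_triangular(A, B) -> solution
  auto new_solution = at::linalg_solve_triangular(
      A, B, /*upper=*/true, /*left=*/true, /*unitriangular=*/false);
}
```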

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32618035

Pulled By: anjali411

fbshipit-source-id: 0bfb48eeb6d96eff3e96e8a14818268cceb93c83
2021-12-02 13:24:55 -08:00
dde801686b Expose MobileCode to python (#66592)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66592

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D31632600

Pulled By: tugsbayasgalan

fbshipit-source-id: 46a7ac20ddb6b433bd037280ed020481901a15c9
2021-12-02 13:18:46 -08:00
bb522c9d7a Enabling CUDA 11.5 for binary builds, Adding windows workflows for CUDA 11.5 (#69262)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69262

Reviewed By: malfet

Differential Revision: D32804850

Pulled By: atalman

fbshipit-source-id: abac45ad1d49ec7e0e7df6cb9a22a46fbcd905a2
2021-12-02 13:04:43 -08:00
f587267dc7 Revert D31705359: use irange for loops 8
Test Plan: revert-hammer

Differential Revision:
D31705359 (17e5200441)

Original commit changeset: c9ea2fbc0f9c

fbshipit-source-id: 08fff2d12beca953ad30dd0baabf86e39ac84f14
2021-12-02 12:55:08 -08:00
97750e03a4 [torch][edge] Add int to the copy kernel. (#69297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69297

.

Test Plan: CI

Reviewed By: JacobSzwejbka

Differential Revision: D32799822

fbshipit-source-id: c40fdb55a706b3a8eccaa69dbfbc6d7af0b111e4
2021-12-02 12:13:58 -08:00
7142b0b033 .github: Add linux.large to actionlint.yaml (#69304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69304

Don't know why this isn't automatically figured out

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: anjali411, atalman, janeyx99

Differential Revision: D32805380

Pulled By: seemethere

fbshipit-source-id: 2c4805f87ae91388a6b605a6394024887b4bc71e
2021-12-02 11:21:49 -08:00
4056251a18 Add missing spaces to an error message (#69289)
Summary:
Before:
`ValueError: InstanceNorm1d returns 0-filled tensor to 2D tensor.This is because InstanceNorm1d reshapes inputs to(1, N * C, ...) from (N, C,...) and this makesvariances 0.`

After:
`ValueError: InstanceNorm1d returns 0-filled tensor to 2D tensor. This is because InstanceNorm1d reshapes inputs to (1, N * C, ...) from (N, C,...) and this makes variances 0.`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69289

Reviewed By: jbschlosser

Differential Revision: D32796035

Pulled By: albanD

fbshipit-source-id: c8e7c5cf6e961ec5f7242b31c7808454104cde02
2021-12-02 11:03:33 -08:00
2ea70a6462 Allow Union of scalars to be NumberType (#66591)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66591

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D31632599

Pulled By: tugsbayasgalan

fbshipit-source-id: 374065da1d91334a19c15c604faf13ebec1681f6
2021-12-02 10:52:02 -08:00
d673b1ec59 .github: Switch ciflow-should-run to self hosted (#69166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69166

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32735493

Pulled By: seemethere

fbshipit-source-id: 9a03cf5245d1dbfe1be86cfbb3f5d1d42dd391c8
2021-12-02 10:42:07 -08:00
14ed4df305 [PyTorch][Static Runtime][easy] give to_copy_functor a name (#67701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67701

I split this out to ease rebasing and review.
ghstack-source-id: 144507288

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D32112523

fbshipit-source-id: dba14e6ada33df02dbcd7025b090a8a18cf438ae
2021-12-02 10:36:26 -08:00
21686923e8 [PyTorch][SR] More debug logging (#67220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67220

Specifically we log AliasDb and same_storage_values, and are chattier about the aliasing logs in the liveness analysis.
ghstack-source-id: 144507289

Test Plan: Used to help develop D31776259

Reviewed By: hlu1

Differential Revision: D31847561

fbshipit-source-id: 8371455d060c17dace91cd90e4034b7618f820a6
2021-12-02 10:36:23 -08:00
b22e4d4aea [PyTorch][SR] Add more to() tests & extend debug logging in testStaticRuntime (#67219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67219

I found that these specific test cases were causing different failures when developing D31776259. I also found that it was difficult to debug testStaticRuntime failures, so I added more verbose logs gated behind -v 2.
ghstack-source-id: 144507287

Test Plan: Used during development of D31776259

Reviewed By: hlu1

Differential Revision: D31847566

fbshipit-source-id: ea9147fb246c345d18bbc8d7f3bfba48d3a0fab3
2021-12-02 10:34:54 -08:00
84aa1ddedd [quant] Remove warning for quantized Tensor in __dir__ (#69265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69265

This is used in tab completion, so we should not emit a warning here.

Test Plan:
ci

Imported from OSS

Reviewed By: albanD

Differential Revision: D32778736

fbshipit-source-id: f1bec5e09a8238ab41329ac2b64e6f3267799f6a
2021-12-02 10:30:36 -08:00
17e5200441 use irange for loops 8 (#66743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705359

fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
2021-12-02 10:21:29 -08:00
ff3fc37267 [BE] rewrite ProcessGroupNCCLTest to be MultiProcess (#67705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67705

This PR rewrites ProcessGroupNCCLTest to be a MultiProcessTestCase. It was originally written in a single-process multi-GPU fashion; we change it to multi-process to align with other c10d tests.
ghstack-source-id: 144555092

Test Plan: wait for CI

Reviewed By: pritamdamania87, fduwjj

Differential Revision: D32113626

fbshipit-source-id: 613d36aeae36bf441de1c2c83aa4755f4d33df4d
2021-12-02 10:12:05 -08:00
5c816520b3 ns for fx: fix bug in graph matcher (#69238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69238

The NS for FX graph matcher was not properly taking into account
seen_nodes; this allowed a node to be matched twice.

Test Plan:
FB-only testing on real model passes.

Ideally we would have a test case to capture this, but hopefully we can land this soon to unblock production work.

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D32765761

fbshipit-source-id: ed3dff8fd981e399a649fcd406883b4d56cc712a
2021-12-02 09:59:57 -08:00
698c35e743 Add functorch TLS to ATen/ThreadLocalState (#69181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69181

functorch lives out-of-tree. However, it has some TLS that needs to be
propagated. The solution for that is we store a pointer to the TLS
inside pytorch/pytorch and extend FuncTorchTLSBase inside functorch to
include whatever functorch needs.

A previous solution used ThreadLocalDebugInfo. However, all
PyTorch-managed threads (e.g. those spawned by Autograd) receive a
shared_ptr that points to the same ThreadLocalDebugInfo. This leads to
race conditions if multiple threads start modifying the TLS
stored within ThreadLocalDebugInfo without using mutexes.

Test Plan:
- tested with functorch
- The performance impact of this change when functorch is not used is
negligible because we end up manipulating nullptrs.

Reviewed By: albanD

Differential Revision: D32742312

Pulled By: zou3519

fbshipit-source-id: 1a8439a4af06b3d3e50b9a2dbca98a0ba612062a
2021-12-02 09:29:55 -08:00
0de7a618a3 functionalization: update is_aliased() logic (#68881)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68881

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32647614

Pulled By: bdhirsh

fbshipit-source-id: 6bec50d3e54419d1707d0b6c0c6729bcc1ced1f0
2021-12-02 09:19:12 -08:00
4484c04513 [quant][embdding qat] Add FX support for QAT EmbeddingBag (#68121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68121

Add FX support for QAT EmbeddingBag operator, previously only eager mode support.

Test Plan:
pytest test/quantization/fx/test_quantize_fx.py  -v -k "test_qat_embeddingbag_linear"

Imported from OSS

Reviewed By: supriyar

Differential Revision: D32318435

fbshipit-source-id: 8b5d1a5d5422972c49676f9e470d5fbe29dd503b
2021-12-02 09:05:07 -08:00
78ab3cde4a Do not modify type map from getCustomClassTypeImpl() (#69261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69261

As this function is supposed to be called only once per type from
caching getCustomClassType template

Test Plan: Imported from OSS

Reviewed By: suo, lw

Differential Revision: D32776564

Pulled By: malfet

fbshipit-source-id: 218436657e6ad5ad0c87964857114d1e60c57140
2021-12-02 08:53:09 -08:00
113684cf81 Fix crash in checkCustomClassType if arg is null (#69259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69259

Otherwise `checkCustomClassType(nullptr, new Type())` will crash

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D32775297

Pulled By: malfet

fbshipit-source-id: 54b10fd395d734c615dcaf85a5e599a445cee663
2021-12-02 08:51:59 -08:00
668574af4a Add efficient zero tensors (#64837)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64837

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32144240

Pulled By: anjali411

fbshipit-source-id: d44096d882657c7f9270a16636900e0b73cefa40
2021-12-02 08:47:45 -08:00
abda069ce2 [quant][embedding qat] Support Embedding QAT via FX API (#68296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68296

Support the QAT workflow via the torch.fx QAT APIs, e.g. `prepare_qat_fx` and `convert_fx`.
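
A rough sketch of the intended flow; the qconfig name (`default_embedding_qat_qconfig`) and the qconfig_dict-style `prepare_qat_fx` signature are assumptions and vary across releases:

```python
import torch.nn as nn
from torch.ao.quantization import default_embedding_qat_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_qat_fx

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(10, 4)
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(self.emb(x))

m = M().train()
# Apply the embedding QAT qconfig only to Embedding modules.
qconfig_dict = {"object_type": [(nn.Embedding, default_embedding_qat_qconfig)]}
prepared = prepare_qat_fx(m, qconfig_dict)
# ... run the QAT training loop on `prepared` ...
quantized = convert_fx(prepared.eval())
```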

Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embedding_linear"`

Imported from OSS

Reviewed By: jingsh, supriyar

Differential Revision: D32404517

fbshipit-source-id: 0484df8c826b823b60dfecd9def77bf8cffe0527
2021-12-02 08:42:45 -08:00
3157371bb4 [quant][embedding qat] Fix bug enforcing quant_min <= zero_point <= quant_max for float zeropoint (#68852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68852

When using a float zero_point in FakeQuant, such as for embeddings, it does not need to be between
quant_min and quant_max, as is enforced for integer zero_points.

This is because float zero_points are formulated as per:

```
Xq = Round(Xf * inv_scale + zero_point)
   = Round((Xf - min) * inv_scale),  with zero_point = -min * inv_scale
```
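
A small numeric sketch of that formulation (the 8-bit range here is assumed purely for illustration); note the zero_point is a float and is not clamped to [quant_min, quant_max]:

```python
import torch

x = torch.tensor([-0.5, 0.0, 1.5])
xmin, xmax = x.min(), x.max()
scale = (xmax - xmin) / 255        # assumed 8-bit quantization range
inv_scale = 1.0 / scale
zero_point = -xmin * inv_scale     # float zero_point
print(torch.round(x * inv_scale + zero_point))
print(torch.round((x - xmin) * inv_scale))  # same result
```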

Test Plan:
pytest test/test_quantization.py -v -k "test_fake_quant_per_channel_qparam_range"

Imported from OSS

Reviewed By: supriyar

Differential Revision: D32645014

fbshipit-source-id: 96dc3ca6eef9cee60be6919fceef95c9f2759891
2021-12-02 07:58:03 -08:00
397183f44c Add Lazy Tensor codegen infra (#69020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69020

Merges the lazy tensor codegen infra which has already been used on lazy_tensor_staging.

Test Plan: Test via lazy_tensor_staging branch

Reviewed By: alanwaketan, bdhirsh

Differential Revision: D32570613

fbshipit-source-id: 2cd5698644398bda69669683f8de79fd3b6639b5
2021-12-02 07:51:52 -08:00
28c519961f Follow the undefined Tensor <-> None rule better in torch dispatch (#67793)
Summary:
As per title. This in particular allows to more easily override backward function for which the underlying backend returns `None`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67793

Reviewed By: zou3519

Differential Revision: D32242962

Pulled By: albanD

fbshipit-source-id: 6e114def90ee9499161e1303d301ba7fd003ff89
2021-12-02 07:46:56 -08:00
0465f64bb8 [DataPipe] Adding BatcherMapDataPipe (#68197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68197

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32440963

Pulled By: NivekT

fbshipit-source-id: 277cbe8d735afe341a7c189be20e1d334ecf9d4a
2021-12-02 07:27:17 -08:00
00ebbd5ef6 Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer
Test Plan: revert-hammer

Differential Revision:
D32010095 (41d35dc201)

Original commit changeset: d763b0557780

fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d
2021-12-02 06:41:40 -08:00
ed3b73fd4d [Static Runtime] Skip ProcessedNode::verify_no_memory_overlap() for out variants (#68639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68639

Fix all problems related to `ProcessedNode::verify_no_memory_overlap()`
- Only enable this check for native and fallback ops that are not inplace or view ops
- Enable ProcessedNode::verify_no_memory_overlap() in debug mode and enforce it
- Add gflag --static_runtime_disable_debug_memory_overlap_check to test the runtime memory overlap fix for bad schemas

fb::expand_dims's schema was not correct after this check is re-enabled. It's fixed in D32556204 (39ab417107)

Reviewed By: mikeiovine

Differential Revision: D32553708

fbshipit-source-id: 88de63cdf1ee4f87b7726c8b65a11a5fb8a99d13
2021-12-02 05:03:12 -08:00
c60232d89a [shard] add back init_from_local_shards_and_global_metadata API (#69226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69226

This adds back the previous init_from_local_shards API, renamed to init_from_local_shards_and_global_metadata. It's a partial revert of D32147888 (35712a8eb4). We now provide two APIs:
1. `init_from_local_shards`: users don't need to provide global metadata; we do an all_gather under the hood.
2. `init_from_local_shards_and_global_metadata`: users need to explicitly construct ShardedTensorMetadata to use this API and must ensure correctness on all ranks, as there's no cross-rank communication/validation.

Both APIs stay private until they stabilize and the UX is proven. The second one can only be called on the `ShardedTensor` class directly, and is not included as a package API for now.

Test Plan:
test_init_from_local_shards_and_global_metadata
test_init_from_local_shards_and_global_metadata_invalid_shards

Reviewed By: dstaay-fb, pritamdamania87

Differential Revision: D32746882

fbshipit-source-id: bafd26ce16c02e2095907f9e59984a5d775c7df5
2021-12-02 01:02:56 -08:00
12621c3a39 support pure fp16 training in FSDP (#68417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68417

1. Since parameter attributes are lazily initialized at the beginning of forward, it makes more sense to init full_param_padded using the parameters' data type during lazy_init rather than during construction, as the data type may be changed after construction and before the training loop.
2. Add a check for whether parameter storage is changed outside FSDP, and handle it properly.
ghstack-source-id: 144479019

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D32458643

fbshipit-source-id: 0e07e5e08270f2e265e8f49124a6648641e42e7a
2021-12-02 00:27:45 -08:00
41d35dc201 Add ability for a mobile::Module to save as flatbuffer (#67351)
Summary:
Included functions:

* save_mobile_module -> saves a mobile::Module to flatbuffer
* load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
* parse_mobile_module -> parses from bytes or deserialized flatbuffer
      Module object

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351

Reviewed By: iseeyuan

Differential Revision: D32010095

Pulled By: qihqi

fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1
2021-12-01 23:58:15 -08:00
40fb28ea87 [JIT] Compute input sym shapes in partial eval graph (#68281)
Summary:
Needed for NNC dynamic shape fusion. Previously, when creating a partially evaluated graph for symbolic shape computation, if an input wasn't used we wouldn't compute it, which led to failures when NNC expected this value to be passed in.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68281

Reviewed By: navahgar

Differential Revision: D32401365

Pulled By: eellison

fbshipit-source-id: 97a684e5f1faed5df77c8fd69f9623cdba0781f9
2021-12-01 22:33:35 -08:00
d8a44270d6 [DataPipe] Simplify BatcherIterDataPipe by removing 'unbatch_level' argument and functionality (#68594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68594

Based on my conversation with ejguan [here](https://github.com/pytorch/pytorch/pull/68197#pullrequestreview-809148827), we both believe that having the `unbatch_level` argument and functionality is making this DataPipe unnecessarily complicated, because users can call `.unbatch` before `.batch` if they would like to do so. That will likely be cleaner as well.

I also checked other libraries (for example, [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#unbatch)), and I do not see them provide the ability to `unbatch` within the `batch` function either.

This PR simplifies the DataPipe by removing the argument.
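
A minimal sketch of the recommended composition (`IterableWrapper` and the functional `.unbatch()`/`.batch()` forms assumed available):

```python
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper([[0, 1], [2, 3], [4, 5]])
# Flatten one level of batching first, then form new batches of 3.
rebatched = dp.unbatch().batch(batch_size=3)
print(list(rebatched))  # [[0, 1, 2], [3, 4, 5]] (batches may be DataChunk lists)
```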

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32532594

Pulled By: NivekT

fbshipit-source-id: 7276ce76ba2a3f207c9dfa58803a48e320adefed
2021-12-01 22:00:31 -08:00
ad182479b0 [deploy] docs (#69251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69251

This adds some actual documentation for deploy, which is probably useful
since we told everyone it was experimentally available so they will
probably be looking at what the heck it is.

It also wires up various components of the OSS build to actually work
when used from an external project.

Differential Revision: D32783312

Test Plan: Imported from OSS

Reviewed By: wconstab

Pulled By: suo

fbshipit-source-id: c5c0a1e3f80fa273b5a70c13ba81733cb8d2c8f8
2021-12-01 21:55:18 -08:00
cbe0a38d8c Back out "[CUDA Pinned Memory] Event recording with non-blocking copies should track the storage context, not the tensor data pointer" (#69193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69193

Reviewed By: xing-liu, yuchenhao

Differential Revision: D32748570

fbshipit-source-id: bd73d7567f94c70daeace49d4081381b8adf2d77
2021-12-01 19:30:08 -08:00
929f2a750a Back out "[CUDA Pinned Memory] Alternative implementation of pinned memory allocator focusing on multi-threaded scalability" (#69191)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69191

Reviewed By: xing-liu, yuchenhao

Differential Revision: D32748466

fbshipit-source-id: 6abd3265e8a20270305da3f8be25114ad4d67fc2
2021-12-01 19:28:57 -08:00
370d0afc1b Strided masked var. (#68738)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68738

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D32767155

Pulled By: cpuhrsch

fbshipit-source-id: a5c095103405fbfc28b9f4fd624bdbbc45e7f715
2021-12-01 19:19:37 -08:00
291e56eda4 [Pytorch Edge] Update Black Box Api with operator versioning (#68678)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68678

Test Plan: I'll update the unit test before landing.

Reviewed By: cccclai

Differential Revision: D32573603

fbshipit-source-id: 19271bcbb68b61d24d6943e61a943f4f75fddb5d
2021-12-01 19:13:32 -08:00
b9738e923e [Operator Versioning][Edge] Add old models and unittest (#67726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67726

1. Check in one model with the old aten::div_tensor op, with unit tests in both C++ and Python. The following two lines are commented out and expected to work after using the upgrader.
```
_helper(mobile_module_v2, div_tensor_0_3)
_helper(current_mobile_module, torch.div)
```

2. Update the commented code accordingly.

Currently there are 6 upgraders. The following old models with operators are added to cover these 6 upgraders:
```
// Tensor x Tensor

test_versioned_div_tensor_v3

// Tensor x Scalar

test_versioned_div_scalar_float_v3
test_versioned_div_scalar_reciprocal_int_v3
test_versioned_div_scalar_inplace_float_v3

// Scalar x Scalar

test_versioned_div_scalar_scalar_v3

// Tensor x Tensor with out kwarg

test_versioned_div_tensor_out_v3

// Tensor x Tensor inplace

test_versioned_div_tensor_inplace_v3

// Tensor x Scalar inplace

test_versioned_div_scalar_inplace_int_v3

```
Note:
In this PR, each model includes the following tests:
1. Model (with old op) load/run test, in both C++ and Python
2. Model (with old op) + upgrader test, in Python
Other tests considered for addition:
1. Per-upgrader bytecode test
2. App-level integration test
ghstack-source-id: 144422418

Test Plan: CI and the added unittest

Reviewed By: iseeyuan

Differential Revision: D32069653

fbshipit-source-id: 96d9567088a1f709bc7795f78beed7a308e71ca9
2021-12-01 18:46:30 -08:00
124bb6a19d RegisterDispatchKey.cpp: remove redundant code (#68983)
Summary:
Remove the line, since line 10 already includes this header file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68983

Reviewed By: samdow

Differential Revision: D32706952

Pulled By: soulitzer

fbshipit-source-id: 98746e12d8d04d64ee2e0449e4aec5153ac723d5
2021-12-01 18:38:19 -08:00
fced51eaf7 [torch][distributed] Check for file existence before invoking cleanup logic in FileStore destructor (#68603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68603

FileStore is frequently used from Python, which is garbage-collected. This means that users of FileStore from Python do not have control over when the FileStore destructor is invoked. If the directory for the file store is created by some external logic that has its own cleanup procedure, that procedure may race with the logic in the FileStore destructor.

The diff adds a check for file existence in the destructor before actually invoking the cleanup. In the long term, it makes sense to move the cleanup logic out of the destructor into a separate method.
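
A sketch of the Python-side lifecycle this protects (the path and world size are hypothetical):

```python
import torch.distributed as dist

# The store's directory may be created and cleaned up by external logic;
# the FileStore destructor runs whenever Python drops the last reference,
# so the two cleanups can race.
store = dist.FileStore("/tmp/some_dir/filestore", 2)
store.set("key", "value")
del store  # destructor cleanup now tolerates the file being already gone
```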

Test Plan:
CI

Stress tests: `buck test mode/dev-nosan //torchrec/examples/dlrm/tests:test_dlrm_main -- --exact 'torchrec/examples/dlrm/tests:test_dlrm_main - torchrec.examples.dlrm.tests.test_dlrm_main.MainTest: test_main_function' --run-disabled --jobs 18 --stress-runs 20 --record-results`

Reviewed By: colin2328

Differential Revision: D32535470

fbshipit-source-id: 6f421f2e7b0d9ac9c884a1db2f7e5a94fc59fc0e
2021-12-01 16:43:42 -08:00
3c1e2ff9eb fixing layer_norm cuda bug (#69210)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69210

Reviewed By: H-Huang

Differential Revision: D32764811

Pulled By: ngimel

fbshipit-source-id: fb4201fe5f2284fcb22e36bc1029eef4a21b09bf
2021-12-01 15:46:47 -08:00
d72d476875 [pyper] add flag to disable clip_ranges_gather fusions (#69198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69198

Add flag --enable_clip_ranges_gather_fusions, which can be set to 0 to disable clip_ranges+gather_ranges fusions.

This fusion happens in static runtime, and it also happens in jit when optimize_sparse_nn_model is used.

Note that clip_ranges+gather_ranges+sigrid_hash fusions use different code that was untouched by D30515441 (01b30922dd), so not disabling it for now.
This also effectively disables ClipRangesGatherSigridHash(graph) (even though it's not explicitly included), because that fusion looks for the clip_ranges_gather_lengths_to_offsets fusion, which won't exist if this flag is on.

Test Plan:
Run ptvsc2_predictor_bench with --enable_clip_ranges_gather_fusions=0 and SR=1
```
Input size: 211
Static runtime ms per iter: 11.9668. Iters per second: 83.5643
Time per node type:
        6.42796 ms.    54.5663%. static_runtime::fused_variadic_sigrid_transforms_torch_bind (1 nodes, out variant)
        1.64969 ms.    14.0041%. fb::quantized_linear (9 nodes, out variant)
       0.475394 ms.    4.03557%. fb::clip_ranges_gather_sigrid_hash_precompute_v3 (158 nodes, out variant)
       0.367554 ms.    3.12013%. aten::argmin (1 nodes, out variant)
       0.358351 ms.    3.04201%. aten::matmul (1 nodes, out variant)
       0.215082 ms.    1.82581%. static_runtime::to_copy (805 nodes, out variant)
       0.214397 ms.    1.81999%. fb::gather_ranges (313 nodes, out variant)
       0.179759 ms.    1.52595%. fb::offsets_to_ranges (655 nodes, out variant)
       0.173236 ms.    1.47058%. fb::lengths_to_offsets (464 nodes, out variant)
       0.151249 ms.    1.28394%. aten::sub (1 nodes, out variant)
        0.14017 ms.    1.18989%. aten::sigmoid (3 nodes, out variant)
       0.136118 ms.    1.15549%. aten::mul (5 nodes, out variant)
       0.130813 ms.    1.11046%. aten::sum (3 nodes, out variant)
       0.124876 ms.    1.06006%. aten::repeat (1 nodes, out variant)
        0.12191 ms.    1.03488%. static_runtime::signed_log1p (1 nodes, out variant)
      0.0922349 ms.   0.782972%. aten::norm (1 nodes, out variant)
      0.0877845 ms.   0.745193%. aten::pow (1 nodes, out variant)
      0.0783335 ms.   0.664966%. fb::batch_box_cox (1 nodes, out variant)
      0.0755047 ms.   0.640951%. fb::clip_ranges (311 nodes, out variant)
      0.0702456 ms.   0.596308%. static_runtime::layer_norm (2 nodes, out variant)
      0.0696762 ms.   0.591475%. fb::quantize_per_tensor (4 nodes)
      0.0556873 ms.   0.472724%. quantized::embedding_bag_byte_prepack (3 nodes, out variant)
      0.0555237 ms.   0.471335%. prim::VarConcat (2 nodes, out variant)
      0.0437336 ms.    0.37125%. static_runtime::dict_unpack (2 nodes, native)
      0.0390592 ms.    0.33157%. static_runtime::dequantize_copy (9 nodes, out variant)
      0.0385823 ms.   0.327521%. fb::concat_add_mul_replacenan_clip (1 nodes, out variant)
      0.0321869 ms.   0.273231%. prim::TupleConstruct (1 nodes, out variant)
      0.0308289 ms.   0.261703%. fb::casted_batch_one_hot_lengths (1 nodes, out variant)
      0.0280272 ms.    0.23792%. static_runtime::reshape_copy (2 nodes, out variant)
      0.0244705 ms.   0.207727%. fb::sigrid_hash_precompute (1 nodes, out variant)
       0.020917 ms.   0.177562%. static_runtime::VarTupleUnpack (1 nodes, native)
      0.0175842 ms.   0.149271%. aten::div (1 nodes, out variant)
      0.0169989 ms.   0.144302%. aten::narrow_copy (4 nodes, out variant)
     0.00818147 ms.  0.0694517%. aten::logit (1 nodes, out variant)
     0.00719822 ms.   0.061105%. prim::VarStack (1 nodes, out variant)
     0.00687292 ms.  0.0583435%. aten::add (1 nodes, out variant)
     0.00328646 ms.  0.0278985%. aten::clamp_min (1 nodes, out variant)
     0.00325073 ms.  0.0275951%. static_runtime::expand_dims_copy (1 nodes, out variant)
     0.00295617 ms.  0.0250946%. static_runtime::flatten_copy (1 nodes, out variant)
     0.00230511 ms.  0.0195679%. aten::expand_as (1 nodes, native)
     0.00182061 ms.   0.015455%. aten::full_like (1 nodes, out variant)
    0.000268152 ms. 0.00227631%. prim::ListConstruct (1 nodes, out variant)
        11.7801 ms. in Total
```

Servicelabs:
AF: https://www.internalfb.com/intern/servicelab/1001770528/
AI: https://www.internalfb.com/intern/servicelab/402342245/
Prospector: https://www.internalfb.com/intern/servicelab/502342630/

Reviewed By: movefast1990

Differential Revision: D32750847

fbshipit-source-id: b809a72a9fbeea86080346962eb17761e71397d8
2021-12-01 15:26:36 -08:00
263125a962 Fix RAdam docstring on LR default value (#69186)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69186

Reviewed By: albanD

Differential Revision: D32759614

Pulled By: H-Huang

fbshipit-source-id: b11819c50156a538cd6003e9cddde0390c853f67
2021-12-01 14:32:07 -08:00
3bf4080fd9 Change misleading MaxUnpool2d example to better demonstrate output_size usage (#68936)
Summary:
At https://github.com/pytorch/pytorch/issues/68873, jbschlosser states that maxunpool2d with the `output_size` argument only works for indices of the same size. This makes sense, but unfortunately it's not what's shown in the example! I've removed the wrong example and replaced it with one where specifying `output_size` is actually necessary -- the unpool call fails without it.
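
A minimal sketch of that scenario (shapes assumed for illustration): pooling a 5x5 input discards the last row/column, and the stored indices still refer to the 5x5 layout, so unpooling needs `output_size` to reconstruct the original shape.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)
x = torch.arange(25.0).reshape(1, 1, 5, 5)
out, indices = pool(x)                           # out: 1x1x2x2
y = unpool(out, indices, output_size=x.size())   # back to 1x1x5x5
# unpool(out, indices) without output_size would infer 1x1x4x4 and fail,
# since some indices point past a 4x4 grid.
```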

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68936

Reviewed By: H-Huang

Differential Revision: D32759207

Pulled By: jbschlosser

fbshipit-source-id: 658e1724150a95454a05a771ae7c6e2e736740a7
2021-12-01 14:11:26 -08:00
2eef5e76db add extra_repr for nn.ZeroPad2d (#69206)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69205

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69206

Reviewed By: H-Huang

Differential Revision: D32759597

Pulled By: jbschlosser

fbshipit-source-id: abc9ee69fb5e22d45a640993a4e598b016020688
2021-12-01 13:53:19 -08:00
cd043c335f Revert D32329330: [JIT] Separate GPU implementation of frozen_conv_add_relu_fusion.cpp
Test Plan: revert-hammer

Differential Revision:
D32329330 (cfc75c2137)

Original commit changeset: c0f10da4b954

fbshipit-source-id: e81f93a5c1e2bb9b20fde6ccaeef143472a5b900
2021-12-01 12:55:10 -08:00
e6c435bf96 [LTC] Upstream helpers for c10::Device <=> BackendDevice (#69064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69064

This commit upstreams helpers for converting a c10::Device to
BackendDevice and vice versa.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.FromAten:BackendDeviceTest.ToAten

Reviewed By: wconstab

Differential Revision: D32732607

Pulled By: alanwaketan

fbshipit-source-id: 0dd233d37a4a30fc4b22dba322ddd85d4cb3635b
2021-12-01 12:15:32 -08:00
92f168941e remove accidentally committed redundant debug print (#68510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68510

remove accidentally committed redundant debug print
ghstack-source-id: 144362817

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D32487736

fbshipit-source-id: 279030f782e6b716a6bbfd591c5ce761de3ddd63
2021-12-01 11:35:34 -08:00
1842364b30 Strided masked normalize. (#68694)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68694

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D32724552

Pulled By: cpuhrsch

fbshipit-source-id: 82f579a86b0b265e0b9b3715a8a327b775dd55e1
2021-12-01 10:45:16 -08:00
23633bdb5c record the datapipe for each piece of Dataset (#67613)
Summary:
Add record_function for each DataPipe.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67613

Reviewed By: H-Huang

Differential Revision: D32246672

Pulled By: ejguan

fbshipit-source-id: 02ef7e75748c5b84fdcbb103398532e1f2962fbf
2021-12-01 10:29:06 -08:00
deaf745aee Add kl divergence between normal and laplace distribution. (#68807)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68746
![KL_normal_laplace](https://user-images.githubusercontent.com/35850237/143008244-f304cee1-9583-4de1-b0d0-5751ebdb8188.png)
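
A minimal sketch exercising the newly registered pair:

```python
import torch
from torch.distributions import Laplace, Normal, kl_divergence

p = Normal(torch.tensor(0.0), torch.tensor(1.0))
q = Laplace(torch.tensor(0.0), torch.tensor(1.0))
print(kl_divergence(p, q))  # closed-form KL(Normal || Laplace)
```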

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68807

Reviewed By: H-Huang

Differential Revision: D32750391

Pulled By: neerajprad

fbshipit-source-id: 129e6ef60d6e244d0d6b02b3944bfd5d8b06edcb
2021-12-01 10:22:08 -08:00
486ae5c733 Dataset & IterableDataset attribute errors print the attribute (#69021)
Summary:
The message is the same as a standard AttributeError's; including the attribute name makes it more informative when the error is thrown.
Alternatively, in Python 3.10 one can set the keyword arguments 'name' and 'obj';
reference: https://github.com/python/cpython/blob/3.10/Doc/library/exceptions.rst#concrete-exceptions
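
A sketch of that Python 3.10 alternative (the class and attribute names are made up for illustration):

```python
class Example:
    pass

obj = Example()
# Python 3.10+ accepts keyword-only `name` and `obj` arguments.
raise AttributeError(
    f"'{type(obj).__name__}' object has no attribute 'foo'",
    name="foo",
    obj=obj,
)
```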

Fixes #{?}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69021

Reviewed By: samdow

Differential Revision: D32730362

Pulled By: ejguan

fbshipit-source-id: 7132ba612fa6075aeffb9315ce651828e9a8e0bc
2021-12-01 10:16:31 -08:00
d507fd63f3 Check that block height and width are positive in nn.Fold (#69048)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68875

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69048

Reviewed By: samdow

Differential Revision: D32729307

Pulled By: jbschlosser

fbshipit-source-id: 162cafb005873012d900d86997d07640967038c0
2021-12-01 10:08:47 -08:00
c08e95dd9c Introduce IS_LINUX and IS_MACOS global vars (#69093)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69093

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D32730080

Pulled By: malfet

fbshipit-source-id: aa3f218d09814b4edd96b01c7b57b85fd58c47fc
2021-12-01 09:47:38 -08:00
840fe8e4e6 Fix MacOS artifact upload (#69188)
Summary:
Add test shard number and runner name to the test name suffix
Otherwise test report names for shard 1 and shard 2 will be identical
and overwrite each other.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69188

Reviewed By: janeyx99

Differential Revision: D32747747

Pulled By: malfet

fbshipit-source-id: 149f921d8e420d3ed69ce812bdcd3c034799353a
2021-12-01 08:06:48 -08:00
f9e69af22e Modify LU_backward and lu_solve_backward to use linalg_solve_triangular (#63569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63569

This PR also rewrites `lu_solve_backward` from scratch going from
solving 5 systems of equations to just 2.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32618014

Pulled By: anjali411

fbshipit-source-id: 0e915bcf7045a4db43ffd076d807beac816c8538
2021-12-01 07:34:38 -08:00
478069d6f2 Remove duplicate .DS_Store in gitignore (#68981)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68981

Reviewed By: samdow

Differential Revision: D32707039

Pulled By: soulitzer

fbshipit-source-id: 346f0f3de583d995be34c252db4f9f26cd574ba8
2021-12-01 07:28:33 -08:00
e5e0c19882 OpInfo : embedding_bag (#67252)
Summary:
Adds OpInfo for `embedding_bag`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67252

Reviewed By: VitalyFedyunin

Differential Revision: D32462157

Pulled By: zou3519

fbshipit-source-id: 70303349a718720c4fa47519fa94ae900e052939
2021-12-01 07:00:50 -08:00
1da1707568 Sparse: Implement simple unary ufuncs operators (#68887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887

Closes #46988, closes #46987, closes #46761

By "simple" I mean operators that map 0->0 so we can implement it by
just re-dispatching on the values tensor. That does mean we have `sin`
but not `cos` for example, but without fill value support this is the
best that can be done.
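
For instance, a small sketch (assuming sparse COO support for `sin` as described):

```python
import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([1.0, 2.0])
s = torch.sparse_coo_tensor(i, v, (2, 2))
# sin maps 0 -> 0, so it can operate on just the stored values.
print(torch.sin(s).to_dense())
```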

Most of these don't support autograd because the derivative formulas
use unsupported operators.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32734911

Pulled By: cpuhrsch

fbshipit-source-id: 203ab105799f3d2d682b01ca3d6b18e7c994776a
2021-12-01 05:43:19 -08:00
afff381824 Automated submodule update: tensorpipe (#69089)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: ed4bbe52b7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69089

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D32725534

fbshipit-source-id: 73b1e0f67c957ca0220cd47179dd4b350a98fd33
2021-12-01 02:29:18 -08:00
a23d1036ab Add ops for BI (mean) (#68826)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68826

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D32732465

Pulled By: eellison

fbshipit-source-id: e8b185d89e5ecbe5c8e09d576c84a1f0a402a5e0
2021-12-01 00:45:00 -08:00
19b87292fc Add TE fuser ops (#68825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68825

Factoring out the elementwise ops in tensorexpr fuser and adding their corresponding shape functions, since we need shape functions to fuse them with dynamic shapes

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D32732466

Pulled By: eellison

fbshipit-source-id: 69cacf6fbed8eb97e475f5d55b2eec0384fe8ec1
2021-12-01 00:43:42 -08:00
7fad758e02 [FSDP] AutoWrap Main API (#68155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68155

Per title
ghstack-source-id: 144398229

Test Plan: CI

Reviewed By: pbelevich, mrshenli

Differential Revision: D32327954

fbshipit-source-id: 36bdf06c1c50932a93acbfa97017c549fa490a6c
2021-12-01 00:16:38 -08:00
999e52a795 [FileStore] log timeout in err msg (#69167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69167

Per title
ghstack-source-id: 144378083

Test Plan: Ci

Reviewed By: H-Huang

Differential Revision: D32736119

fbshipit-source-id: f37fd3e4ac393c07eb8bd1f9202841d33d0a8aad
2021-11-30 23:29:09 -08:00
845a82b635 Debug positive definite constraints (#68720)
Summary:
While implementing https://github.com/pytorch/pytorch/issues/68644,
during the testing of 'torch.distributions.constraints.positive_definite', I found an error in the code: [location](c7ecf1498d/torch/distributions/constraints.py (L465-L468))
```
class _PositiveDefinite(Constraint):
    """
    Constrain to positive-definite matrices.
    """
    event_dim = 2

    def check(self, value):
        # Assumes that the matrix or batch of matrices in value are symmetric
        # info == 0 means no error, that is, it's SPD
        return torch.linalg.cholesky_ex(value).info.eq(0).unsqueeze(0)
```

The error occurs when I check the positive definiteness of
`torch.cuda.DoubleTensor([[2., 0], [2., 2]])`,
but it does not occur for
`torch.DoubleTensor([[2., 0], [2., 2]])`.

You may easily reproduce the error by following code:

```
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> const = torch.distributions.constraints.positive_definite
>>> const.check(torch.cuda.DoubleTensor([[2., 0], [2., 2]]))
tensor([False], device='cuda:0')
>>> const.check(torch.DoubleTensor([[2., 0], [2., 2]]))
tensor([True])
```
The cause of the error can be analyzed further by passing 'check_errors=True' as an additional argument to 'torch.linalg.cholesky_ex'.
It seems to be caused by recent changes in 'torch.linalg'.
I suggest modifying the '_PositiveDefinite' class to use the 'torch.linalg.eig' function, as below:

```
class _PositiveDefinite(Constraint):
    """
    Constrain to positive-definite matrices.
    """
    event_dim = 2

    def check(self, value):
        return (torch.linalg.eig(value)[0].real > 0).all(dim=-1)
```

Using the above implementation, I get the following result:
```
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> const = torch.distributions.constraints.positive_definite
>>> const.check(torch.cuda.DoubleTensor([[2., 0.], [2., 2.]]))
tensor(True, device='cuda:0')
>>> const.check(torch.DoubleTensor([[2., 0.], [2., 2.]]))
tensor(True)
```

FYI, I do not know what algorithms are used in 'torch.linalg.eig' and 'torch.linalg.cholesky_ex'. As far as I know, they generally have the same time complexity, O(n^3). With special algorithms or finer parallelization, the time complexity of Cholesky decomposition may be reduced to approximately O(n^2.5). If there is a reason 'torch.distributions.constraints.positive_definite' previously used 'torch.linalg.cholesky_ex' rather than 'torch.linalg.eig', I would like to know.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68720

Reviewed By: samdow

Differential Revision: D32724391

Pulled By: neerajprad

fbshipit-source-id: 32e2a04b2d5b5ddf57a3de50f995131d279ede49
2021-11-30 22:27:27 -08:00
8586f374bc [Pytorch Edge] Get Operator Version from model file (#68677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68677

Used in compatibility APIs. Luckily the stream reader mostly does this already, so we just create a wrapper in our compatibility files.

Test Plan: ci

Reviewed By: cccclai

Differential Revision: D32573132

fbshipit-source-id: 86331c03a1eebcd86ed29b9c6cd8a8fd4fe79949
2021-11-30 21:10:21 -08:00
219db3b4e1 Add OpInfo for torch.linalg.tensorsolve (#68810)
Summary:
This PR adds an OpInfo entry for the tensorsolve function.
The keyword argument name differs from NumPy's, so a lambda needs to be passed to `ref=`.
I had to change the dtypes for `test_reference_testing` because NumPy computes internally in double precision for all linear algebra functions (and maybe some other functions). Using `torch.float64` and `torch.complex128` is more reliable for NumPy comparisons.
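
A sketch of the kwarg adapter (shapes follow NumPy's documented tensorsolve example and are assumed here for illustration):

```python
import numpy as np
import torch

# torch.linalg.tensorsolve names the argument `dims`;
# numpy.linalg.tensorsolve names it `axes`.
ref = lambda a, b, dims=None: np.linalg.tensorsolve(a, b, axes=dims)

a = torch.eye(2 * 3 * 4, dtype=torch.float64).reshape(2 * 3, 4, 2 * 3, 4)
b = torch.randn(2 * 3, 4, dtype=torch.float64)
x = torch.linalg.tensorsolve(a, b)
np.testing.assert_allclose(x.numpy(), ref(a.numpy(), b.numpy()))
```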

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68810

Reviewed By: soulitzer

Differential Revision: D32696065

Pulled By: mruberry

fbshipit-source-id: a4305065d3e7d0097503dc05938b3c4784e14996
2021-11-30 20:31:12 -08:00
b05237f5e4 [Pytorch Edge] Add bool to copy kernel (#69106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69106

this kernel sucks.

Test Plan: ci

Reviewed By: shoumikhin, cccclai

Differential Revision: D32729888

fbshipit-source-id: c747d4bf3d5233c8ed15dba5e2c2d244ba7d4b3f
2021-11-30 19:45:42 -08:00
e534c5efd7 CMake: Include instead of copying cpu kernel files (#67656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67656

Currently, each cpu kernel file is copied into the build folder 3 times to give them different compilation flags. This changes it to instead generate 3 files that `#include` the original file. The biggest difference is that updating a copied file requires `cmake` to re-run, whereas include dependencies are natively handled by `ninja`.

A side benefit is that included files show up directly in the build dependency graph, whereas `cmake` file copies don't.

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D32566108

Pulled By: malfet

fbshipit-source-id: ae75368fede37e7ca03be6ade3d4e4a63479440d
2021-11-30 19:13:53 -08:00
f6f1b580f8 Fix mypy in cpp_extension.py (#69101)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69101

Test Plan: Imported from OSS

Reviewed By: atalman, janeyx99

Differential Revision: D32730081

Pulled By: malfet

fbshipit-source-id: 76ace65b51850b74b175a3c4688c05e107873e8d
2021-11-30 16:01:55 -08:00
6953b7e269 [BE] Fix mypy local run on MacOS (#69097)
Summary:
Unversioned python invocations should not be used, as they can be aliased to Python 2.
Also invoke mypy as `python3 -mmypy`, since binary aliases are not always available for user installations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69097

Reviewed By: janeyx99

Differential Revision: D32729367

Pulled By: malfet

fbshipit-source-id: 7539bd0af15f97eecddfb142dba7de7f3587083d
2021-11-30 15:52:23 -08:00
aa2163eba5 .github: Add linux.large instance type (#69165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69165

We're hitting hard concurrency limits for the built-in GitHub runners, so
let's use our own runners and make them non-ephemeral so they'll have
basically constant uptime.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: atalman

Differential Revision: D32735494

Pulled By: seemethere

fbshipit-source-id: c042c6f0fb23fd50acef312d96b0c89d02c93270
2021-11-30 14:45:51 -08:00
e60fd10659 [fbgemm] remove assumption number of rows is in 32 bit (#69066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69066

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/781

Also remove unnecessary looping inside parallel_for, since fbgemm routines support batching multiple rows.

Test Plan: CI

Reviewed By: dskhudia, jianyuh

Differential Revision: D32715453

fbshipit-source-id: 33c3e72f51c8ff5d02dafab4a8947d1230c2d551
2021-11-30 13:38:53 -08:00
ef7ed082ec [PyTorch] Remove StringView from RecordFunction implementation [2/2] (#68411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68411

Avoids heap-allocating a std::string instance in before() each time even if it's not going to be used.
ghstack-source-id: 144287655

Test Plan:
Run //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark before/after this diff with arguments --stressTestRecordFunction --op empty

Before: P467922606
After: P467922626

Reviewed By: chaekit

Differential Revision: D32453846

fbshipit-source-id: 18e1b482dbf5217add14cbaacd447de47cb5877b
2021-11-30 13:22:27 -08:00
1d84d8c5d8 [PyTorch] Remove StringView from RecordFunction interface (1/2) (#68410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68410

First step toward not heap-allocating a string in RecordFunction::before() every time
ghstack-source-id: 144287654

Test Plan: CI

Reviewed By: chaekit

Differential Revision: D32453847

fbshipit-source-id: 080d95095fb568287b65fcc41a4ca6929b5f9a87
2021-11-30 13:20:08 -08:00
22690c2cb6 Use cub::FutureValue to simplify 64bit indexing split of cub scan (#66711)
Summary:
https://github.com/NVIDIA/cub/pull/305 has landed to cub 1.15. This is ready to review and land. This PR contains https://github.com/pytorch/pytorch/pull/66219, please land that PR first before review.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66711

Reviewed By: soulitzer

Differential Revision: D32698306

Pulled By: ngimel

fbshipit-source-id: 4cc6b9b24cefd8932f4d421c6d64ea20ea911f52
2021-11-30 13:15:36 -08:00
c48e6f014a [vulkan] Update VMA settings to reduce memory usage (#69088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69088

It was found that the Vulkan backend was consuming a huge amount (~287 MB) of graphics memory when executing a lightweight segmentation model. In fact, the Vulkan backend tends to consume a huge amount of memory in general.

It was found that the reason for this is due to how the backend uses [VMA](https://gpuopen-librariesandsdks.github.io/VulkanMemoryAllocator/html/). When allocating memory, VMA will first allocate a large block of memory, then subdivide that block to use for individual textures and buffers. The pattern is used because Vulkan has a limit on the number of `vkDeviceMemory` allocations that can be active at one time.

It turns out that the Vulkan backend was using custom memory pools with a block size of 64 MiB, meaning that at least 64 MiB of memory would always be used. Furthermore, usage of the [linear allocation algorithm](https://gpuopen-librariesandsdks.github.io/VulkanMemoryAllocator/html/custom_memory_pools.html#linear_algorithm) resulted in minimal reuse of memory, leading to the creation of many more blocks than were actually required and a huge amount of unused memory.

By avoiding the use of custom memory pools and instead simply using the default memory pool provided by VMA, the library seems to have a much easier time minimizing the amount of unused memory. This change reduces memory usage down to 20 MB when running the aforementioned segmentation model.

This diff also reduces the preferred block size to 32 MiB and removes the use of the linear allocation algorithm in case custom memory pools are needed in the future.

Test Plan:
Build and run vulkan_api_test:

```
cd ~/pytorch
BUILD_CUSTOM_PROTOBUF=OFF \
  BUILD_TEST=ON \
  USE_EIGEN_FOR_BLAS=OFF \
  USE_FBGEMM=OFF \
  USE_MKLDNN=OFF \
  USE_NNPACK=OFF \
  USE_NUMPY=OFF \
  USE_OBSERVERS=OFF \
  USE_PYTORCH_QNNPACK=OFF \
  USE_QNNPACK=OFF \
  USE_VULKAN=ON \
  USE_VULKAN_API=ON \
  USE_VULKAN_SHADERC_RUNTIME=ON \
  USE_VULKAN_WRAPPER=OFF \
  MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python3 setup.py develop --cmake && ./build/bin/vulkan_api_test
```

Reviewed By: beback4u

Differential Revision: D32653767

fbshipit-source-id: b063a8ea76d34b57d0e2e6972ca5f6f73f2fd7e5
2021-11-30 12:45:41 -08:00
fcd1375b2b [DDP][BE][Docs] Clarify checkpoint support (#68827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68827

Add a note about current checkpoint support with DDP. Note that this
does not include the features enabled with _set_static_graph yet, as it is an
undocumented private API. Once we support static graph as beta feature in OSS
we can add to the note here.
ghstack-source-id: 144285041

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D32624957

fbshipit-source-id: e21d156a1c4744b6e2a807b5b5289ed26701886f
2021-11-30 12:37:37 -08:00
994f110a6f Refactor DDP checkpoint tests (#68792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68792

Refactor tests to be more clear what features are supported and
unsupported under certain DDP configs.
ghstack-source-id: 144285040

Test Plan: Ci

Reviewed By: pbelevich

Differential Revision: D32609498

fbshipit-source-id: 5231242054d4ff6cd8e7acc4a50b096771ef23d1
2021-11-30 12:36:14 -08:00
49abda208b [JIT] internal build bug fix (#69061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69061

`warning` breaks this build [D32622152](https://www.internalfb.com/diff/D32622152)

Test Plan: Imported from OSS

Differential Revision: D32712448

Pulled By: makslevental

fbshipit-source-id: c7a70487bd0b95ac8b242522c36597d36072201f
2021-11-30 12:32:11 -08:00
5e0302e1d0 [quant][embedding qat] Set FakeQuant zeropoint dtype matches observer (#68390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68390

Observer zero_point's dtype can be float, in the specific case of `torch.per_channel_affine_float_qparams`.
This change sets FakeQuant's zero_point dtype accordingly.

Test Plan:
`pytest test/quantization/core/test_workflow_module.py  -v -k "embedding"`
`pytest test/quantization/eager/test_quantize_eager_qat.py  -v -k "embedding"`

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32446405

fbshipit-source-id: cca7aade68ff171887eeeae42801f77d934dad4c
2021-11-30 12:21:14 -08:00
8f9f559453 amend tensors.rst and torch.rst for doc generation (#69030)
Summary:
(This is my first contribution to PyTorch) Added missing operations to docs added in https://github.com/pytorch/pytorch/issues/64430. Please let me know if I've done anything wrong.

Fixes https://github.com/pytorch/pytorch/issues/68928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69030

Reviewed By: samdow

Differential Revision: D32706826

Pulled By: soulitzer

fbshipit-source-id: edcc175a8f9bc69450a39059580c05edce699312
2021-11-30 12:04:13 -08:00
0aa9d177fe [fx] remove CPatcher (#69032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69032

I am removing it because, for packaging-related reasons, it's easier if
torch.fx is a pure Python module.

I don't think there is much reason to keep it: this functionality was
experimental, has no known users currently, and we didn't have a clear
path to turning it on by default due to regressions in tracing
performance. Also, it was only ever enabled for `rand` and friends.

Technically the removal of the `enable_cpatching` arguments on
`symbolic_trace` and `Tracer.__init__` are BC-breaking, but the
docstrings clearly state that the argument is experimental and BC is not
guaranteed, so I think it's fine.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D32706344

Pulled By: suo

fbshipit-source-id: 501648b5c3610ae71829b5e7db74e3b8c9e1a480
2021-11-30 11:59:57 -08:00
81246ed01c Markdown was linking to repo rather than pytorch.org website (#68937)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68937

Reviewed By: samdow

Differential Revision: D32707264

Pulled By: soulitzer

fbshipit-source-id: c534f008087def33784dde701130769e2058aa9f
2021-11-30 11:51:24 -08:00
251686fc4c Revert D32706197: Sparse: Implement simple unary ufuncs operators
Test Plan: revert-hammer

Differential Revision:
D32706197 (fbaa19a6fa)

Original commit changeset: 65e1acb36457

fbshipit-source-id: 45c4b486f9eee200d5a1f6d46d267617124f8a5e
2021-11-30 10:50:12 -08:00
8fef7c09f5 Remove finput from slow2d signatures (#68896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68896

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32655874

Pulled By: jbschlosser

fbshipit-source-id: 3c9acb106961c40af1432652179edb2bc5a4bfa5
2021-11-30 09:47:24 -08:00
cd3e37cbe4 [Static Runtime] [Code Cleanup] Reduce indentation depth in ops.cpp (#69028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69028

This change converts

```
if (..) {
 ...
} else {
 ...
}
// end of function
```

into

```
if(...) {
  ...
  return;
}
...
```
in ops.cpp to remove the else branch to reduce the indentation depth by 1 for better readability.

Test Plan: N/A

Reviewed By: hlu1

Differential Revision: D32506235

fbshipit-source-id: a4fd5188bd680dba5dcad2b6e873735a54497664
2021-11-30 09:41:46 -08:00
cfc75c2137 [JIT] Separate GPU implementation of frozen_conv_add_relu_fusion.cpp (#68149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68149

JIT optimization passes are part of the CPU-only build (i.e. necessary GPU flags are not passed in). This separates the implementation of frozen_conv_add_relu_fusion so that the GPU-enabled implementation is registered at runtime (if it is available)
ghstack-source-id: 143676384

Test Plan:
In the following script, conv_add_relu fusion is not observed without this change, but is observed when this change is added.
```
from typing import List, Optional

import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.rand((3, 3, 7, 7), device="cuda"))
        self.add_tensor = torch.nn.Parameter(torch.rand((3, 3, 7, 7), device="cuda"))

    def forward(
        self,
        inp: torch.Tensor,
        bias: Optional[torch.Tensor],
        stride: List[int],
        padding: List[int],
        dilation: List[int],
        groups: int,
    ):
        # weight = torch.zeros((3, 3, 7, 7), device="cuda")
        inp = inp.to("cuda")
        conv_result = torch.conv2d(
            inp, self.weight, bias, stride, padding, dilation, groups
        )
        add_result = conv_result.add_(self.add_tensor)
        return add_result.relu_()

    @torch.jit.export
    def make_prediction(self, inp: torch.Tensor):
        bias = None
        groups = 1
        stride = (1, 1)
        padding = (0, 0)
        dilation = (1, 1)

        return self.forward(inp, bias, stride, padding, dilation, groups)

if __name__ == "__main__":
    # generate some sample input
    groups = 1
    channels_in = 3
    channels_out = 3
    kernel_size = (7, 7)
    stride = (1, 1)
    padding = (0, 0)
    dilation = (1, 1)
    inp = torch.rand((64, 3, 432, 432))
    weight = torch.rand(
        (channels_out, channels_in, kernel_size[0], kernel_size[1]), device="cuda"
    )
    bias = None

    model = Model()
    model.eval()
    script = torch.jit.script(model)
    script = torch.jit.freeze(script)
    script = torch.jit.optimize_for_inference(script)

    print("~~~~ FORWARD ~~~~")
    print(script.graph)

    print("with preserved_attrs")
    print(torch.sum(script.forward(inp, bias, stride, padding, dilation, groups)))
```

Reviewed By: cpuhrsch

Differential Revision: D32329330

fbshipit-source-id: c0f10da4b9540c588819efe3ec540baa0fae4b35
2021-11-30 09:31:57 -08:00
7342b654a1 [static runtime] dequantize out variant (#68664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68664

Reland D32187063 (f120335643), fixing lint
Add out variant for aten::dequantize

Test Plan:
Test on inline_cvr model
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/294738512/294738512_0.predictor.disagg.local --recordio_inputs=/data/users/ansha/tmp/adfinder/294738512/294738512_0_local.inputs.recordio --pt_enable_static_runtime=1 --compare_results=1 --iters=5 --warmup_iters=5 --num_threads=1 --do_profile=1 --method_name=local.forward --set_compatibility --do_benchmark=1 --recordio_use_ivalue_format=1
```

Before:
0.047472 ms.   0.409729%. aten::dequantize (9 nodes)

After
0.0307179 ms.   0.267204%. static_runtime::dequantize_copy (9 nodes, out variant)

Test on ctr_mbl_feed model 307210374 on 696 inputs

Before:
0.0569016 ms.   0.296647%. aten::dequantize (10 nodes)

After:
0.0423128 ms.   0.220481%. static_runtime::dequantize_copy (10 nodes, out variant)

Reviewed By: mikeiovine

Differential Revision: D32566429

fbshipit-source-id: b95dfc4c5e4115e083794093bc1571c7b1d72f5b
2021-11-30 09:03:26 -08:00
d3de3546d9 Revert D32099294: Split cuda: list cpp files that go in _cu library explicitly
Test Plan: revert-hammer

Differential Revision:
D32099294 (b47ae9810c)

Original commit changeset: 8a3582944b6b

fbshipit-source-id: eab63e6ba3db3e17f404292a3659823607627576
2021-11-30 08:42:19 -08:00
6fea7499c2 CompositeImplicitAutograd compliance testing (#65819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65819

Related to #61669.

Functions registered as CompositeImplicitAutograd MUST work for most, if
not all, backends. This includes Tensor subclasses.

To achieve this, we (PyTorch) impose a set of constraints on how a
CompositeImplicitAutograd function can be written.

Concretely, this PR adds tests for all OpInfos that checks for
compliance. The things that get tested in this PR apply to composite
ops and are that:
- the op does not change the metadata of a Tensor without performing
dispatches
- the op does not call set_ or resize_
- the op does not directly access the data ptr

The mechanism for the test is to create a new __torch_dispatch__
object, CompositeCompliantTensor. For each operator, we wrap all inputs
in CompositeCompliantTensor, turn on python mode for it,
and send it through the operator.

Non-CompositeImplicitAutograd operators will pass the test because they
perform a dispatch to backend code. Here's how CompositeCompliantTensor
catches problems:

- If it sees set_ or resize_ getting called, it will directly error
out
- After each operation, CompositeCompliantTensor checks to make sure
that its metadata is consistent with that of the thing it is wrapping.
If the CompositeImplicitAutograd op modifies the metadata directly
(through e.g. the TensorImpl API) then the metadata will go out of sync.
- If data_ptr gets called, that returns a nice error (because the
storage is meta).
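
A heavily simplified sketch of that wrapper pattern (not the actual CompositeCompliantTensor; the metadata comparison and the set_/resize_ errors are elided):

```python
import torch
from torch.utils._pytree import tree_map

class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        # Meta-like wrapper: no real storage of its own.
        r = torch.Tensor._make_wrapper_subclass(
            cls, elem.size(), strides=elem.stride(), dtype=elem.dtype,
            device=elem.device, requires_grad=elem.requires_grad)
        r.elem = elem
        return r

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        unwrap = lambda t: t.elem if isinstance(t, WrapperTensor) else t
        wrap = lambda t: WrapperTensor(t) if isinstance(t, torch.Tensor) else t
        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))
        # The real test would compare the wrapper's metadata against elem's here.
        return tree_map(wrap, out)

y = WrapperTensor(torch.randn(3)).relu()  # dispatches through the wrapper
```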

CompositeCompliantTensor is written in an interesting way. First off,
if a view operation occurs (e.g. `B = A.view_op(...)`), then B.storage()
must alias A.storage() where B.storage() is CompositeCompliantTensor's
storage, NOT the storage of the tensor it is wrapping. This is an
invariant in autograd, see #62182 for details. To handle
this we replay the view on A's storage and set it as B's storage.

Secondly, there are cases where the metadata is allowed to go out of
sync. I believe this is only possible with in-place view functions, like
transpose_, t_, squeeze_, unsqueeze_. Those are special cased.

Finally, I added a new section to aten/src/ATen/native/README.md about
what it means to be CompositeImplicitAutograd Compliant

Test Plan: - run tests

Reviewed By: ezyang, bdhirsh

Differential Revision: D31268369

Pulled By: zou3519

fbshipit-source-id: 31634b1cbe1778ab30196013cfc376ef9bd2e8b1
2021-11-30 07:35:22 -08:00
b83e8d560b [LT] Sync LTC branch changes on torch/csrc/lazy/core (#69012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69012

Some changes to torch/csrc/lazy/core were done on the
lazy_tensor_staging branch (https://github.com/pytorch/pytorch/pull/68427).
Merge those back into the trunk.

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32708696

Pulled By: desertfire

fbshipit-source-id: e54b978f2bdb9c7db27880f60246fdf1e8b41019
2021-11-30 07:09:15 -08:00
39ab417107 [Static Runtime] Fix fb::expand_dims schema (#68636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68636

Same old alias problem

Reviewed By: mikeiovine

Differential Revision: D32556204

fbshipit-source-id: 4d380f0110ad1be83f705e6d6910a6aaf818ec08
2021-11-30 06:28:29 -08:00
5b37ac54cb dbr quant overhead [14/x]: cache whether an op is a module (#68877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68877

Saves whether an op type is a module during tracing, so we
can avoid recalculating this when validating the op during inference.
This leads to a small speedup.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

```
// MobileNetV2, 1x3x224x224, function level profiling

// before
validate_cur_op - 1.77%

// after
validate_cur_op - 1.41%

```

Reviewed By: jerryzh168

Differential Revision: D32646149

Pulled By: vkuzo

fbshipit-source-id: 03ebc4fedceb84bb885939dff8dec81d30ba6892
2021-11-30 06:13:06 -08:00
b47ae9810c Split cuda: list cpp files that go in _cu library explicitly (#67216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67216

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32099294

Pulled By: dagitses

fbshipit-source-id: 8a3582944b6b48af1ac31c5df09a7e6e838892c4
2021-11-30 04:24:55 -08:00
174eea8a05 Remove native_functions.yaml dependency from IndexKernel.{cpp,cu} (#66914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66914

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31856105

Pulled By: dagitses

fbshipit-source-id: 8729783b68879b509ae6b66ce145de0af68aad8c
2021-11-30 04:24:52 -08:00
f7d598948a Remove native_functions.yaml dependency from TensorModeKernel.cu (#66913)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66913

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D31856102

Pulled By: dagitses

fbshipit-source-id: 8888a1984adef09104a40ae683d091143cd1f4fa
2021-11-30 04:22:09 -08:00
ec1339a48b [CUDA Pinned Memory] Alternative implementation of pinned memory allocator focusing on multi-threaded scalability (#68906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68906

The existing PyTorch pinned memory allocator has been a challenge for scalability in multi-GPU inference workloads. The existing allocator is mostly designed in the context of training, where in the process-per-GPU setup we have natural sharding of the global locks and lower allocation rates (perhaps O(100) allocs/sec per process). In this setup we might have globally on the order of O(200k) allocs/sec - e.g. 20k QPS and 10 allocs/query. This is a different domain.

In the existing allocator, we observe that tail latencies of cudaEventCreate and cudaEventDestroy (issued while holding the lock) can completely stall all allocations, which is undesirable.

The idea here is to retain a similar design to the existing PyTorch allocator - eager collection of used memory, no lock-free or deferred tricks, identical semantics around events - but to:

a) split up the locks around the various critical data structures,
b) do as little work as possible while holding any process-global mutexes (importantly, no CUDA runtime API calls), and
c) pool CUDA events manually, since CUDA event creation is a bottleneck at high rates from multiple threads (see the sketch below).

This does require a bit of care, but I believe it's correct. In general the threading and state transitions are fairly simple.
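
A minimal Python sketch of the event-pooling idea in (c); the real implementation is C++ inside the caching host allocator, and the class and method names here are illustrative:

```
import threading
import torch

class CudaEventPool:
    def __init__(self):
        self._lock = threading.Lock()
        self._free = []  # recycled torch.cuda.Event objects

    def get(self):
        # Only the list manipulation happens under the lock.
        with self._lock:
            if self._free:
                return self._free.pop()
        # The expensive event creation happens outside the lock,
        # and only on a pool miss.
        return torch.cuda.Event()

    def put(self, event):
        with self._lock:
            self._free.append(event)
```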

With these improvements, microbenchmarks show significant improvements (1.5x-3x). Importantly, real workloads also show significant improvements, especially WRT tail latency and stalls.

Test Plan:
Unit tests all pass.

With a synthetic benchmark such as:

```
// Assumed includes for building this snippet standalone (not part of the original):
#include <cmath>
#include <benchmark/benchmark.h>
#include <folly/Random.h>
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <ATen/cuda/CachingHostAllocator.h>

static void BM_copies_baseline(benchmark::State& state) {
  auto N = state.range(0);
  auto scale = state.range(1);
  auto object_size_min = N;
  auto object_size_max = scale * N;

  auto device = at::Device(at::kCUDA, at::cuda::current_device());

  uint64_t bytes_copied = 0;
  uint64_t allocs = 0;
  auto stream = at::cuda::getCurrentCUDAStream();
  for (auto _ : state) {
    auto object_size = static_cast<int64_t>(expf(folly::Random::randDouble(
        logf(object_size_min), logf(object_size_max))));
    auto tensor = at::empty(
        {object_size},
        at::TensorOptions().dtype(at::kByte).pinned_memory(true));
    at::cuda::CachingHostAllocator_recordEvent(
        tensor.storage().data_ptr().get_context(), stream);
    bytes_copied += object_size;
    allocs += 1;
  }
  state.counters["BW"] =
      benchmark::Counter(bytes_copied, benchmark::Counter::kIsRate);
  state.counters["Allocs"] =
      benchmark::Counter(allocs, benchmark::Counter::kIsRate);
}

BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(1)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(4)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(16)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(64)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(128)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(256)->UseRealTime();

BENCHMARK_MAIN();  // assumed entry point for a standalone build
```

I observe roughly 1.5-3x improvements.

End to end application testing also sees significant improvements in the contended scenario.

Reviewed By: jianyuh, ngimel

Differential Revision: D32588784

fbshipit-source-id: ee86c3b7ed4da6412dd3c89362f989f4b5d91736
2021-11-30 02:49:43 -08:00
0cdeb586ae [LTC] Upstream some utilities (#69046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69046

This commit upstreams utilities including ExceptionCleanup, MaybeRef,
Iota, ToVector, ToOptionalVector and GetEnumValue.

Test Plan: ./build/bin/test_lazy --gtest_filter=UtilTest.*

Reviewed By: wconstab, Chillee

Differential Revision: D32709090

Pulled By: alanwaketan

fbshipit-source-id: 5147433becd4dbb07be7d36d66b0b8685054d714
2021-11-30 02:44:02 -08:00
fbaa19a6fa Sparse: Implement simple unary ufuncs operators (#68887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887

Closes #46988, closes #46987, closes #46761

By "simple" I mean operators that map 0->0, so we can implement them by
just re-dispatching on the values tensor. That does mean we have `sin`
but not `cos`, for example, but without fill-value support this is the
best that can be done.

Most of these don't support autograd because the derivative formulas
use unsupported operators.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32706197

Pulled By: cpuhrsch

fbshipit-source-id: 65e1acb3645737ca7bdb7f2db739d8e118906f4b
2021-11-30 00:30:30 -08:00
3186d36972 [TensorExpr] Supress TracerWarnings in test_unsupported in test_jit_fuser_te.py. (#68757)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68757

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32600951

Pulled By: ZolotukhinM

fbshipit-source-id: 7b9859d7dee1e9803b8fde5d071890a72d30cec9
2021-11-30 00:06:36 -08:00
75ce040620 [TensorExpr] Allow for 'keepdim' argument in aten::mean in NNC's external call. (#68756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68756

That fixes some warnings in our tests.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32600952

Pulled By: ZolotukhinM

fbshipit-source-id: 548eaf3659e20795cce44d8f57e77f4a47d44d98
2021-11-30 00:06:34 -08:00
a93f505ee5 [TensorExpr] IRPrinter: print sizes and name when visiting a Buf. (#68755)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68755

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32600950

Pulled By: ZolotukhinM

fbshipit-source-id: 925da05d958497791cb9176a5d15d8315334aa24
2021-11-30 00:05:10 -08:00
8cc9ec2f6b Add option to get input dtype from user (#68751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68751

Add option to get input dtype from user for AOT compilation

Test Plan:
BI model compiles and runs fine
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64'
Building... 8.3 sec (99%) 7673/7674 jobs, 0/7674 updated
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1116 14:32:44.632536 1332111 TensorImpl.h:1418] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
E1116 14:32:44.673710 1332111 huge_pages_allocator.cc:287] Not using huge pages because not linked with jemalloc
The compiled llvm assembly code was saved to bi.compiled.ll
The compiled model was saved to bi.compiled.pt
```

> An error is thrown when the input_dims and input_types counts don't match

```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64;int64'
.
.
terminate called after throwing an instance of 'c10::Error'
  what():  [enforce fail at aot_model_compiler.cc:208] split(';', FLAGS_input_dims).size() == split(';', FLAGS_input_types).size(). Number of input_dims and input_types should be the same
.
.
.
```

Reviewed By: ljk53

Differential Revision: D32477001

fbshipit-source-id: 8977b0b59cf78b3a2fec0c8428f83a16ad8685c5
2021-11-29 21:39:49 -08:00
ac1fe91dc9 Clean up some THC includes (#69024)
Summary:
These seem not to be needed and cause ninja to rebuild the files at every build.

(There is also THCStorage.cu, but hopefully this will go away with https://github.com/pytorch/pytorch/issues/68556 )

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69024

Reviewed By: soulitzer

Differential Revision: D32705309

Pulled By: ngimel

fbshipit-source-id: 5255297f213fdcf36e7203de7460a71291f8c9a0
2021-11-29 20:55:27 -08:00
ce53baf573 Merging the implementations of ClearProfiling (#67575)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67575

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32497548

Pulled By: Gamrix

fbshipit-source-id: fb656b017d405487e25bd2407b069a702769659f
2021-11-29 19:48:56 -08:00
e6a8d15a4c cpu_kernel_vec: Hoist stride checks out of loop (#68962)
Summary:
`cpu_kernel_vec` does stride checks to determine whether to use the vectorized or scalar inner loop. Since it uses a 1d `for_each` loop, it re-does these stride checks after every pass over the inner dimension. For iterators with small inner dimensions, this means a significant proportion of the time may be spent just on stride checks.

This changes it to use a 2d loop so the stride checks are amortized further. With the `copy_` benchmark below, it cuts the callgrind instruction count by 50% (from 28.4 million to 13.5 million) and gives a 30% speedup (from 22.8 us to 16.4 us) on my machine.

```
from torch.utils.benchmark import Timer
import timeit
timer = Timer(
    stmt="b.copy_(a);",
    setup="""
    auto a = at::rand({10000, 8}, at::kComplexDouble).slice(0, 0, -1, 2);
    auto b = at::empty_like(a);
    """,
    num_threads=1,
    language='c++',
    timer=timeit.default_timer
)
# Run it, e.g.:
#   print(timer.blocked_autorange())            # wall time
#   print(timer.collect_callgrind().counts())   # callgrind instruction count
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68962

Reviewed By: mrshenli

Differential Revision: D32684191

Pulled By: ngimel

fbshipit-source-id: 582af038314a0f999f43669e66edace38ff8d2dc
2021-11-29 19:37:58 -08:00
61ea2fc35e Fix device type / dtype handling for parametrized test names (#65217)
Summary:
This PR absolves `_TestParametrizer`s (e.g. `ops`, `modules`, `parametrize`) of the responsibility of adding device type (e.g. `'cpu'`, `'cuda'`, etc.) / dtype (e.g. 'float32') to generated test names. This fixes repeated instances of the device string being added to generated test names (e.g. `test_batch_norm_training_True_cuda_track_running_stats_True_cuda_affine_True_cuda`).

The responsibility for placing device / dtype suffixes is now handled by `instantiate_device_type_tests()` instead so it is added a single time. It will place `<device>_<dtype>` at the end of the test name unconditionally, maintaining the current naming convention.
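
For illustration, a minimal sketch of the resulting convention using these helpers (treat the exact import paths as assumptions):

```
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_utils import TestCase, parametrize, run_tests

class TestBatchNorm(TestCase):
    @parametrize("training", [True, False])
    @parametrize("affine", [True, False])
    def test_batch_norm(self, device, training, affine):
        pass  # body elided

# Generates names with a single trailing device suffix, e.g.
#   test_batch_norm_affine_True_training_True_cpu
# rather than repeating "_cpu"/"_cuda" after every parameter.
instantiate_device_type_tests(TestBatchNorm, globals())

if __name__ == "__main__":
    run_tests()
```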

As part of this work, I also tightened the semantics through some additional error case handling:
* Composing multiple decorators that each try to handle the same parameter will error out with a nice message. This includes the case of trying to compose `modules` + `ops`, as they each try to handle `dtype`. Similarly, `ops` + `dtypes` is forbidden when both try to handle `dtype`. This required changes in the following test files:
  * `test/test_unary_ufuncs.py`
  * `test/test_foreach.py`
* The `modules` / `ops` decorators will now error out with a nice message if used with `instantiate_parametrized_tests()` instead of `instantiate_device_type_tests()`, since they're not (currently) written to work outside of a device-specific context.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65217

Reviewed By: mruberry

Differential Revision: D32627303

Pulled By: jbschlosser

fbshipit-source-id: c2957228353ed46a0b7da8fa1a34c67598779312
2021-11-29 19:02:23 -08:00
933d5b561f Fixed links to RNN docs in comments (#68828)
Summary:
Fixed links to RNN docs in comments

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68828

Reviewed By: soulitzer

Differential Revision: D32702384

Pulled By: jbschlosser

fbshipit-source-id: 577c88842cde555534d9a39fa7dfd24164d71552
2021-11-29 18:55:53 -08:00
863f321c6d Fix typo in AdaptiveLogSoftmaxWithLoss docs (#68926)
Summary:
Fixes a typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68926

Reviewed By: soulitzer

Differential Revision: D32702366

Pulled By: jbschlosser

fbshipit-source-id: 8975aad3e817dab33359cf29182b4bd1e3aa1299
2021-11-29 18:51:58 -08:00
b8c3693281 Remove autograd-enabled collective APIs from distributed docs (#69011)
Summary:
These APIs are not yet officially released and are still under discussion. Hence, this commit removes those APIs from docs and will add them back when ready.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69011

Reviewed By: fduwjj

Differential Revision: D32703124

Pulled By: mrshenli

fbshipit-source-id: ea049fc7ab6b0015d38cc40c5b5daf47803b7ea0
2021-11-29 18:14:50 -08:00
178010455d Vectorized: Use inline namespace instead of anonymous (#67655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67655

Some of the CPU operators already use the `namespace CPU_CAPABILITY` trick to avoid anonymous namespacing, like [`PowKernel.cpp`](cd51d2a3ec/aten/src/ATen/native/cpu/PowKernel.cpp (L14)). This extends that pattern to the `Vectorized` class, which avoids `-Wsubobject-linkage` warnings like I was getting in #67621.

For many functions, it was necessary to add `inline` because the functions are defined in a header. There were no link errors previously because the anonymous namespace ensured they were not exposed to linkage. Similarly, free functions defined in an anonymous namespace might need the `C10_UNUSED` attribute to silence warnings about the function not being called in the only translation unit that it's defined in. By removing the anonymous namespace, these decorators are no longer necessary.

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D32566109

Pulled By: malfet

fbshipit-source-id: 01d64003513b4946dec6b709bd73bbab05772134

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-11-29 16:54:17 -08:00
1d0416397a [PyTorch] Change from unique_ptr to optional for RecordFunction state (#68397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68397

Now that hot paths can avoid instantiating RecordFunction by using shouldRunRecordFunction, we can improve efficiency for profiling cases by avoiding a large heap allocation.
ghstack-source-id: 144235785

Test Plan:
1) Run //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark before/after this diff with arguments --stressTestRecordFunction --op empty.

Before: P467891381

After: P467902339

2) Run without --stressTestRecordFunction to verify no regression in the regular dispatcher path.

Before: P467902381

After: P467902403

Reviewed By: chaekit

Differential Revision: D32448365

fbshipit-source-id: 2d32a3bd82c60d2bb11fc57bb88bf3f02aa3fa25
2021-11-29 16:35:36 -08:00
7194faed7f [PyTorch] Optimize mergeRunCallbacks for RecordFunction (#68387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68387

Function call overhead on tryRunCallback was notable.
ghstack-source-id: 144235788

Test Plan:
Run //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark before/after this diff with arguments `--stressTestRecordFunction --op empty`.

Before: P467891339
After: P467891381

Reviewed By: chaekit

Differential Revision: D32443863

fbshipit-source-id: c0b3dd40bbd5bca976c2ebb0f21aa62e097b302e
2021-11-29 16:33:36 -08:00
f1a3512b78 Adding Linux cuda 11.5 workflows (#68745)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68960

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68745

Reviewed By: janeyx99

Differential Revision: D32707491

Pulled By: atalman

fbshipit-source-id: 100facfdcc0fc2f68e203a696856852faa25ee08
2021-11-29 16:21:00 -08:00
27228656e6 [FX][docs] Document gotcha about training flag (#68915)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68913

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68915

Reviewed By: jamesr66a

Differential Revision: D32705410

Pulled By: jubinchheda

fbshipit-source-id: a44c17ab0e62465823ceb0ef983ae330b50fb073
2021-11-29 16:13:32 -08:00
f253370bb9 dbr quant overhead [13/x]: cache results of get_module_hook_type (#68841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68841

Caches the current module's hook type as an attribute on the module.
This relies on the assumption that the module's hook type does not
change during inference, which is an assumption we can commit to.
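
This is essentially memoization on a module attribute; a minimal sketch of the pattern (names are illustrative, not the real DBR internals):

```
def _compute_hook_type(module):
    return type(module).__name__  # stand-in for the real, costlier logic

def get_module_hook_type(module):
    # Cache on the module itself; safe because the hook type is assumed
    # fixed for the duration of inference.
    cached = getattr(module, '_cached_hook_type', None)
    if cached is None:
        cached = _compute_hook_type(module)
        module._cached_hook_type = cached
    return cached
```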

Test Plan:
correctness
```
python test/test_quantization.py TestQuantizeDBR
```

performance
```
// MobileNetV2, 1x3x224x224, function profiling

// before
get_module_hook_type -> 2.58%

// after
get_module_hook_type -> 0.73%
```

Reviewed By: jerryzh168

Differential Revision: D32630881

Pulled By: vkuzo

fbshipit-source-id: 667f2667ef9c5514e5d82e4e7e4c02b8238edc65
2021-11-29 16:10:24 -08:00
2ad4727ad9 dbr quant: fix debugging fqn info for converted model (#68840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68840

Fixes the debugging FQN info for a converted model. Some of this
information was missing because eager mode convert performed
module swaps. This information is only used in debugging and is
not used for inference.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

turn `enable_logging` on in `auto_trace.py`, the FQN is now displayed
for a converted model

Reviewed By: jerryzh168

Differential Revision: D32630884

Pulled By: vkuzo

fbshipit-source-id: be8c43343abfdab9fe0af39499d908ed61a01b78
2021-11-29 16:10:21 -08:00
a03fe9ba61 dbr quant overhead[12/x]: turn off overrides for module convert output hook (#68839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68839

We can assume that there are no overrides needed for the hook which
dequantizes the module outputs, so we can turn them off explicitly.
While this does not lead to a measurable perf win, it makes things
easier to debug by eliminating the no-op overrides.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32630886

Pulled By: vkuzo

fbshipit-source-id: 1719c168f5f21f3e59c80a3b6d0f32ebb1c77ef8
2021-11-29 16:10:18 -08:00
515db56755 dbr quant: remove unnecessary outputs hook (#68838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68838

Removes an unnecessary outputs hook on the top level
module.  The same hook is already called inside the regular
hook flow.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: soulitzer

Differential Revision: D32630882

Pulled By: vkuzo

fbshipit-source-id: aa5f1b1cb866051013195d7311949333b08df4de
2021-11-29 16:10:15 -08:00
e3af582f92 dbr quant overhead[11/x]: speed up module convert hook (#68837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68837

The module convert hook dequantizes the module outputs if the user
requested the module to adhere to a certain dtype for outputs. This
is most commonly used for the assumption that a model's overall return
type is fp32.

This PR precalculates for each module whether this hook will do anything,
and returns early if it does not. This keeps the hook's overhead from
affecting any module which does not need it.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

perf

```
MobileNetV2, 1x3x224x224, function level profiling

// before
outputs_convert_hook - 0.73%

// after
outputs_convert_hook - 0.45%
```

Reviewed By: jerryzh168

Differential Revision: D32630885

Pulled By: vkuzo

fbshipit-source-id: 7ee84de742fc0c752b66d20d097405a754c8b480
2021-11-29 16:10:12 -08:00
be70477a7b dbr quant overhead[10/x]: disable torch_function overrides for leaf nodes (#68836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68836

If we have a leaf module like a `torch.nn.Conv2d`, DBR quant handles
the input and output of the module and should treat the inside of
this module as invisible.  Specifically, there is no need to override
the `F.conv2d` call if the parent module is already being overridden.

Before this PR, `__torch_function__` was still overridden for the insides
of leaf modules, and the override was a no-op.  There was some overhead
in these overrides because they were checking the hook type.

This PR adds a fast global override so we can skip overriding the insides
of leaf modules. This has some performance benefits in the prepared model,
because we now skip overriding all of the inner functions in observers.

Test Plan:
testing
```
python test/test_quantization.py TestQuantizeDBR
```

perf
```
// MobileNetV2, 1x3x224x224, comparing fp32 with dbr quant, Mac OS laptop

// before

fp32: 0.017837 seconds avg
fx_prepared: 0.021963 seconds avg, 0.812143 speedup vs fp32
fx_quantized: 0.012632 seconds avg, 1.412056 speedup vs fp32
dt_prepared: 0.034052 seconds avg, 0.523820 speedup vs fp32
dt_quantized: 0.018316 seconds avg, 0.973829 speedup vs fp32

// after

fp32: 0.020395 seconds avg
fx_prepared: 0.026969 seconds avg, 0.756230 speedup vs fp32
fx_quantized: 0.013195 seconds avg, 1.545611 speedup vs fp32
dt_prepared: 0.033432 seconds avg, 0.610023 speedup vs fp32
dt_quantized: 0.018244 seconds avg, 1.117866 speedup vs fp32

```

Reviewed By: jerryzh168

Differential Revision: D32630883

Pulled By: vkuzo

fbshipit-source-id: 6365e1c514726d8b2a4b3a51f114f5fed3ebe887
2021-11-29 16:08:52 -08:00
1342f19a8c Add ModuleInfo-based device transfer tests (#68092)
Summary:
Continuation of https://github.com/pytorch/pytorch/issues/65488; addresses the problem that got it reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68092

Reviewed By: mruberry

Differential Revision: D32299103

Pulled By: jbschlosser

fbshipit-source-id: bc298aca15368f2acb5082e6fb6eedea60b5d75f
2021-11-29 15:48:40 -08:00
89a145fd91 Sparse CSR CUDA: Add torch.sparse.sampled_addmm (#68007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68007

This PR adds a new function to the sparse module.
`sampled_addmm` computes α*(A @ B) * spy(C) + β*C, where C is a sparse CSR matrix and A, B are dense (strided) matrices.
This function is currently restricted to single 2D matrices; it doesn't support batched input.
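
A minimal usage sketch (shapes and scalars are illustrative; per this PR the kernel is CUDA-only, and the construction details are assumptions):

```
import torch

# C: sparse CSR matrix (here a 3x3 identity), A and B: dense matrices
C = torch.sparse_csr_tensor(
    torch.tensor([0, 1, 2, 3]), torch.tensor([0, 1, 2]),
    torch.ones(3, dtype=torch.float64), size=(3, 3), device="cuda")
A = torch.randn(3, 5, dtype=torch.float64, device="cuda")
B = torch.randn(5, 3, dtype=torch.float64, device="cuda")

# alpha * (A @ B), evaluated only on C's sparsity pattern, plus beta * C
out = torch.sparse.sampled_addmm(C, A, B, beta=0.5, alpha=2.0)
print(out.to_dense())
```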

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32435799

Pulled By: cpuhrsch

fbshipit-source-id: b1ffac795080aef3fa05eaeeded03402bc097392
2021-11-29 15:43:29 -08:00
af49805a73 Port lerp to structured kernels (#68924)
Summary:
Ref https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68924

Reviewed By: jbschlosser

Differential Revision: D32697409

Pulled By: bdhirsh

fbshipit-source-id: b098533e46f8bdbb995c76db0e6a124ab2b076b8
2021-11-29 15:11:30 -08:00
62847a2b9c Fix bug on empty GLOO_SOCKET_IFNAME_ENV (#68933)
Summary:
This PR fixes the "no device" bug that occurs when a user resets `GLOO_SOCKET_IFNAME_ENV` to an empty value with

```bash
export GLOO_SOCKET_IFNAME_ENV=
```

Thank you for your time reviewing this PR :).

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68933

Reviewed By: soulitzer

Differential Revision: D32690633

Pulled By: mrshenli

fbshipit-source-id: f6df2b8b067d23cf1ec177c77cc592dc870bda72
2021-11-29 15:05:38 -08:00
b468566208 Add ModuleInfo-based CPU / GPU parity tests (#68097)
Summary:
Continuation of https://github.com/pytorch/pytorch/issues/64694; fixes issues with the diff there

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68097

Reviewed By: mruberry

Differential Revision: D32300650

Pulled By: jbschlosser

fbshipit-source-id: f3a5e72b019d4eddd7202854999eab61fffc9006
2021-11-29 14:58:07 -08:00
fb63bb60ec Strided masked norm. (#68584)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68584

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32581285

Pulled By: cpuhrsch

fbshipit-source-id: 896ee1e58957b46c2f6a16a170adff4cb3b8da62
2021-11-29 14:23:27 -08:00
f776f30780 Keep the sequence or mapping type in default_collate (#68779)
Summary:
`default_collate`, `default_convert`, and `pin_memory` convert sequences into lists. I believe they should keep the original type when possible (e.g., I have a class that inherits from `list`, which comes from a 3rd party library that I can't change, and provides extra functionality).

Note it's easy to do when the type can be constructed from an iterable, but that's not always the case (e.g., `range`).

Even though this can be accomplished by using a custom `default_collate`/`default_convert`, 1) this is behavior they should support out-of-the-box IMHO, and 2) `pin_memory` still does it.
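
A minimal sketch of the preserved-type behavior (the `Batch` subclass stands in for the 3rd-party type):

```
import torch
from torch.utils.data.dataloader import default_collate

class Batch(list):  # stand-in for a 3rd-party list subclass
    pass

samples = [Batch([torch.tensor(1), torch.tensor(2)]),
           Batch([torch.tensor(3), torch.tensor(4)])]

collated = default_collate(samples)
print(type(collated))  # Batch rather than plain list after this change
```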

cc VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68779

Reviewed By: wenleix

Differential Revision: D32651129

Pulled By: ejguan

fbshipit-source-id: 17c390934bacc0e4ead060469cf15dde815550b4
2021-11-29 13:14:20 -08:00
d9e7d85390 Remove TH/THC Storage (#68556)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67852

cc ezyang bhosmer smessmer ljk53 bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68556

Reviewed By: ejguan

Differential Revision: D32652758

Pulled By: ngimel

fbshipit-source-id: 170956fca112606f9008abe09b92c6ddc411be09
2021-11-29 12:55:20 -08:00
f5fa91ba2e Sparse: Add additional opinfo tests (#68886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68886

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32697933

Pulled By: cpuhrsch

fbshipit-source-id: fffdd1bc663cc1bc49abe8cf3680982d1cb497bc
2021-11-29 12:49:20 -08:00
3bd7dbf119 [Dist CI][BE] Remainder of c10d/store tests run in subprocess (#68822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68822

Per title, we switched over c10d_gloo and nccl and results look good
so far, so switch the rest of them as well. After this, the only dist tests
that won't run in subprocess are the pipe and fsdp tests, which historically
haven't had much flakiness.
ghstack-source-id: 144213522

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32624330

fbshipit-source-id: 469f613e5b0e4529e6b23ef259d948837d4af26b
2021-11-29 10:59:39 -08:00
250d0bd20b [RPC][Dist CI][BE] RPC tests run in subprocess (#68821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68821

Continuing effort to move most distributed tests to run in subprocess
for better reproducibility + reduce flakiness.
ghstack-source-id: 144213520

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32624199

fbshipit-source-id: 04448636320554d7a3ab29ae92bc1ca9fbe37da2
2021-11-29 10:58:08 -08:00
51f4ac40fd ci: Use default blank if no TEST_CONFIG (#69008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69008

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32699051

Pulled By: seemethere

fbshipit-source-id: 9ed12fe8a7f541c6eda77182cfd1b0a733a545f0
2021-11-29 10:05:20 -08:00
ee59a09772 Implement sharding for MacOS jobs (#68784)
Summary:
Do not run distributed tests as part of a separate shard, but keep them inside one of the two shards (to limit concurrency problems).
Fixes https://github.com/pytorch/pytorch/issues/68260

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68784

Reviewed By: seemethere, janeyx99

Differential Revision: D32653440

Pulled By: malfet

fbshipit-source-id: ebe5bbc30bdf67e930f2c766c920932700f3a4e4
2021-11-29 09:31:42 -08:00
61a4204d80 Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse (#68707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707

This PR adds a path for block CSR matrices for `torch.addmm`. cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to make everything work and tests passing using an unsafe constructor first, keeping it all private. Then discuss & implement constructors with block information separately unlocking the functions for wider use. Documentation will come with the update to constructors.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32650366

Pulled By: cpuhrsch

fbshipit-source-id: 430a9627901781ee3d2e2496097b71ec17727d98
2021-11-29 08:58:49 -08:00
9ee5db490b neg_sparse: Fix output dtype (#68885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68885

`torch.neg` should preserve the input dtype but for sparse tensors it
was promoting integers to floating point. This would have been picked
up by the OpInfo-based test, but `neg` wasn't marked with
`supports_sparse=True` so it was never run.
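
A minimal illustration of the fixed behavior:

```
import torch

x = torch.tensor([1, -2, 0]).to_sparse()
print(torch.neg(x).dtype)  # torch.int64 with this fix; previously promoted to float
```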

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32680008

Pulled By: cpuhrsch

fbshipit-source-id: 502f8743c1c33ab802e3d9d097792887352cd220
2021-11-29 08:48:22 -08:00
7b701ce2d4 Add set_to_none option to C++ API (#68801)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68167.

Signed-off-by: Vinnam Kim <vinnam.kim@makinarocks.ai>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68801

Reviewed By: mruberry

Differential Revision: D32625239

Pulled By: jbschlosser

fbshipit-source-id: 5f09b959e23d5448106a47029d06ec20ad094d82
2021-11-29 08:42:39 -08:00
787ded5103 Add lazy::Shape::numel() (#68314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68314

Add a convenience to lazy::Shape for counting the number of elements (by multiplying out the dimensions). This mirrors the numel() method on Tensor, and in switching other lazy tensor shape utils to use aten shape inference, we need numel counts.

Test Plan: add unit tests

Reviewed By: alanwaketan

Differential Revision: D32409138

fbshipit-source-id: 3ae725300f8826d38e45412f46501d5e5f776fb2
2021-11-29 08:38:09 -08:00
3d504ae1b4 [RELAND] Fix Dispatching not considering List[Optional[Tensor]] for dispatch (#68073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68073

Relanding the original PR. Its body was as follows:

Followup to https://github.com/pytorch/pytorch/pull/60787

It turns out that the original PR was wrong for unboxed kernels. We
recently ran into this in
https://github.com/facebookresearch/functorch/issues/124

For unboxed kernels, the correct type for a Tensor?[] argument is
actually `List<optional<Tensor>>`, not `ArrayRef<optional<Tensor>>`
ghstack-source-id: 144204580

Test Plan:
- assert that https://github.com/facebookresearch/functorch/issues/124
actually works

Reviewed By: gchanan

Differential Revision: D32313601

Pulled By: zou3519

fbshipit-source-id: 8028d5f34eecabc53d603bd54d6b6748b5db461a
2021-11-29 08:31:55 -08:00
17ba936da0 .github: Migrate XLA tests to GHA (#64320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64320

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30684490

Pulled By: seemethere

fbshipit-source-id: 5d2657f9aa4c7082591239a5bb095cc85d2cde66
2021-11-29 08:30:57 -08:00
f398320e0d packaging: Include lazy headers in package_data (#68817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68817

Looks like these files are getting used by downstream xla so we need to
include them in our package_data

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32622241

Pulled By: seemethere

fbshipit-source-id: 7b64e5d4261999ee58bc61185bada6c60c2bb5cc
2021-11-29 08:29:48 -08:00
871cd7c5b9 Forward-mode AD support for torch.split, torch.split_with_sizes (#68566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68566

These are just auto-linear as pointed out by Jeffrey.
ghstack-source-id: 143814393

Test Plan: - Run OpInfo tests.

Reviewed By: albanD, soulitzer

Differential Revision: D32520239

Pulled By: zou3519

fbshipit-source-id: 807115157b131e6370f364f61db1b14700279789
2021-11-29 07:50:53 -08:00
3315c4b31e add instructions for unhandled exceptions in assert_close (#68722)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68722

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32684446

Pulled By: mruberry

fbshipit-source-id: 04fe5730721d24e44692cdc9bb327484356ead3f
2021-11-28 21:35:53 -08:00
d095f498a0 Tensor docs (#63308)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62146.

Modernizes and clarifies the documentation of torch.tensor and torch.as_tensor, highlighting the distinction in their copying behavior and preservation of autograd history.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63308

Reviewed By: albanD, ngimel

Differential Revision: D30338025

Pulled By: mruberry

fbshipit-source-id: 83a0c113e4f8fce2dfe086054562713fe3f866c2
2021-11-28 21:26:12 -08:00
6ae34ea6f8 Revert D32521980: Add linalg.lu_factor
Test Plan: revert-hammer

Differential Revision:
D32521980 (b10929a14a)

Original commit changeset: 26a49ebd87f8

fbshipit-source-id: e1a6bb9c2ece9bd78190fe17e16a46e3358c5c82
2021-11-28 17:22:15 -08:00
b10929a14a Add linalg.lu_factor (#66933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933

This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.

This PR also adds support for matrices with zero elements both in
the size of the matrix and the batch. Note that this function simply
returns empty tensors of the correct size in this case.

We add a test and an OpInfo for the new function.

This PR also adds documentation for this new function, in line with
the documentation in the rest of `torch.linalg`.
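
A small usage sketch of the new API:

```
import torch

A = torch.randn(3, 3, dtype=torch.float64)
LU, pivots = torch.linalg.lu_factor(A)

# The factorization can be reused for several right-hand sides,
# here via the pre-existing torch.lu_solve:
b = torch.randn(3, 2, dtype=torch.float64)
x = torch.lu_solve(b, LU, pivots)
print(torch.allclose(A @ x, b))  # True
```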

Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32521980

Pulled By: mruberry

fbshipit-source-id: 26a49ebd87f8a41472f8cd4e9de4ddfb7f5581fb
2021-11-27 17:52:48 -08:00
01ddd5dde6 [opinfo] use dtypes instead of dtypesIfCPU (#68732)
Summary:
Reland https://github.com/pytorch/pytorch/issues/67619

Replace usage of dtypesIfCPU with dtypes in OpInfo class and also make it a mandatory argument.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68732

Reviewed By: jbschlosser

Differential Revision: D32594344

Pulled By: mruberry

fbshipit-source-id: 660b38aef97752ba064228e8989041ed1d5777fe
2021-11-27 16:07:51 -08:00
cffad597ea Tune test_reference_numerics_normal (#68019)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68019

Reviewed By: albanD

Differential Revision: D32482535

Pulled By: mruberry

fbshipit-source-id: 48300a5c6a4484fb81789f9049d3f08272d9f31c
2021-11-26 18:59:31 -08:00
5fdcc20d8d [JIT][Symbolic Shape Analysis] expose op shape functions (#68748)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68748

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32598605

Pulled By: makslevental

fbshipit-source-id: c97a06cd0fe143a6ea14db65fc5d3f76abdff312
2021-11-24 17:17:01 -08:00
f14c16e509 Revert D32599540: [pytorch][PR] implemented 'torch.distributions.constraints.symmetric' checking if the tensor is symmetric at last 2 dimension.
Test Plan: revert-hammer

Differential Revision:
D32599540 (bc3bdbc8f4)

Original commit changeset: 9227f7e99318

fbshipit-source-id: edfe7072073d910a49be52e1b8c2d374ef71e9ec
2021-11-24 17:15:31 -08:00
c2e3b92db4 partial revert of D32522826 (#68889)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68889

Reviewed By: cpuhrsch, ejguan

Differential Revision: D32650385

Pulled By: Krovatkin

fbshipit-source-id: 2c4a30cfc729a023b592b6b6e1959bbd2ad6f7cf
2021-11-24 17:05:20 -08:00
4afa5ea0ab native_functions.yaml: remove SparseXPU which is added by accident (#68791)
Summary:
gen_backend_stubs.py will raise an assertion when generating code with
the SparseXPU dispatch key for external backends, if SparseXPU is in
native_functions.yaml.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68791

Reviewed By: cpuhrsch, ejguan

Differential Revision: D32646303

Pulled By: bdhirsh

fbshipit-source-id: 64e42cc40468bc8c696a31b4b7c0cc3728866a64
2021-11-24 15:34:17 -08:00
c5f63f859e Add slow path to getCustomClassTypeImpl (#68717)
Summary:
This fixes a custom class registration issue when `typeid` is not guaranteed to be unique across multiple libraries, which is the case for the libc++ runtime on MacOS 11, in particular on M1.
From [libcxx/include/typeinfo](78d6a7767e/include/typeinfo (L139)):
```
// -------------------------------------------------------------------------- //
//                          NonUniqueARMRTTIBit
// -------------------------------------------------------------------------- //
// This implementation of type_info does not assume always a unique copy of
// the RTTI for a given type inside a program. It packs the pointer to the
// type name into a uintptr_t and reserves the high bit of that pointer (which
// is assumed to be free for use under the ABI in use) to represent whether
// that specific copy of the RTTI can be assumed unique inside the program.
// To implement equality-comparison of type_infos, we check whether BOTH
// type_infos are guaranteed unique, and if so, we simply compare the addresses
// of their type names instead of doing a deep string comparison, which is
// faster. If at least one of the type_infos can't guarantee uniqueness, we
// have no choice but to fall back to a deep string comparison.
```

But the `std::type_index` hash is always computed assuming that the implementation is unique.
Adding a slow path fixes this problem in those scenarios.

Fixes https://github.com/pytorch/pytorch/issues/68039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68717

Reviewed By: seemethere

Differential Revision: D32605187

Pulled By: malfet

fbshipit-source-id: 8d50e56885b8c97dad3bc34a69c47ef879456dd1
2021-11-24 15:00:47 -08:00
14dc9759f2 Revert D32650384: OpInfos for torch.{flatten, column_stack}
Test Plan: revert-hammer

Differential Revision:
D32650384 (aceb46e4ce)

Original commit changeset: 9ead83b378d0

fbshipit-source-id: 3ef281e536b1f21a6f13c6c51309021cf92b53b2
2021-11-24 14:55:26 -08:00
96929ea995 Update empty and empty_like examples in docs (#68874)
Summary:
For some reason, the example for `torch.empty` showed the usage of `torch.empty_like` and the other way around. These are now swapped.

Fixes https://github.com/pytorch/pytorch/issues/68799

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68874

Reviewed By: wenleix

Differential Revision: D32646645

Pulled By: ejguan

fbshipit-source-id: c8298bcaca450aaa4abeef2239af2b14cadc05b3
2021-11-24 14:01:06 -08:00
d44e610efa [CUDA Pinned Memory] Event recording with non-blocking copies should track the storage context, not the tensor data pointer (#68749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68749

The logic for asynchronous copies (either HtoD or DtoH) using cudaMemcpyAsync relies on recording an event with the caching host allocator to notify it that a given allocation has been used on a stream - and thus it should wait for that stream to proceed before reusing the host memory.

This tracking is based on the allocator maintaining a map from storage allocation pointers to some state.

If we try to record an event for a pointer we don't understand, we will silently drop the event and ignore it (9554ebe44e/aten/src/ATen/cuda/CachingHostAllocator.cpp (L171-L175)).

Thus, if we use the data_ptr of a Tensor instead of the storage allocation, then reasonable code can lead to incorrectness due to missed events.

One way this can occur is simply by slicing a tensor into sub-tensors - which have different values of `data_ptr()` but share the same storage, for example:

```
image_batch = torch.randn(M, B, C, H, W).pin_memory()
for m in range(M):
  sub_batch = image_batch[m].cuda(non_blocking=True)
  # sub_batch.data_ptr() != image_batch.data_ptr() except for m == 0.
  # however, sub_batch.storage().data_ptr() == image_batch.storage().data_ptr() always.
```

Therefore, we instead use the storage context pointer when recording events, as this is the same state that is tracked by the caching allocator itself. This is a correctness fix, although it's hard to determine how widespread this issue is.

Using the storage context also allows us to use a more efficient structure internally to the caching allocator, which will be sent in future diffs.

Test Plan: Test added which demonstrates the issue, although it's hard to demonstrate the race explicitly.

Reviewed By: ngimel

Differential Revision: D32588785

fbshipit-source-id: d87cc5e49ff8cbf59052c3c97da5b48dd1fe75cc
2021-11-24 13:20:22 -08:00
bc3bdbc8f4 implemented 'torch.distributions.constraints.symmetric' checking if the tensor is symmetric at last 2 dimension. (#68644)
Summary:
Implemented submodule for https://github.com/pytorch/pytorch/issues/68050
Opened cleaned, final version of PR for https://github.com/pytorch/pytorch/issues/68240

Explanation:
I am trying to contribute to PyTorch by implementing distributions for symmetric matrices, like the Wishart and Inverse Wishart distributions. Although there is an LKJ distribution for the Cholesky decomposition of correlation matrices, it only corresponds to a restricted form of the Wishart distribution. [https://arxiv.org/abs/1809.04746](https://arxiv.org/abs/1809.04746) Thus, I started implementing the Wishart and Inverse Wishart distributions separately.

I added a short implementation of 'torch.distributions.constraints.symmetric', which was not previously included in 'torch.distributions.constraints'. That module contains constraints like 'positive_definite', but those just assume the input matrix is symmetric. [Link](1adeeabdc0/torch/distributions/constraints.py (L466)) So I think it is better to have a constraint that checks the symmetry of tensors in PyTorch.

We may further utilize it like
`constraints.stack([constraints.symmetric, constraints.positive_definite])`
for the covariance matrix constraint in the Multivariate Normal distribution, for example, to check whether a random matrix is symmetric positive definite.
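
A minimal sketch of such a symmetry check over the last two dimensions (the merged constraint may differ in its details):

```
import torch

def is_symmetric(value, atol=1e-6, rtol=1e-6):
    # Batched check: compare each matrix with its transpose over the
    # last two dimensions, then reduce those two dimensions.
    close = torch.isclose(value, value.transpose(-2, -1), atol=atol, rtol=rtol)
    return close.all(-1).all(-1)

A = torch.randn(4, 4)
print(is_symmetric(A + A.t()))  # tensor(True)
```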

cc fritzo neerajprad alicanb nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68644

Reviewed By: jbschlosser

Differential Revision: D32599540

Pulled By: neerajprad

fbshipit-source-id: 9227f7e9931834a548a88da69e4f2e9af7732cfe
2021-11-24 13:13:28 -08:00
1940cc028e [quant][graphmode][fx] Fork subgraph_rewriter from torch.fx to quantization (#68228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68228

Forking this for now so that we can make changes as we need; the changes can be merged back into torch.fx later.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537713

fbshipit-source-id: 326598d13645fcc28ef2c66baaac6a077b80fd0c
2021-11-24 10:49:05 -08:00
aceb46e4ce OpInfos for torch.{flatten, column_stack} (#67555)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67555

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D32650384

Pulled By: anjali411

fbshipit-source-id: 9ead83b378d0ece60569e1a0fc7d8849f89566b3
2021-11-24 10:25:37 -08:00
cf54416925 Add docs entry for adjoint. (#68869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68869

As per title.

cc brianjo mruberry anjali411

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32647456

Pulled By: anjali411

fbshipit-source-id: 2cb053a6884e2b22d3decc058e86d10f355fcb84
2021-11-24 10:03:41 -08:00
c7d5e0f53f OpInfos for torch.atleast_{1d, 2d, 3d} (#67355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67355

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32649416

Pulled By: anjali411

fbshipit-source-id: 1b42e86c7124427880fff52fbe490481059da967
2021-11-24 09:55:39 -08:00
b69155f754 Avoid dtype mismatch error in torch.save if storages are unallocated (#68787)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58970

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68787

Reviewed By: mruberry

Differential Revision: D32617425

Pulled By: anjali411

fbshipit-source-id: fe7f2374e4ef4428346a0a202cae8e0d382e03ab
2021-11-24 09:51:29 -08:00
208e109dbf Revert D32633806: Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse
Test Plan: revert-hammer

Differential Revision:
D32633806 (b28ddd72d3)

Original commit changeset: b98db0bd655c

fbshipit-source-id: 1c757628526bb1b88747257fc77d8b9cb996e502
2021-11-24 09:15:17 -08:00
7802953dd5 [nnc][quantization] quantized ops for BI bytedoc via aten (#68790)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68790

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32609427

Pulled By: IvanKobzarev

fbshipit-source-id: de8f4209befe2509f5033888c739554470768290
2021-11-24 08:59:44 -08:00
31d36fd35d fix sccache issue on Windows CPU (#68870)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68796

```
2021-11-24T10:12:40.7634007Z Compile requests                   4312
2021-11-24T10:12:40.7634484Z Compile requests executed          4300
2021-11-24T10:12:40.7634823Z Cache hits                         4227
2021-11-24T10:12:40.7635122Z Cache hits (C/C++)                 4227
2021-11-24T10:12:40.7636139Z Cache misses                         62
2021-11-24T10:12:40.7636930Z Cache misses (C/C++)                 62
2021-11-24T10:12:40.7637333Z Cache timeouts                        0
2021-11-24T10:12:40.7637839Z Cache read errors                     0
2021-11-24T10:12:40.7638161Z Forced recaches                       0
2021-11-24T10:12:40.7638489Z Cache write errors                    0
2021-11-24T10:12:40.7638828Z Compilation failures                  1
2021-11-24T10:12:40.7639180Z Cache errors                         10
2021-11-24T10:12:40.7639490Z Cache errors (C/C++)                 10
2021-11-24T10:12:40.7639856Z Non-cacheable compilations            0
2021-11-24T10:12:40.7640244Z Non-cacheable calls                   0
2021-11-24T10:12:40.7640601Z Non-compilation calls                12
2021-11-24T10:12:40.7640987Z Unsupported compiler calls            0
2021-11-24T10:12:40.7641426Z Average cache write               0.104 s
2021-11-24T10:12:40.7641763Z Average cache read miss           6.000 s
2021-11-24T10:12:40.7642110Z Average cache read hit            0.046 s
2021-11-24T10:12:40.7642485Z Failed distributed compilations       0
```
https://github.com/pytorch/pytorch/runs/4310176911?check_suite_focus=true

cc seemethere malfet pytorch/pytorch-dev-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68870

Reviewed By: ejguan

Differential Revision: D32646289

Pulled By: janeyx99

fbshipit-source-id: bf04446439e55a4ccaf9ce7c77812752ca717a7c
2021-11-24 08:04:59 -08:00
be7e159e71 Remove extraneous logging (#68830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68830

No logical changes; this removes a logging statement that was accidentally committed.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang jjlilley mrzzd

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32628711

Pulled By: H-Huang

fbshipit-source-id: 070190b92f97c8e38d8bb03124c13cb061fc9ec1
2021-11-24 07:15:50 -08:00
7d8a79b6f3 [nnc] llvm_codegen quantization types for vectype (#68736)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68736

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32596261

Pulled By: IvanKobzarev

fbshipit-source-id: 0388c3b5ae58eb16921d25d9a784f82f1bb924fc
2021-11-24 01:17:39 -08:00
b28ddd72d3 Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse (#68707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707

This PR adds a path for block CSR matrices for `torch.addmm`. cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to make everything work and tests passing using an unsafe constructor first, keeping it all private. Then discuss & implement constructors with block information separately unlocking the functions for wider use. Documentation will come with the update to constructors.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32633806

Pulled By: cpuhrsch

fbshipit-source-id: b98db0bd655cce651a5da457e78fca08619a5066
2021-11-23 22:55:46 -08:00
b5b62b3408 Cleanup old TD logic (#68842)
Summary:
Remove `--determine-from` option from run_test.py and remove all
references from corresponding test scripts

Followup after https://github.com/pytorch/pytorch/pull/64921

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68842

Reviewed By: seemethere, janeyx99

Differential Revision: D32631418

Pulled By: malfet

fbshipit-source-id: bdb5dd888c1d97dfaf95c1f299bf8073f3de9588
2021-11-23 18:45:42 -08:00
d9f3feb5a2 [SR] Use std::vector::reserve for StaticModule constants (#68834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68834

This diff uses std::vector::reserve for constructing constants in StaticModule. We can also avoid two extra iterations over all the graph nodes.

This diff should improve performance by a tiny bit.

Test Plan: - [x] buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1

Reviewed By: mikeiovine

Differential Revision: D32628806

fbshipit-source-id: 99dd2a7a36e86899ca1fe5300f3aa90d30a43726
2021-11-23 18:00:04 -08:00
8fb9ce4927 Update Documentation to Make CUDA Call Explicit (#67973)
Summary:
This clarifies the docs by making the call to cudaStreamWaitEvent explicit.

Fixes https://github.com/pytorch/pytorch/issues/67866

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67973

Reviewed By: mruberry

Differential Revision: D32620261

Pulled By: ngimel

fbshipit-source-id: 1fc8beb2062baaddb013ea4d7b10da2baa10f15e
2021-11-23 16:25:37 -08:00
79b67d9a4a [Quant] Refactor handling of FixedQParams operators (#68143)
Summary:
**Summary**: FixedQParams operators do not need fake quantization
in the prepare step. This commit introduces FixedQParamsObserver
and makes FixedQParamsFakeQuantize a simple wrapper around this
observer. It also removes the fake quantize logic in forward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68143

Test Plan:
Added two tests:
python3 test/test_quantization.py TestQuantizeFx.test_fixed_qparams_patterns
python3 test/test_quantization.py TestQuantizeFx.test_register_patterns

**Reviewers**: Jerry Zhang

**Subscribers**: Jerry Zhang, Supriya Rao

**Tasks**: T104942885

**Tags**: pytorch

Reviewed By: albanD

Differential Revision: D32484427

Pulled By: andrewor14

fbshipit-source-id: 5a048b90eb4da79074c5ceffa3c8153f8d8cd662
2021-11-23 15:26:10 -08:00
998daf44d6 Allow get_attr nodes to be int64 type (#68818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68818

Operator support was blocking all nodes with dtype int64 from lowering. This diff eases the condition, allowing inputs from get_attr nodes (which are known not to be used for TRT compute) to have dtype int64.

Reviewed By: brad-mengchi, 842974287

Differential Revision: D32609457

fbshipit-source-id: ea255f3281349a4254cb6abdeed671ab2c0216ba
2021-11-23 15:21:47 -08:00
78dce417a1 [BE] Simplify magma installation logic (#68778)
Summary:
The difference between `CUDA_VERSION` and the magma package name is just the dot between major and minor versions.

While refactoring, I discovered that some docker images set `CUDA_VERSION` to include a patch revision, so the pattern was modified to strip it, i.e. `cuda-magma102` would be installed for `CUDA_VERSION=10.2.89` and `cuda-magma113` would be installed for `CUDA_VERSION=11.3.0`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68778

Reviewed By: seemethere

Differential Revision: D32605365

Pulled By: malfet

fbshipit-source-id: 43f8edeee5b55fdea6b4d9943874df8e97494ba1
2021-11-23 14:57:44 -08:00
2cd48d14ef Fix test_svd_errors_and_warnings warning message when cuda >= 11.5 (#68683)
Summary:
In SVD cusolverDnXgesvd computations:

When CUDA < 11.5, cusolver raises CUSOLVER_STATUS_EXECUTION_FAILED when the input contains NaN.
When CUDA >= 11.5, cusolver finishes execution normally and sets the info array to indicate a convergence issue.

Related: https://github.com/pytorch/pytorch/issues/68259 #64533

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68683

Reviewed By: dagitses

Differential Revision: D32583576

Pulled By: mruberry

fbshipit-source-id: f732872522e0bda2703450ffcc64ae3a0d3f5bc0
2021-11-23 14:16:23 -08:00
8e343ba5db Revert D32611368: [pytorch][PR] Initial version of general convolution_backward
Test Plan: revert-hammer

Differential Revision:
D32611368 (445b31abff)

Original commit changeset: 26d759b7c908

fbshipit-source-id: e91f45f0f31150e60d657a3964b7e42027beff58
2021-11-23 13:39:36 -08:00
84047ff342 Add API usage logging to ShardedTensor and fix a few tests. (#68771)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68771

ghstack-source-id: 143974518

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D32601562

fbshipit-source-id: ed624137efab94fbe556609bb40cca14e69d9bac
2021-11-23 13:30:59 -08:00
959cb03132 Populate operator_input_sizes_ (#68542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68542

title

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D32508159

fbshipit-source-id: 0773a725973a493f19a2e9a340365e559dfdf7f8
2021-11-23 12:18:06 -08:00
c0e6dc9ac7 [pytorch] Fix loading from checkpoint after "maximize" flag was introduced in SGD (#68733)
Summary:
After the 'maximize' flag was introduced in https://github.com/pytorch/pytorch/issues/46480, some jobs fail because they resume training from old checkpoints.

After we load an old checkpoint, the optimizer.step() call during the backward pass fails (torch/optim/sgd.py, line 129) because there is no 'maximize' key in the SGD parameter groups.

To circumvent this, I add a default value via `group.setdefault('maximize', False)` when the optimizer state is restored.
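
A minimal sketch of the approach (the actual fix lives in SGD's own __setstate__; the subclass here is just for illustration):

```
import torch

class PatchedSGD(torch.optim.SGD):
    def __setstate__(self, state):
        super().__setstate__(state)
        # Checkpoints written before the flag existed have no 'maximize' key,
        # so backfill the old default when the optimizer state is restored.
        for group in self.param_groups:
            group.setdefault('maximize', False)
```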

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68733

Reviewed By: albanD

Differential Revision: D32480963

Pulled By: asanakoy

fbshipit-source-id: 4e367fe955000a6cb95090541c143a7a1de640c2
2021-11-23 11:42:16 -08:00
73f494d690 .circleci: Remove migrated CUDA 10.2 build (#68782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68782

These builds are no longer required for slow_gradcheck and should be
removed

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D32606679

Pulled By: seemethere

fbshipit-source-id: e4827a6f217b91c34cfab6c2340e3272f3db1522
2021-11-23 09:50:53 -08:00
23288fdacc Making norms inputs independent (#68526)
Summary:
An update to https://github.com/pytorch/pytorch/issues/67442 to make sure all of the inputs produced are independent.

Updates group_norm and instance_norm (local_response_norm was already producing independent inputs).

Also fixes a bug in one set of instance_norm inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68526

Reviewed By: ngimel

Differential Revision: D32532076

Pulled By: samdow

fbshipit-source-id: 45b9320fd9aecead052b21f838f95887cfb71821
2021-11-23 09:41:36 -08:00
e7e1b76106 Require CMake 3.13 when building with Ninja (#68731)
Summary:
There is a bug in CMake's Ninja generator where files considered inputs to the cmake command can't be generated by another build step. The fix was included in CMake 3.13, but 3.10.3 is still sufficient for other cmake generators, e.g. Makefiles.
For reference, the bug is here: https://gitlab.kitware.com/cmake/cmake/-/issues/18584

This is necessary for https://github.com/pytorch/pytorch/issues/68246 but I'm isolating the change here to make testing easier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68731

Reviewed By: jbschlosser

Differential Revision: D32604545

Pulled By: malfet

fbshipit-source-id: 9bc0bd8641ba415dd63ce21a05c177e2f1dd9866
2021-11-23 09:34:20 -08:00
3282386aa4 Added additional string to search cpu flags for vnni detection (#67686)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67685

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67686

Reviewed By: ejguan

Differential Revision: D32109038

Pulled By: malfet

fbshipit-source-id: 3ea6e4cc1aa82831fd6277129a67c8241a5591a5
2021-11-23 09:32:53 -08:00
98e51895ef [dist_quant] change op registration to each file instead (#68797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68797

This changes dist quantization op registration to happen in each file instead, allowing the torch deploy test to pass
ghstack-source-id: 143994945

Test Plan: wait for sc

Reviewed By: jbschlosser

Differential Revision: D32610679

fbshipit-source-id: 3ade925286f1ed0f65017939f1ad3f5c539e1767
2021-11-23 09:20:26 -08:00
445b31abff Initial version of general convolution_backward (#65219)
Summary:
Towards [convolution consolidation](https://fb.quip.com/tpDsAYtO15PO).

Introduces the general `convolution_backward` function that uses the factored-out backend routing logic from the forward function.

Some notes:
* `finput` is now recomputed in the backward pass for the slow 2d / 3d kernels instead of being saved from the forward pass. The logic for this is based on the forward computation and is present in the `compute_finput2d` / `compute_finput3d` functions in `ConvUtils.h`.
* Using structured kernels for `convolution_backward` requires extra copying since the backend-specific backward functions return tensors. Porting to structured is left as future work.
* The tests that check the routing logic have been renamed from `test_conv_backend_selection` -> `test_conv_backend` and now also include gradcheck validation using an `autograd.Function` hooking up `convolution` to `convolution_backward`. This was done to ensure that gradcheck passes for the same set of inputs / backends.

The forward pass routing is done as shown in this flowchart (probably need to download it for it to be readable since it's ridiculous):
![conv_routing_graph md](https://user-images.githubusercontent.com/75754324/137186002-5bca75ca-f911-4e61-8245-ec07af841506.png)

![conv_nogroup_routing_graph md](https://user-images.githubusercontent.com/75754324/139731619-9d0d436e-cce3-4bc3-8eaf-d469f667f0d7.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65219

Reviewed By: mruberry

Differential Revision: D32611368

Pulled By: jbschlosser

fbshipit-source-id: 26d759b7c908ab8f19ecce627acea7bd3d5f59ba
2021-11-23 08:19:45 -08:00
a31aea8eaa [quant][graphmode][fx] Add support for specifying reference quantized module mapping in backend_config_dict (#68227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68227

This PR adds two keys to backend_config_dict:
"root_module": the root module for the pattern (since we may have patterns for fused ops)
"reference_quantized_module_for_root": the corresponding reference quantized module for the root

Test Plan:
```
python test/test_quant_trt.py TestQuantizeFxTRTOps
python test/test_quant_trt.py TestConvertFxDoNotUse
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537711

fbshipit-source-id: 6b8f36a219db7bb6633dac53072b748ede8dfa78
2021-11-22 21:35:04 -08:00
b845b9876b [sparsity] Fix for the failing pruner test (#68794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68794

The pruner `test_constructor` fails because of a typo in the regular expression matching for the error that the pruner throws.
This fixes it.

Test Plan:
Separate test is not needed -- single letter change.
Previous test: `python test/test_ao_sparsity.py -- TestBasePruner`

Reviewed By: ngimel

Differential Revision: D32609589

Pulled By: z-a-f

fbshipit-source-id: 800ef50c8cdbf206087bc6f945d1830e4af83c03
2021-11-22 21:07:24 -08:00
d6a68e0b8d [PyTorch][3/N] Enable the rest forward spec options for ShardedEmbedding and ShardedEmbeddingBag. (#67799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67799

We enabled sharded embedding and embedding bag in https://github.com/pytorch/pytorch/pull/67188 and https://github.com/pytorch/pytorch/pull/66604. We now want to support as many parameters as defined in the docs as possible: https://pytorch.org/docs/stable/generated/torch.nn.functional.embedding_bag.html, https://pytorch.org/docs/stable/generated/torch.nn.functional.embedding.html.

For the ones that we don't support, we just throw an exception.

Last but not least, we use `.get()` to fetch params instead of indexing directly by key.
ghstack-source-id: 143987066

Test Plan: Unit test & CI

Reviewed By: pritamdamania87

Differential Revision: D31985333

fbshipit-source-id: 3794241b81eecc815bc4390679d0bb0323f4ae72
2021-11-22 20:33:03 -08:00
5d300e761d Add OpInfos for parcel Activation Functions I (#68521)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68521

Reviewed By: jbschlosser

Differential Revision: D32606625

Pulled By: saketh-are

fbshipit-source-id: acf98a07c45bce95b1470bf9856577426265f3d1
2021-11-22 20:01:35 -08:00
74e6d2ce67 fix typos in jit_language_reference.rst (#68706)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68700

- indent problem

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68706

Reviewed By: mruberry

Differential Revision: D32598916

Pulled By: jbschlosser

fbshipit-source-id: 42af216e83fb48bbd311fc3d41fc3e8f5a2fef08
2021-11-22 19:09:06 -08:00
e7d8f096c9 [sparsity] Fix GPU training for sparsity (#66412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66412

GPU training was not supported in the sparsifier: when the sparsifier was
created, the masks would default to the CPU, so attaching a GPU model to
the sparsifier would throw an error.
The solution is to create the masks on the same device as the weight.
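
A minimal sketch of the idea (illustrative, not the sparsifier's actual code):

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
weight = torch.randn(8, 8, device=device)

# Create the mask on the same device as the weight it masks.
mask = torch.ones_like(weight, dtype=torch.bool)
assert mask.device == weight.device
```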

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31590675

Pulled By: z-a-f

fbshipit-source-id: 98c2c1cedc7c60aecea4076e5254ef6b3443139e
2021-11-22 16:49:39 -08:00
0b0674121a Fix strict aliasing rule violation in bitwise_binary_op (#66194)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66119

Failure on ARM Neoverse N1 before this PR:
```
======================================================================
FAIL: test_bitwise_ops_cpu_int16 (__main__.TestBinaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
    result = test(self, **param_kwargs)
  File "test_binary_ufuncs.py", line 315, in test_bitwise_ops
    self.assertEqual(op(a, b), op(a_np, b_np))
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1633, in assertEqual
    self.assertEqual(
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1611, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!Found 176 different element(s) (out of 225), with the greatest difference of 21850 (-21846 vs. 4) occuring at index (0, 2).

======================================================================
FAIL: test_bitwise_ops_cpu_int32 (__main__.TestBinaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
    result = test(self, **param_kwargs)
  File "test_binary_ufuncs.py", line 315, in test_bitwise_ops
    self.assertEqual(op(a, b), op(a_np, b_np))
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1633, in assertEqual
    self.assertEqual(
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1611, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!Found 188 different element(s) (out of 225), with the greatest difference of 1335341061 (-1335341056 vs. 5) occuring at index (14, 8).

----------------------------------------------------------------------
```
which passes now.

CC malfet ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66194

Reviewed By: dagitses, bdhirsh, ngimel

Differential Revision: D31430274

Pulled By: malfet

fbshipit-source-id: bcf1c9d584c02eff328dd5b1f7af064fac5942c9
2021-11-22 16:43:09 -08:00
d176c82bd5 [sparsity] Fix and enable the pruning tests (#66411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66411

The original tests were disabled, and had some bugs. This fixes those unittests.

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D31590678

Pulled By: z-a-f

fbshipit-source-id: ddbed34cc01d5f15580cb8f0033416f2f9780068
2021-11-22 15:28:12 -08:00
b46c89d950 Add linalg.solve_triangular (#63568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568

This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve` preparing these for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience (a usage sketch follows below).
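
A usage sketch of the new API (shapes and values are illustrative):

```python
import torch

A = torch.triu(torch.randn(3, 3))
A += 3 * torch.eye(3)              # keep the system well-conditioned
B = torch.randn(3, 2)

# Solves A X = B; left=False would solve X A = B instead.
X = torch.linalg.solve_triangular(A, B, upper=True, left=True)
assert torch.allclose(A @ X, B, atol=1e-5)
```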

This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.

This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.

Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.

We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.

Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32588230

Pulled By: mruberry

fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910
2021-11-22 12:41:06 -08:00
a2e35e167b refactor: update f-string for swa.utils.py (#68718)
Summary:
Update some old-style format strings to f-strings, for overall consistency.
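
A tiny illustration of the kind of change (values are made up):

```python
name, step = "swa", 3
old = "{}_{}".format(name, step)  # old-style formatting
new = f"{name}_{step}"            # f-string equivalent
assert old == new
```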

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68718

Reviewed By: jbschlosser

Differential Revision: D32593746

Pulled By: albanD

fbshipit-source-id: fcc17958f8af6a3260beca883bc1065f019dcf0e
2021-11-22 11:23:18 -08:00
9554ebe44e [Dist CI][BE] c10d gloo tests run in subprocess (#68504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68504

Per title
ghstack-source-id: 143928767

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32485100

fbshipit-source-id: a55687aea4af69e3830aee6f0278550c72f142c2
2021-11-22 09:54:07 -08:00
ddc22ea3b2 [Dist CI][BE] test_c10d_nccl run in subprocess (#68503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68503

Per title
ghstack-source-id: 143928768

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32484990

fbshipit-source-id: 6682f46256af0da5153e5087a91a7044156dd17f
2021-11-22 09:52:58 -08:00
39ec0f321b GHA: add print_tests_stats step to MacOS workflow (#68669)
Summary:
This will allow trunk CI to print test stats and upload stats (test reports, flaky tests, failed tests) to
- Scribe
- S3
- RDS

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68669

Reviewed By: dagitses

Differential Revision: D32578169

Pulled By: janeyx99

fbshipit-source-id: c348e2070402754789f462b52cd71411984102e2
2021-11-22 08:26:52 -08:00
a66ff81837 [DataPipe] Optimize Grouper from N^2 to N (#68647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68647

Fixes #68539

When all of the data from the source datapipe is depleted, there is no need to yield the biggest group in the buffer.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32562646

Pulled By: ejguan

fbshipit-source-id: ce91763656bc457e9c7d0af5861a5606c89965d5
2021-11-22 07:49:13 -08:00
148f323856 Revert D32541986: [pytorch][PR] [opinfo] use dtypes instead of dtypesIfCPU
Test Plan: revert-hammer

Differential Revision:
D32541986 (d2a90f91bc)

Original commit changeset: 793d7d22c3ec

fbshipit-source-id: c60c4be3416f6feb658b5da1bdf75f0cbe6bee24
2021-11-22 04:58:01 -08:00
7c6a8a47db [BE] minor improvement to dist quantization (#67401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67401

some minor changes to dist quantization, mainly change the namespace and add some notes for future code dedup
ghstack-source-id: 143910067

Test Plan: wait for ci

Reviewed By: mrshenli

Differential Revision: D31979269

fbshipit-source-id: 85a2f395e6a3487dd0b9d1fde886eccab106e289
2021-11-21 23:31:59 -08:00
fb556c91ce [BE] delete frontend.cpp (#67400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67400

c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the Python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 143910066

Test Plan: wait for ci

Reviewed By: navahgar

Differential Revision: D31979270

fbshipit-source-id: 6ceb8b53d67ab8f9aef44b34da79346dfbb51225
2021-11-21 23:30:52 -08:00
d2a90f91bc [opinfo] use dtypes instead of dtypesIfCPU (#67619)
Summary:
Replace usage of `dtypesIfCPU` with `dtypes` in the OpInfo class and also make it a mandatory argument.

Also added a DeprecationWarning when `dtypesIfCPU` is used.
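
A minimal sketch of how such a deprecation could be signalled (not the exact OpInfo code):

```python
import warnings

def warn_dtypes_if_cpu():
    warnings.warn(
        "`dtypesIfCPU` is deprecated, use `dtypes` instead",
        DeprecationWarning,
        stacklevel=2,
    )
```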

This raises a question:
For an OpInfo entry, `dtypes` currently works for any external backend, `dtypesIfCPU` for CPU, and `dtypesIfCUDA` and `dtypesIfROCM` for CUDA and ROCm respectively.

If we merge `dtypes` and `dtypesIfCPU`, then cases where an external backend's `dtypes` don't match the CPU `dtypes` will lead to failures.

Currently there are a few issues (5 failures) due to this on XLA (we may add relevant skips for them). If we agree that skips should be added, should they go through the OpInfo decorator mechanism or be handled at the XLA end? The XLA end makes more sense to me, so there is one source of skips.

<details>

<summary>XLA Fail Log</summary>

```
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 ERROR [0.016s]: test_reference_eager_histogram_xla_float32 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26     result = test(self, **param_kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26     return test(*args, **kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26     self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 397, in compare_with_eager_reference
Nov 01 11:48:26     cpu_inp, cpu_args, cpu_kwargs = cpu(sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 393, in cpu
Nov 01 11:48:26     sample.args), to_cpu(sample.kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 386, in to_cpu
Nov 01 11:48:26     return {k: to_cpu(v) for k, v in x.items()}
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 386, in <dictcomp>
Nov 01 11:48:26     return {k: to_cpu(v) for k, v in x.items()}
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 390, in to_cpu
Nov 01 11:48:26     raise ValueError("Unknown type {0}!".format(type(x)))
Nov 01 11:48:26 ValueError: Unknown type <class 'NoneType'>!
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.575s]: test_reference_eager___rmatmul___xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26     result = test(self, **param_kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26     return test(*args, **kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26     self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26     self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26     return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=0.001, found 44 element(s) (out of 50) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 9.187201950435738e+18 (-9.187201950435738e+18 vs. 34.0), which occurred at index (0, 4).
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.137s]: test_reference_eager_linalg_multi_dot_xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26     result = test(self, **param_kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26     return test(*args, **kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26     self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26     self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26     return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=0.001, found 4 element(s) (out of 4) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 140230883884432.0 (0.0 vs. 140230883884432.0), which occurred at index (0, 0).
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.461s]: test_reference_eager_matmul_xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26     result = test(self, **param_kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26     return test(*args, **kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26     self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26     self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26     return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=0.001, found 37 element(s) (out of 50) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 7.661375630332297e+18 (-7.66128151259864e+18 vs. 94117733658072.0), which occurred at index (4, 5).
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.050s]: test_reference_eager_remainder_autodiffed_xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26     result = test(self, **param_kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26     return test(*args, **kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26     self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26     self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26     return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!Attempted to compare equality of tensors with different dtypes. Got dtypes torch.int64 and torch.float32.
Nov 01 11:48:26
Nov 01 11:48:26 ----------------------------------------------------------------------
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67619

Reviewed By: ngimel

Differential Revision: D32541986

Pulled By: mruberry

fbshipit-source-id: 793d7d22c3ec9b4778784254ef6f9c980b4b0ce2
2021-11-21 21:52:38 -08:00
2d06c081ca Fix test issue with householder_product for non-contiguous inputs. (#68231)
Summary:
Fixes failing tests for `householder_product` due to non-contiguous inputs as shown here: https://github.com/pytorch/pytorch/issues/67513.

The floating point error was set too high for the complex64 type, so this PR reduces the error threshold for that particular type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68231

Reviewed By: dagitses

Differential Revision: D32562774

Pulled By: mruberry

fbshipit-source-id: edae4447ee257076f53abf79f55c5ffa1a9b3cb2
2021-11-21 21:47:23 -08:00
3b3dc1ade8 Sparse CSR CPU: add triangular_solve_out (#62180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62180

This PR adds CPU dispatch for `triangular_solve` with sparse CSR matrix.
The implementation uses MKL Sparse library. If it's not available then a runtime error is thrown.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32581395

Pulled By: cpuhrsch

fbshipit-source-id: 41c7133a0d2754ef60b5a7f1d14aa0bf7680a844
2021-11-21 21:29:20 -08:00
e1c449ff34 dbr quant overhead[9/x]: precalculate when to skip op_convert_after_hook (#68432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68432

Speeds up `op_convert_after_hook` by precalculating when this hook is a no-op
based on information gathered while tracing, and skipping execution when
this flag is true.

```
MobileNetV2, function level profiling, 1x3x224x224

// before
op_convert_before_hook = 3.25%

// after
op_convert_before_hook = 1.35%
```

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463752

Pulled By: vkuzo

fbshipit-source-id: b0c3d37909ddc8c254fe53f90954f625ae874e3b
2021-11-21 07:08:29 -08:00
ba230de118 dbr quant: remove more asserts from hot paths (#68431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68431

asserts have some overhead, removing the asserts used only to make
mypy happy from the path which is hit in every forward.

Test Plan: python test/test_quantization.py TestQuantizeDBR

Reviewed By: jerryzh168

Differential Revision: D32463767

Pulled By: vkuzo

fbshipit-source-id: 5f85f80144f35a725afe481bf027ea61ca6315bf
2021-11-21 07:08:26 -08:00
95c00cf029 speed up quantized relu6 inplace kernel (#68404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68404

The qclamp kernel is as fast as (non-inplace) or faster than (inplace) the
qrelu6 kernel. This removes the qrelu6 kernel and routes qrelu6 to the
qclamp kernel instead.

Test Plan:
```
// correctness
python test/test_quantization.py TestQuantizedOps.test_qrelu6

// benchmarking
import torch
import torch.nn.functional as F
toq = torch.ops.quantized
import time

N_WARMUP = 5
N_ITER = 1000

data = torch.randn(32, 32, 64, 64)
data = torch.quantize_per_tensor(data, 0.05, 0, torch.quint8)

for _ in range(N_WARMUP):
    F.hardtanh(data, 0., 6., inplace=True)
t1 = time.time()
for _ in range(N_ITER):
    F.hardtanh(data, 0., 6., inplace=True)
t2 = time.time()

for _ in range(N_WARMUP):
    toq.relu6(data, inplace=True)
t3 = time.time()
for _ in range(N_ITER):
    toq.relu6(data, inplace=True)
t4 = time.time()

t_hardtanh = t2 - t1
t_qrelu6 = t4 - t3
print(t_hardtanh, t_qrelu6)

// before
0.7156341075897217 1.4007949829101562

// after
0.6825599670410156 0.6571671962738037
```

Reviewed By: jerryzh168

Differential Revision: D32463754

Pulled By: vkuzo

fbshipit-source-id: a87fe5907d7b71d87eb1d5f6588cd509a88f2969
2021-11-21 07:08:23 -08:00
592053f115 dbr quant: simplify relatedness logic (#68374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68374

Cleans up the relatedness logic in DBR quant. For now, this is still
duplicated with NS. A future PR should unify these mappings.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463750

Pulled By: vkuzo

fbshipit-source-id: 90c2f5e79b86b1b595bd52650305bad88212ed49
2021-11-21 07:08:20 -08:00
f1021bcf38 dbr quant overhead[8/x]: small speedup in op_needs_quantization (#68373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68373

Removes redundant logic in `op_needs_quantization`, for a small speedup.

Test Plan:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
cur_op_needs_hooks - 0.76%
op_needs_quantization - 0.41%

// after
cur_op_needs_hooks - 0.70%
op_needs_quantization - 0.36%
```

Reviewed By: jerryzh168

Differential Revision: D32463762

Pulled By: vkuzo

fbshipit-source-id: 334591c514dfa5af6fabc1390005088e8c5ca952
2021-11-21 07:08:17 -08:00
74ba1067a6 dbr quant overhead[7/x]: speed up AutoQuantizationState.reset_to_new_call (#68372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68372

Speeds up `AutoQuantizationState.reset_to_new_call` by going around
the getattr and setattr overhead in `torch.nn.Module`.

Test Plan:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
reset_to_new_call - 1.09%

// after
reset_to_new_call - 0.18%
```

Reviewed By: jerryzh168

Differential Revision: D32463759

Pulled By: vkuzo

fbshipit-source-id: f3faa464372b0703f7d246680d62acd2782453e3
2021-11-21 07:08:15 -08:00
b7d58745c8 dbr quant overhead[6/x]: remove unneeded isinstance checks in op_convert_before_hook (#68371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68371

`isinstance` has some overhead; this changes the code in `op_convert_before_hook`
to use the information calculated during tracing instead, which is cheaper.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

function level benchmarking
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
op_convert_before_hook = 3.55%
isinstance = 1.62%

// after
op_convert_before_hook = 2.89%
```

Reviewed By: jerryzh168

Differential Revision: D32463757

Pulled By: vkuzo

fbshipit-source-id: 129efe9c279a41f55b8bfd09132e21c0066298a6
2021-11-21 07:08:12 -08:00
b3a7d696b3 dbr quant overhead[5/x]: remove unnecessary asserts (#68370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68370

Removes duplicate asserts (the same condition is checked when calculating
the hook type, so there is no need to check it again).
For the assert in `validate_is_at_last_seen_idx`, rewrites it to
raise an Error instead to ensure it does not get stripped in
production environments.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463766

Pulled By: vkuzo

fbshipit-source-id: 8a7b7e0bf270bc327f49bd3e5bd156339e846381
2021-11-21 07:08:09 -08:00
16a6e0612d dbr quant: clean up key types in AutoQuantizationState mappings (#68369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68369

`AutoQuantizationState` has various mappings keyed on IDs. Only
`tensor_id_to_observer` actually needs string keys because it is a
`torch.nn.ModuleDict`. This PR changes the other mappings to have
integer keys, for simplicity and performance.
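
A quick illustration of the constraint (illustrative): `ModuleDict` requires string keys, while plain dicts can key on ints directly.

```python
import torch.nn as nn

observers = nn.ModuleDict()
observers[str(3)] = nn.Identity()  # ModuleDict keys must be strings

id_to_info = {3: "seen_op_info"}   # plain dicts can use int keys directly
```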

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463765

Pulled By: vkuzo

fbshipit-source-id: 5a9bf2a1102859097eedf1e536761084cd408856
2021-11-21 07:08:06 -08:00
3fc9bc43c6 dbr quant overhead[4/x]: speed up hook type calculations (#68351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68351

Speeds up `get_module_hook_type` and `get_torch_function_hook_type` by
bypassing the expensive `torch.nn.Module` getters and setters and
fetching `_auto_quant_state` directly.

Test Plan:
Model level benchmarking is noisy.  Individual `cProfile` results:

```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
get_module_hook_type - 5.96%
get_torch_function_hook_type - 2.24%

// after
get_module_hook_type - 2.10%
get_torch_function_hook_type - 0.57%
```

Reviewed By: jerryzh168

Differential Revision: D32463756

Pulled By: vkuzo

fbshipit-source-id: 6eb199052ddf8d78f1c123a427e7437fc7c4fe58
2021-11-21 07:08:03 -08:00
c72ffee497 dbr quant overhead[3/x]: speed up AutoQuantizationState.mark_cur_op_complete (#68350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68350

`torch.nn.Module` has overhead for getting and setting attributes because
it does various type checks on the attribute.

This PR explicitly gets and sets the right thing for this particular
function, avoiding the type checks. Model level benchmarks are too noisy,
but according to function level profiling this reduces the time spent in
this function in a quantized model from 2.60% to 0.53%, on MobileNetV2 with
input size 1x3x224x224.
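
A hedged sketch of the general trick (names are illustrative, not the actual DBR internals): attribute access on a `torch.nn.Module` goes through `Module.__getattr__`/`__setattr__`, which do extra bookkeeping, and reading the internal dicts directly skips that.

```python
import torch

m = torch.nn.Module()
m.child = torch.nn.Linear(2, 2)  # Module.__setattr__ files it under m._modules

slow = m.child                   # Module.__getattr__ searches params/buffers/modules
fast = m._modules['child']       # direct dict read skips that search
assert slow is fast
```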

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D32463751

Pulled By: vkuzo

fbshipit-source-id: a29beed2a2b87ca4df675a30dd591f797c8a1dbe
2021-11-21 07:06:42 -08:00
c7ecf1498d dbr quant overhead[2/x]: precalculate op_convert_info (#68347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68347

Moves `op_convert_info` to be precalculated in the convert step
instead of calculated dynamically.  This should help with framework
overhead.

Test Plan:
Noisy benchmark:

```
// before

fp32: 0.016103 seconds avg
fx_prepared: 0.019841 seconds avg, 0.811601 speedup vs fp32
fx_quantized: 0.011907 seconds avg, 1.352346 speedup vs fp32
dt_prepared: 0.035055 seconds avg, 0.459357 speedup vs fp32
dt_quantized: 0.018891 seconds avg, 0.852417 speedup vs fp32

// after

fp32: 0.020535 seconds avg
fx_prepared: 0.023071 seconds avg, 0.890070 speedup vs fp32
fx_quantized: 0.011693 seconds avg, 1.756206 speedup vs fp32
dt_prepared: 0.038691 seconds avg, 0.530734 speedup vs fp32
dt_quantized: 0.021109 seconds avg, 0.972793 speedup vs fp32
```

The benchmark is too noisy to rely on, but according to `cProfile`
this removes about 5% of overhead.

Reviewed By: jerryzh168

Differential Revision: D32463761

Pulled By: vkuzo

fbshipit-source-id: e2ad0d7eeff7dbadf3aa379604bfe9bec0c228fe
2021-11-20 15:17:12 -08:00
9fba8971a7 dbr quant: move model level utils into own file (#68346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68346

Some utility functions for DBR quant need to be aware
of `AutoQuantizationState`.  This PR moves them into their own file, so they
can use the type directly without circular imports, and removes the mypy
ignores which are no longer necessary after this change.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463763

Pulled By: vkuzo

fbshipit-source-id: e2c367de0d5887c61e6d2c3a73d82f7d76af3de1
2021-11-20 15:17:10 -08:00
629f9a5532 dbr quant: clean up AutoQuantizationState.get_op_convert_info flag (#68345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68345

Removes a flag to unwrap scale and zp which was only needed by
the FX rewriter. Moves the logic to happen in the FX tracer instead.
This resolves a technical debt TODO.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463764

Pulled By: vkuzo

fbshipit-source-id: ba7c976664c95111174fb65488bdac62b4f4984d
2021-11-20 15:17:07 -08:00
52cc9cb0ee dbr quant: refactor AutoQuantizationState._get_packed_param_name (#68344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68344

Makes `AutoQuantizationState._get_packed_param_name` use `seen_op_info`
instead of the current op. This will make future performance improvements
easier.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D32463758

Pulled By: vkuzo

fbshipit-source-id: 0c16fe4bc989cb66180ad674ec55060cd970e32e
2021-11-20 15:17:04 -08:00
2755cf457c dbr quant: refactor AutoQuantizationState._get_input_args_quant_dequant_info (#68343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68343

Refactors `AutoQuantizationState._get_input_args_quant_dequant_info` to
use less internal state, makes the function have no side effects by passing
the state in the arguments, and moves the function to utils file.

This will help with a future refactor to cache this info at runtime.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463760

Pulled By: vkuzo

fbshipit-source-id: bdd50b0772f128755f9b734b5eeb0a9f4bc4970b
2021-11-20 15:17:02 -08:00
57472ec414 dbr quant: refactor get_quantized_op to only use seen_op_info (#68342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68342

Before this PR, `get_quantized_op` required the current callable.

After this PR, `get_quantized_op` only requires `seen_op_info`.
The signature was changed slightly to return `None` if the original
callable does not need replacement for quantization.

This will make it easier to make performance improvements in a
future PR.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463768

Pulled By: vkuzo

fbshipit-source-id: 5db2c4199f6c0529817f4c058f81fd1d32b9fa9f
2021-11-20 15:16:59 -08:00
9cf4779ec9 dbr quant: refactor get_func_output_obs_type to only use seen_op_info (#68341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68341

Before this PR, `get_func_output_obs_type` used information from the
incoming op and its arguments, which makes it hard to cache.

This PR refactors `get_func_output_obs_type` to only use information
collected during tracing. This will make it easier to make performance
improvements in a future PR.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463755

Pulled By: vkuzo

fbshipit-source-id: 25a220de652f0285685d43aedf7392082104b26c
2021-11-20 15:16:56 -08:00
f8b084c563 dbr quant overhead[1/x]: remove expensive calls to named_modules (#68309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68309

This is the first of a series of PRs to reduce overhead of DBR quantization
prototype. For now, the measurement of this work is not super scientific as
there are a lot of low hanging fruit.  As we speed up the prototype, we
might need to invest in better benchmarking.

Current benchmarking setup:
* mac OS laptop with OMP_NUM_THREADS=1
* torchvision's mobilenet_v2
* input size 1x3x224x224
* we measure fp32 forward, prepared and quantized forward with FX quant vs DBR quant

Note that due to small input size, this benchmark is pretty noisy.
The goal here is to measure overhead of DBR quant logic (not the kernels),
so small input is good as we want the kernels to take as little % of overall
time as possible.

High level goal is for DBR quant convert forward to approach the FX time.

This first PR removes the expensive named_modules calls and resets the op
counter in the op instead. According to cProfile, this should be a 2 to 3 percent win.

Test Plan:
```
benchmark: https://gist.github.com/vkuzo/1a4f98ca541161704ee3c305d7740d4a

// before

fp32: 0.020101 seconds avg
fx_prepared: 0.020915 seconds avg, 0.961083 speedup vs fp32
fx_quantized: 0.012037 seconds avg, 1.670005 speedup vs fp32
dt_prepared: 0.037506 seconds avg, 0.535953 speedup vs fp32
dt_quantized: 0.022688 seconds avg, 0.885988 speedup vs fp32

// after

fp32: 0.020722 seconds avg
fx_prepared: 0.023417 seconds avg, 0.884893 speedup vs fp32
fx_quantized: 0.014834 seconds avg, 1.396942 speedup vs fp32
dt_prepared: 0.039120 seconds avg, 0.529700 speedup vs fp32
dt_quantized: 0.020063 seconds avg, 1.032831 speedup vs fp32
```

Reviewed By: albanD

Differential Revision: D32463753

Pulled By: vkuzo

fbshipit-source-id: 1d7de7d9c4837e2b0ec815f0f67014c7600bb16c
2021-11-20 15:16:53 -08:00
ed6ef0eec4 dbr quantization: inline scale and zp (#68251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68251

Before this PR, DBR quantization used to recalculate scale and zero_point
in the converted model every time it was needed, which is slow.
This PR creates a pass during the convert function to go through every
observer in the model and cache its scale and zero_point.

Note: only doing this for observers which correspond to int8 operations
is saved for a future PR.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: VitalyFedyunin

Differential Revision: D32463769

Pulled By: vkuzo

fbshipit-source-id: d1d2e598e2bccc1958e5023096b451d69dc34e29
2021-11-20 15:16:51 -08:00
ca499567d2 barebones numeric suite for quantization with dynamic tracing (#67776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67776

This adds a barebones `add_loggers` and `extract_logger_info` API
to analyze intermediate activations of models using quantization
with dynamic tracing.  The API generally matches the NS for FX tool,
with some omissions.  For now, this is moving fast to help us
debug real models, and the API will be 100% aligned before this is marketed to users,
in future PRs.

Note: the current approach couples Numeric Suite with the quantization
logic. This is not the best for composability, and may be changed
at a future time.

Test Plan:
```
python test/test_quantization.py TestAutoTracing.test_numeric_suite
```

Differential Revision: D32231332

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 8adfb50cd8b7836c391669afe2e2ff6acae6d40a
2021-11-20 15:15:48 -08:00
d0eff8d846 Strided masked softmin. (#68463)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68463

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D32576497

Pulled By: cpuhrsch

fbshipit-source-id: 286edb2e7a5415df76858c69d0312743437b0fd8
2021-11-19 20:51:42 -08:00
75955e4ef8 [clone][sparse] Add torch._C._sparse namespace (#68672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68672

This PR adds `python_module: sparse` to `native_function.yaml`.
These functions would appear in `torch._C._sparse` namespace instead of
just `torch`.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32517813

fbshipit-source-id: 7c3d6df57a24d7c7354d0fefe1b628dc89be9431
2021-11-19 19:47:38 -08:00
95f4cd0ba9 Implement topk with sort for some cases (#68632)
Summary:
A benchmark comparing the original implementation and the sort-based implementation (this code should run on a branch without this patch):
```python
import torch
import timeit

def tune_dtype(f):
    def ret(*args, **kwargs):
        for dtype in [torch.int8, torch.half, torch.float, torch.double]:
            f(*args, **kwargs, dtype=dtype)
    return ret

def tune_slice(f):
    def ret(*args, **kwargs):
        slice = 1
        while slice <= 256:
            f(*args, **kwargs, slice=slice)
            slice *= 2
    return ret

def tune_slice_size(f):
    def ret(*args, **kwargs):
        slice_size = 1
        while slice_size <= 1_000_000:
            f(*args, **kwargs, slice_size=slice_size)
            slice_size *= 10
    return ret

def tune_k(f):
    def ret(*args, slice_size, **kwargs):
        k = 1
        while k <= slice_size:
            f(*args, **kwargs, k=k, slice_size=slice_size)
            k *= 10
    return ret

def topk_with_sort(tensor, k, dim=-1, largest=True):
    values, indices = tensor.sort(dim=dim, descending=largest)
    return values.narrow(dim, 0, k), indices.narrow(dim, 0, k)

def run50sync(f):
    for _ in range(50):
        f()
    torch.cuda.synchronize()

def warmup():
    N = 1000000
    for i in range(1, N // 10000):
        torch.randn(i, device='cuda')

def benchmark_one(slice, slice_size, k, dtype):
    input_ = torch.empty((slice, slice_size), dtype=dtype, device="cuda").random_()
    torch.cuda.synchronize()
    time = timeit.timeit(lambda: run50sync(lambda: torch.topk(input_, k, dim=1)), number=1)
    torch.cuda.synchronize()
    time_sort = timeit.timeit(lambda: run50sync(lambda: topk_with_sort(input_, k, dim=1)), number=1)
    method = "orig" if time < time_sort else "sort"
    speedup = time / time_sort
    print(f"(dtype={dtype}, slice={slice}, slice_size={slice_size}, k={k}) -> (method={method}, speedup={speedup})")

if __name__ == "__main__":
    warmup()
    tune_dtype(tune_slice(tune_slice_size(tune_k(benchmark_one))))()

```
Benchmark result see next comment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68632

Reviewed By: dagitses

Differential Revision: D32566233

Pulled By: ngimel

fbshipit-source-id: f7a508176ef3685b491048c4a6562121c60b8b2a
2021-11-19 17:18:20 -08:00
e554d8b89c Fix retry on connect failure decorator (#68600)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68541 by checking whether the error message contains the expected string instead of comparing against an exact error
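
A minimal sketch of the idea (names are illustrative, not the actual decorator):

```python
RETRYABLE_SUBSTRINGS = ("Connection refused", "Address already in use")

def is_retryable_connect_failure(exc: Exception) -> bool:
    # Substring matching is robust to extra context in the message,
    # unlike comparing against an exact error string.
    msg = str(exc)
    return any(s in msg for s in RETRYABLE_SUBSTRINGS)
```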

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68600

Reviewed By: dagitses, H-Huang

Differential Revision: D32535592

Pulled By: rohan-varma

fbshipit-source-id: 864c3e3c6831f2351c2949b2348af4f48a308522
2021-11-19 17:13:30 -08:00
8e51381bac Make AOT compiler generic (#68637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68637

Make the AOT compiler compile the BI bytedoc model, while also making the compiler generic enough for other models. The shape propagation pass is replaced with the new JIT tracer, as shape propagation doesn't yet support dynamic shapes.
A change to get and set the input dtype will follow.

Test Plan:
The BI model was changed to return a tuple of tensors instead of a tuple(list[tensor], list[string]). The modified BI model runs well with these changes:
```
jf download GN91Hg9shoWzU1oPAGQ7X9SV8-5nbmQwAAAA --file bi.pt

└─ $ ./compile_model.sh -m pytorch_dev_bytedoc -p bi.pt -v v1 -i "1,115;1"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=pytorch_dev_bytedoc
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL_PATH=bi.pt
+ getopts m:p:v:i:h opt
+ case $opt in
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ INPUT_DIMS='1,115;1'
+ getopts m:p:v:i:h opt
+ require_arg m pytorch_dev_bytedoc
+ '[' -n pytorch_dev_bytedoc ']'
+ require_arg p bi.pt
+ '[' -n bi.pt ']'
+ require_arg i '1,115;1'
+ '[' -n '1,115;1' ']'
+ '[' '!' -f bi.pt ']'
+++ dirname ./compile_model.sh
++ cd .
++ pwd -P
+ SRC_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc
+ FBCODE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../..
+ FBSOURCE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ KERNEL_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../../xplat/pytorch_models/build/pytorch_dev_bytedoc/v1/nnc
++ readlink -f bi.pt
++ sed 's/.pt.*//'
+ MODEL_PATH_PREFIX=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi
+ LLVM_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.ll
+ ASSEMBLY_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.s
+ COMPILED_MODEL_FILE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.pt
+ KERNEL_FUNC_NAME=nnc_pytorch_dev_bytedoc_v1_forward
+ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1'
Restarting Buck daemon because Buck version has changed...
Buck daemon started.
Parsing buck files... 0.6 sec (0/unknown)
.
.
Parsing buck files: finished in 5.0 sec
Creating action graph: finished in 0.7 sec
Downloaded 3750/4917 artifacts, 16.09 Mbytes, 13.3% cache miss (for updated rules)
Building: finished in 01:22.3 min (100%) 4995/4995 jobs, 4995/4995 updated
  Total time: 01:28.0 min
BUILD SUCCEEDED
Run with 56 threads
Run with 56 threads
Loading model...
Model loaded: /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.pt
Running forward ...
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1115 11:42:18.170666 1597103 TensorImpl.h:1418] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
(Columns 1 to 10 0.5428  0.1651  0.0158  0.0055  0.0503  0.0749  0.0161  0.0204  0.0237  0.0095

Columns 11 to 12 0.0609  0.0148
[ CPUFloatType{1,12} ], Columns 1 to 10-1.3946 -0.0835 -1.1268  0.3325 -2.1884  4.6175 -0.1206 -1.5058 -1.5277 -2.1214

Columns 11 to 20 1.3726 -0.4573 -1.7583 -2.2275  1.9607 -5.3430 -4.4927 -3.2548 -5.3214  2.9002

Columns 21 to 30-1.3973 -0.8084 -1.8491 -1.6518  4.2531 -0.0321 -0.0282 -1.1180 -0.9800  2.9228

Columns 31 to 32 0.8228  2.2611
[ CPUFloatType{1,32} ])
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 40.64. Iters per second: 24.6063
Memory usage before main runs: 71581696 bytes
Memory usage after main runs: 94347264 bytes
Peak memory usage after main runs: 94347264 bytes
Average memory increase per iter: 2.22495e+06 bytes
0 value means "not available" in above
```

Reviewed By: ljk53

Differential Revision: D32438852

fbshipit-source-id: 5defdc2593abda5da328f96248459d23b2c5e5c6
2021-11-19 17:08:07 -08:00
c41d8290b3 Rename shard_lengths to shard_sizes to be more inline with Tensor sizes. (#66464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66464

Dimension sizes are generally referred to as `size` in PyTorch, and
hence rename shard_lengths to shard_sizes.

#Closes: https://github.com/pytorch/pytorch/issues/65794
ghstack-source-id: 143866449

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D31564153

fbshipit-source-id: 6273426c4b0e079358806070d0d9644740adb257
2021-11-19 16:30:00 -08:00
af564e73b8 Strided masked log_softmax. (#68461)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68461

Test Plan: Imported from OSS

Reviewed By: dagitses, zou3519

Differential Revision: D32569961

Pulled By: cpuhrsch

fbshipit-source-id: 5d262adacf239dace4a28de85af4b602e36f17f0
2021-11-19 16:28:35 -08:00
578507cb7b Fix nanmedian result using more CUDA memory than necessary (#68591)
Summary:
CUDA's `at::nanmedian` creates a sorted copy of the array, then indexes into it to create a single-element view. This view necessarily keeps the entire sorted tensor's storage alive, which can be avoided by returning a copy, which is what `at::median` does indirectly via `at::where`.

This also changes the index variable `k` to be a simple `int64_t` instead of the CUDA tensor that was used before. This saves the additional host and device operations from calling `Tensor`'s `operator-`, which helps balance out the cost of the `clone` added here.
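
A small Python sketch of the storage-lifetime issue (illustrative):

```python
import torch

x = torch.randn(1_000_000)
s, _ = x.sort()
view = s[s.numel() // 2]  # 1-element view still pins all of s's storage
copy = view.clone()       # a copy lets s's storage be freed once s is gone
del s, view               # only copy's single-element storage remains
```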

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68591

Reviewed By: dagitses

Differential Revision: D32538538

Pulled By: ngimel

fbshipit-source-id: abe9888f80cf9d24d50a83da756e649af1f6ea3b
2021-11-19 16:16:19 -08:00
6cca14d02f [fx2trt][easy] Replace all network.add_activation() call with helper function (#68676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68676

As the title, the helper functions handles setting layer name. We would want to use those helper functions whenever possible.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D32571061

fbshipit-source-id: 4a191f0085c0b3965dc02d99bb33de21973d565d
2021-11-19 15:29:39 -08:00
37edb7483a [torchelastic][1/n] Fix caffe2.test.distributed.launcher.api_test flaky tests (#68624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68624

Fix `caffe2.test.distributed.launcher.api_test` flaky tests for opt-tsan mode.
The diff changes the default `mp.Process` invocation to use the spawn context. By default, `mp.Process` uses the `fork` start method, which is not compatible with `*san`.
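
A minimal sketch of the spawn-context pattern (illustrative):

```python
import torch.multiprocessing as mp

def worker(rank):
    print(f"worker {rank} running")

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # fork is unsafe under *san builds
    p = ctx.Process(target=worker, args=(0,))
    p.start()
    p.join()
```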

Test Plan: CI

Reviewed By: d4l3k

Differential Revision: D32550578

fbshipit-source-id: f4767987e8e10a6a2ece3f86e48278f2dbaebe7c
2021-11-19 15:23:30 -08:00
a545a409f8 [quant][graphmode][fx] Support input_quantized_idxs and output_quantized_idxs in the new convert (#68042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68042

att

Also added test cases from TestQuantizeFx which test all combinations of {fp32, int8} input and output overrides

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestConvertFxDoNotUse
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32271511

fbshipit-source-id: 87ffc00069aaff7d1c455cdd97fac82b11aa4527
2021-11-19 15:12:54 -08:00
993b7a2052 Remove doubly nested anonymous namespace (#68555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68555

The outer namespace is already anonymous, so this is not necessary.

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D32565941

Pulled By: malfet

fbshipit-source-id: 4daf1c46b25ff68e748e6c834c63d759ec6fde4f
2021-11-19 14:40:47 -08:00
5456d8c8f3 Add vectorized Jacobian and Hessian computation with forward AD (#67041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67041

Original PR here: https://github.com/pytorch/pytorch/pull/62246 (The old PR does more things, but now that's split across this stack)

This PR:
- Adds "jacfwd" and "hessian_fwdrev"
- Modifies existing tests to also test the `forward_ad=True` case

Test Plan: Imported from OSS

Reviewed By: gchanan, zou3519

Differential Revision: D32314424

Pulled By: soulitzer

fbshipit-source-id: 785b0e39162b93dc3b3cb9413233447152eddd53
2021-11-19 14:31:09 -08:00
7bb401a4c9 Add forward AD support for miscellanous operators (#67820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67820

Original PR here: https://github.com/pytorch/pytorch/pull/67040

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D32314423

Pulled By: soulitzer

fbshipit-source-id: ecd898dc903692cab084f6922a1d86986f957b1b
2021-11-19 14:31:06 -08:00
e358c49a5b Add OpInfo test and fix a couple cases (#66294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66294

In this PR:
- OpInfo for forward AD now checks batched forward grad when `op.check_batched_grad=True`
- Adds setting to disable the test for individual ops `check_batched_forward_grad` and disable for the ops here: https://github.com/pytorch/pytorch/issues/66357

Fixes some more failures:
- Make Forward AD metadata less strict by allowing stride to differ when size is 1
- Fix sum batching rule when logical tensor is a scalar and dim is unspecified
- Batching rule for `_reshape_alias`
- ~Batching rules now preserve storage offset for view operator that return non-zero storage offset~ (moved to previous PR)

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D31842020

Pulled By: soulitzer

fbshipit-source-id: 3517a8fb9d6291fccb53c0b1631eab5bbb24ebd1
2021-11-19 14:31:03 -08:00
21d203b5ca Add internal assert for tangent layout mismatch for view ops (#66293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66293

This PR:
 - Asserts that if the output is a view, then `is_same_metadata` must return `true`; otherwise, we are performing a copy.
 - unless we are being called from `make_dual` which can allow the tangent and primal to have different layouts, because it is not forward differentiable.
 - To make this possible, we add `is_make_dual` as a parameter. ~The alternative is to make `make_dual` non-composite, and then we can rely on its `view_info` for differentiability information. This also assumes that the only composite function that calls `set_fw_grad` is `make_dual`.~
 - Batching rules now preserve storage offset for view operator that return non-zero storage offset

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D31842021

Pulled By: soulitzer

fbshipit-source-id: ed606f5a7b4770df1e9ebc6eb1d584b27dad5bae
2021-11-19 14:30:59 -08:00
2455cc2adf Address case when layout of tangent is not same as base (#66292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66292

In this PR:
1. Fix the case when tangent has a different layout from the base when `set_fw_grad` by adding a native function and its batching rule.

For (1) we replace the following:
```
Tensor new_with_same_meta(const Variable& base) {
  int64_t nelement_in_storage = base.storage().nbytes() / base.itemsize();
  auto new_tensor = at::zeros({nelement_in_storage}, base.options());
  auto res = new_tensor.as_strided(base.sizes(), base.strides(), base.storage_offset());
  return res;
}
```
with a native function so as to enable a batching rule to alter its behavior.

This new function will be similar to `new_zeros_strided` except we also require the `storage_offset` and `storage_numel` arguments.
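
A minimal Python sketch of those semantics (the function name here is illustrative; the actual native function and its signature are the ones this PR defines):

```
import torch

# illustrative name; sketches the semantics of the new native function
def new_zeros_with_same_meta(tangent, size, stride, storage_offset, storage_numel):
    # allocate a fresh zero buffer covering the whole storage, then view it
    # with the requested (exploded) size/stride/offset
    buf = torch.zeros(storage_numel, dtype=tangent.dtype, device=tangent.device)
    return buf.as_strided(size, stride, storage_offset)
```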

Possible concerns:
 - Why have redundant logic? Why not add new args to `new_zeros_strided`? This is probably a niche use case, so it's better not to complicate the current API.
 - Previously the created tensor inherits the TensorOptions of the primal. Now we inherit from the TensorOptions of the tangent.
   - Probably fine. Likely, no one relies on this because the behavior is only triggered when tangent/base have different layouts.
 - Why pass in exploded size, stride, and offset? It is possible in the non-batched case to pass in a tensor directly, but not possible when we'd like to have a batching rule. The size, stride, and offset we'd be passing won't belong to any live tensor.

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D31842019

Pulled By: soulitzer

fbshipit-source-id: a58433d814fd173bc43a2c550b395377dba40de2
2021-11-19 14:29:46 -08:00
bbe2aae84c Support cuda 11.5: install magma for cuda in conda (#68665)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68667

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68665

Reviewed By: malfet

Differential Revision: D32570283

Pulled By: atalman

fbshipit-source-id: 4471fe8c4f8cc74c542ed67038322f07e861af73
2021-11-19 13:43:26 -08:00
183dcdf551 [reland] Fix flaky test_nccl_timeout (#68544)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66882

In addition to changes in https://github.com/pytorch/pytorch/pull/68403, add one more error check that can be raised when a collective times out

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68544

Reviewed By: albanD

Differential Revision: D32508706

Pulled By: rohan-varma

fbshipit-source-id: 7d41b91f547d4ad763c44cd11e7b9914b452b617
2021-11-19 13:25:24 -08:00
875ba3dddb [quant][trt] Add support for torch.addmm in TensorRT (#67537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67537

This PR adds support for quantizing torch.addmm to produce a reference quantized pattern,
and also adds support in the backend_config_dict api that allows people to specify the argument index for the input, weight and bias of each pattern:

```
    addmm_config = {
        "pattern": torch.addmm,
        "observation_type": ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
        "dtype_configs": [
            weighted_op_qint8_dtype_config,
        ],
        # a map from input type to input index
        "input_type_to_index": {
            "bias": 0,
            "input": 1,
            "weight": 2,
        }
    }
```

This requires some changes in getting weight_dtype and bias_dtype in the type inference stage of prepare, which are added in the previous PR

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRT.test_addmm
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32014998

fbshipit-source-id: 8d96c1e8b7ebb2ab385c08a5b1e43f2d5a2cbcbe
2021-11-19 13:19:28 -08:00
ee4cfaa286 [SR] Add utility class to determine tensor ranges (#68284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68284

Add a new class `ManagedTensorRanges` that determines when managed tensors can be made available for re-use. This class provides a method `availableTensors(Node* node)` that returns a vector of `Value*` (corresponding to managed tensors) that are not used (either directly or through any alias) after `node`.

Test Plan: New unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: swolchok

Differential Revision: D32397207

fbshipit-source-id: fb0d9a23f13abf6f2207e3d7266384966f477fc6
2021-11-19 13:10:55 -08:00
a6d862c50a [quant][graphmode][fx] Add support for weight and bias dtype in backend_config_dict (#68602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68602

This PR adds support for configuring weight/bias dtype in backend_config_dict
and refactor the current code that checks when to insert observers

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537712

fbshipit-source-id: 28eb7c61a8dcad8c1f3f6622d490a34cff0c59e2
2021-11-19 13:01:50 -08:00
da4a95c79a [ROCm] Use hipCUB/rocPRIM scan algorithms for large index support (#68487)
Summary:
For inclusive_scan and exclusive_scan, use hipCUB/rocPRIM scan algorithms for large index support.
Implemented for ROCm 5.0 and above.
Code reference : ROCmSoftwarePlatform/rocPRIM@5673df4#diff-47f4ef75e5af60dd5fe3906df9cf971f0635602a6b64a706dee6633d6677ef1a

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68487

Reviewed By: ngimel

Differential Revision: D32547541

Pulled By: malfet

fbshipit-source-id: 4dd984e6906aec7634d05e2ceaa55e31cd4d7376
2021-11-19 12:51:30 -08:00
5880a2f1ef Allow fuse unsqueeze cat sum with multiple input (#68650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68650

Allow fusing unsqueeze/cat/sum with more than 2 inputs. The implementation in this diff is naive: it just chains the concatenated items with add. Not sure whether fusing multiple adds into one operation would give more perf gain.

Test Plan: unit test

Reviewed By: jfix71

Differential Revision: D32520135

fbshipit-source-id: 535b1c8c91e415d5f1af714378b9205c1ca02ffd
2021-11-19 12:45:37 -08:00
2cab77f810 Masked normalization infrastructure and strided masked softmax (#68333)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68333

Test Plan: Imported from OSS

Reviewed By: dagitses, ZolotukhinM

Differential Revision: D32564435

Pulled By: cpuhrsch

fbshipit-source-id: 4d4662323ceffd12c210b7e931682d0442578157
2021-11-19 12:41:22 -08:00
f99f5ee088 add support for None in assert_close (#67795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67795

Closes #61035.
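
A hedged sketch of the resulting behavior (presumably both sides must be None for the comparison to pass):

```
import torch
from torch.testing import assert_close

assert_close(None, None)  # both inputs None: passes
# assert_close(torch.tensor(1.0), None)  # would raise: only one side is None
```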

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32532207

Pulled By: mruberry

fbshipit-source-id: 6a2b4245e0effce4ddea7d89eca63e3b163951a7
2021-11-19 12:38:25 -08:00
0809553cf0 refactor assert_close to be more modular (#67794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67794

This change is needed to conveniently use the same comparison mechanism for our internal test suite (see #67796). The reworked version is on par with the previous one except for the ability to pass a custom message as a callable. Before, we converted everything to a tensor, so it was fairly easy to provide consistent mismatch diagnostics to the callable. Now, with arbitrary `Pair`s used for comparison, that is no longer viable.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32532206

Pulled By: mruberry

fbshipit-source-id: dc847fba6a795c1766e01bc3e88b680a68287b1e
2021-11-19 12:37:16 -08:00
f74779e403 [android] Lite interpreter naming for android nightly publishing (#68651)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68651

Test Plan: Imported from OSS

Reviewed By: linbinyu

Differential Revision: D32564796

Pulled By: IvanKobzarev

fbshipit-source-id: 57847bfb2778433cfb02ad7a5a79ae30a6b438c1
2021-11-19 10:56:13 -08:00
4bcff4733d Add OpInfos for parcel Elementwise Binary II (#68085)
Summary:
Adds OpInfos for `torch.lcm`, `torch.gcd`, `torch.heaviside`, `torch.bitwise_or`, `torch.bitwise_xor`, `torch.isclose`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68085

Reviewed By: ngimel

Differential Revision: D32533310

Pulled By: saketh-are

fbshipit-source-id: 1616ebec61164cd1b44672f36220787a878b96a4
2021-11-19 10:37:07 -08:00
c2c859bdf2 [quant][embedding qat] Add benchmarks for QAT Embedding+EmbeddingBag (#66560)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66560

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D31618282

Pulled By: b-koopman

fbshipit-source-id: ebfe723cfc4004f413f157e65532d64e8d0274b3
2021-11-19 06:29:19 -08:00
f82f14de17 [libkineto] Refactor 4/n: Simplify activity logger step 2/3 (#68329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68329

Pull Request resolved: https://github.com/pytorch/kineto/pull/466

1. Generalize ChromeTraceLogger::handleGenericActivity to enable it to handle CUDA runtime activities as well as the Roctracer generic activities.
This primarily involves enabling generic support for CPU -> GPU flows.

2. In the event of out-of-order GPU activities (an issue with CUDA 11.0, likely fixed in later versions), no longer remove them but print warnings. Another diff will add these warnings to the metadata section.

Reviewed By: briancoutinho

Differential Revision: D31624496

fbshipit-source-id: dab04b3e3c0dd6799496ac87f837363de79eea25
2021-11-18 23:09:20 -08:00
18312313c4 [Profiler] Add missing guards (#65812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65812

Multiple threads record events to a shared activity buffer, and the buffer is at some point transferred to libkineto.
The access to and the transfer of the buffer need to be done under a lock.

Reviewed By: leitian, xw285cornell

Differential Revision: D31220061

fbshipit-source-id: f11c879df1b55aa9068187e600730bb0e5e5455f
2021-11-18 22:39:21 -08:00
343723e2ad [PyTorch][JIT][easy] Delete unnecessary overload of MemoryDAG::mayAlias (#66966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66966

T* is convertible to const T*, so we don't need this overload.
ghstack-source-id: 143749559

Test Plan: builds

Reviewed By: hlu1

Differential Revision: D31809824

fbshipit-source-id: 70cca86c4a87dc09cd958953a08a801db3e4d047
2021-11-18 22:36:06 -08:00
ced57eb490 [PyTorch][Static Runtime] Delete incorrect alias analysis code (#67075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67075

Sharing storage if `mayAlias` is incorrect, as the old comment notes; sharing if `mustAlias` would be nice but, as the new comment notes, would not matter.
ghstack-source-id: 143749553

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D31851893

fbshipit-source-id: 5bdc8de984d5919332c9010e8b0160211d96bc2f
2021-11-18 22:34:50 -08:00
833dcaf2d6 Sparse CSR: Add torch.sin (#68123)
Summary:
This PR attempts to add support for `torch.sin` for sparse CSR tensors.

This aims to be a revised implementation (in some form) of https://github.com/pytorch/pytorch/pull/68083, and the implementation aims to be similar to that in [`SparseTensorMath.cpp` file](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseTensorMath.cpp)

The tests and `empty_like` support for sparse CSR tensors (with a minor correction) are borrowed from https://github.com/pytorch/pytorch/pull/68083 temporarily to assist CI with testing this PR. :)

cc nikitaved pearu cpuhrsch IvanYashchuk krshrimali

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68123

Reviewed By: jbschlosser

Differential Revision: D32533379

Pulled By: cpuhrsch

fbshipit-source-id: eb834d64d16ee12734c77e74fffa4a47614e3dfb
2021-11-18 21:58:09 -08:00
758d7dea9c torch.monitor - Initial C++ Stats (#68074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074

This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30

This defines the aggregation types, the `Stat` class and provides some simple collection of the stats.

This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance.

Changes:
* added window_size to the stats (see the sketch after this list). If specified, the stat is always computed over `window_size` values; if there aren't enough values within the current window, the previous stats are reported.
* This doesn't include the push metrics yet (will be coming).
  After more discussion it looks like the best way to handle this is to support a hybrid where the metric can set how frequently it'll be logged. For fixed window_size metrics it'll be logged each time it hits the window size. This will allow performant counters as well as lower-frequency push counters (window_size=1).
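
A minimal Python sketch of the fixed-window semantics described above (illustrative only; the actual implementation is the C++ `Stat` class):

```
class WindowedStat:
    def __init__(self, window_size):
        self.window_size = window_size
        self.values = []
        self.last_report = None  # stats from the previous full window

    def add(self, v):
        self.values.append(v)
        if len(self.values) == self.window_size:
            self.last_report = {"mean": sum(self.values) / len(self.values),
                                "count": len(self.values)}
            self.values.clear()

    def get(self):
        # not enough values in the current window: report the previous stats
        return self.last_report
```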

Performance considerations:
* Updating the stats acquires a lock on that Stat object. This should be performant unless very many threads are writing to the same stat. A single thread will typically use a futex, so it should be quite fast.
* Adding/removing/fetching all stats takes a global lock on the stat list -- this shouldn't be an issue since these events happen infrequently.
* Fetching stats accesses one stat at a time instead of taking a global lock. This means the exported values are linearizable but not serializable across multiple stats; I don't expect this to be an issue.

Next steps:
1. Add StatCollector interface for push style metrics
1. Add pybind interfaces to expose to Python
1. Add default metric providers
1. Integrate into Kineto trace view

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

CI

Reviewed By: kiukchung

Differential Revision: D32266032

fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a
2021-11-18 21:46:23 -08:00
68d8ab0cc6 [const_fold] Fix call_module const folding (#68614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68614

We need to copy modules over to the `split` graph during const folding. We were previously only doing so from the non-constant submod, but we need to do this for the constant one as well in case some `call_module` is const folded.

Test Plan: Added unit test

Reviewed By: wushirong, 842974287

Differential Revision: D32543289

fbshipit-source-id: 80d1d0ce2c18a665b00e1343d6c55d939390ab10
2021-11-18 20:56:06 -08:00
39747dc456 [nnc] Lowerings for flatten, xnnpack prepack op (#68470)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68470

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32545261

Pulled By: IvanKobzarev

fbshipit-source-id: b2bf5b3260002bcc40a351a9c56d786b16b69287
2021-11-18 20:14:42 -08:00
ca92111758 Add native_dropout (#63937)
Summary:
Adds native_dropout to have a reasonable target for torchscript in autodiff. native_dropout has scale and train as arguments in its signature; this makes native_dropout more consistent with other operators and removes conditionals in the autodiff definition.
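
A hedged sketch of a call, assuming the op is exposed under `torch.native_dropout` and returns the output together with the dropout mask:

```
import torch

x = torch.randn(4, 4)
out, mask = torch.native_dropout(x, 0.5, True)  # (output, mask), train=True
```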

cc gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937

Reviewed By: mruberry

Differential Revision: D32477657

Pulled By: ngimel

fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4
2021-11-18 19:41:10 -08:00
a39060c001 textray demo for unity
Summary:
Previously I needed to back out D32220626 and then apply D31841609 to run the textray unity demo, which made it hard for other people to take a look at how this textray demo works.

I copied the textray demo (a single file) from the pytext folder to the unity folder and applied the changes needed. This way, other people can also run this textray demo. It also makes my dev environment cleaner.

Test Plan: buck run mode/opt :textray_demo

Reviewed By: mleshen

Differential Revision: D32537190

fbshipit-source-id: 5df6347c4bec583c225aea9f98fbc9f37b5d3153
2021-11-18 19:04:18 -08:00
ff125a3624 Minor changes in documentation (#68557)
Summary:
Fixed some small typos

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68557

Reviewed By: mruberry

Differential Revision: D32538749

Pulled By: ngimel

fbshipit-source-id: 09a9cd4031463b6a40d7307bd8fcb7d364444ac3
2021-11-18 17:57:16 -08:00
9ce3c630ba [Docs] Mention torch.bfloat16 in torch.finfo (#68496)
Summary:
https://pytorch.org/docs/master/type_info.html#torch.torch.finfo seems to miss `torch.bfloat16`.
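
For reference, querying it already works; only the documentation was missing:

```
import torch

bf16 = torch.finfo(torch.bfloat16)
print(bf16.bits, bf16.eps, bf16.max)  # 16, 0.0078125, ~3.39e38
```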

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68496

Reviewed By: mruberry

Differential Revision: D32538806

Pulled By: ngimel

fbshipit-source-id: 1296b3eb34d024cfc7d85cf53efe771ee9f98ea2
2021-11-18 17:52:41 -08:00
913ac27112 Fixes forward AD codegen for multiple formulas (#68535)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67367

- Adds check to make sure forward grad itself does not have forward grad at the same level
- Verify with `python test/test_ops.py -k test_forward_mode_AD_linalg_eigh_cpu_float64` that it fails the check before, but passes after the codegen update

Before:
```
  if (_any_has_forward_grad_eigenvalues) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      auto eigenvalues_new_fw_grad = eigh_jvp_eigenvalues(self_t, eigenvalues, eigenvectors);
      if (eigenvalues_new_fw_grad.defined()) {
        // The hardcoded 0 here will need to be updated once we support multiple levels.
        eigenvalues._set_fw_grad(eigenvalues_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
      }
  }
  if (_any_has_forward_grad_eigenvectors) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      auto eigenvectors_new_fw_grad = eigh_jvp_eigenvectors(self_t, eigenvalues, eigenvectors);
      if (eigenvectors_new_fw_grad.defined()) {
        // The hardcoded 0 here will need to be updated once we support multiple levels.
        eigenvectors._set_fw_grad(eigenvectors_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
      }
  }
```

After:
```
  c10::optional<at::Tensor> eigenvalues_new_fw_grad_opt = c10::nullopt;
  if (_any_has_forward_grad_eigenvalues) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      eigenvalues_new_fw_grad_opt = eigh_jvp_eigenvalues(self_t, eigenvalues, eigenvectors);
  }
  c10::optional<at::Tensor> eigenvectors_new_fw_grad_opt = c10::nullopt;
  if (_any_has_forward_grad_eigenvectors) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      eigenvectors_new_fw_grad_opt = eigh_jvp_eigenvectors(self_t, eigenvalues, eigenvectors);
  }
  if (eigenvalues_new_fw_grad_opt.has_value() && eigenvalues_new_fw_grad_opt.value().defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    eigenvalues._set_fw_grad(eigenvalues_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
  }

  if (eigenvectors_new_fw_grad_opt.has_value() && eigenvectors_new_fw_grad_opt.value().defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    eigenvectors._set_fw_grad(eigenvectors_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
  }
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68535

Reviewed By: ngimel

Differential Revision: D32536089

Pulled By: soulitzer

fbshipit-source-id: a3f288540e2d78a4a9ec4bd66d2c0f0e65dd72cd
2021-11-18 17:44:17 -08:00
e7002c62ae [nnc] External functions quantized via Dispatch (#68572)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68572

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D32522410

Pulled By: IvanKobzarev

fbshipit-source-id: 7bb373819275582bb02e0d2ffd87a78d19f92318
2021-11-18 17:27:03 -08:00
a990a7ac31 [torchelastic] Remove stale test_get_default_executable test (#68609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68609

The test is stale and tests a non-existent method

Test Plan: ci

Reviewed By: kiukchung

Differential Revision: D32540127

fbshipit-source-id: c47b7aed3df6947819efb2f4ad1b7a059c252138
2021-11-18 17:20:36 -08:00
003f6ccec6 [BE] rename some tests in test_c10d_common (#67828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67828

as titled
ghstack-source-id: 143781976

Test Plan: wait for ci

Reviewed By: mrshenli

Differential Revision: D32165576

fbshipit-source-id: 40c04b74f9e3241d3b3d64dee53af01fcfd1018b
2021-11-18 17:14:58 -08:00
3757a16c7a Adding custom testing based on opinfos input for ops with custom rules. (#67500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67500

* #66898

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32497547

Pulled By: Gamrix

fbshipit-source-id: 07761f0e27f4ac289377ff3279ce6470d4b727dd
2021-11-18 16:29:00 -08:00
71a031e954 Adding Custom Rules to Device Propagation (#66973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66973

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32497549

Pulled By: Gamrix

fbshipit-source-id: 5732682c0b39709f76cf218490e5f5136c0d83f8
2021-11-18 16:28:56 -08:00
77db720c65 Moving parts of the Shape Registry into a common file (#66948)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66948

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32497550

Pulled By: Gamrix

fbshipit-source-id: 650feded6bae379af3d73a52edac2721bd7af2f2
2021-11-18 16:27:45 -08:00
244691db98 surface ncclUniqueId store broadcast error (#68597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68597

Users got confused by just 'Socket timeout', so surface a detailed error message instead. https://fb.workplace.com/groups/319878845696681/posts/650320792652483/. As we are using the store more often for desync timeout/slowness detection, we will need a good wrapper to surface error messages for all store APIs.

Test Plan:
```
RuntimeError: [3] is setting up NCCL communicator and retreiving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got exception: Socket Timeout
Exception raised from recvBytes at caffe2/torch/csrc/distributed/c10d/Utils.hpp:595 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_0>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*)
# 4  c10d::TCPStore::doWait(c10::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::chrono::duration<long, std::ratio<1l, 1000l> >)
# 5  c10d::TCPStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 6  c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 7  c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 8  c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, c10d::OpType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)
# 9  c10d::ProcessGroupNCCL::getNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<c10::Device, std::allocator<c10::Device> > const&, c10d::OpType, int, bool)
# 10 c10d::ProcessGroupNCCL::allreduce(std::vector<at::Tensor, std::allocator<at::Tensor> >&, c10d::AllreduceOptions const&)
# 11 pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::ProcessGroup::Work, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup::Work> >, c10d::ProcessGroup, std::vector<at::Tensor, std::allocator<at::Tensor> >&, c10d::AllreduceOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::ProcessGroup::WorkTraceback (most recent call last):
```

Reviewed By: rohan-varma, mingzhe09088

Differential Revision: D32533304

fbshipit-source-id: e471636ee0c5291215cb6cde659b10bee13b7d12
2021-11-18 16:04:39 -08:00
ab1d879b33 [WIP] forbid aliasing between the outputs of a differentiable graph (#67732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67732

Reviewed By: cpuhrsch

Differential Revision: D32522826

Pulled By: Krovatkin

fbshipit-source-id: 9fdf3509dcd1b885f7c7f06d22b340c0f93bbe12
2021-11-18 15:03:35 -08:00
9f4e004abd Revert D32283178: Add linalg.solve_triangular
Test Plan: revert-hammer

Differential Revision:
D32283178 (0706607abc)

Original commit changeset: deb672e6e52f

fbshipit-source-id: d2a3421292147426cc61c2f063b721acf9004755
2021-11-18 14:46:10 -08:00
48771d1c7f [BC-breaking] Change dtype of softmax to support TorchScript and MyPy (#68336)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68336

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32470965

Pulled By: cpuhrsch

fbshipit-source-id: 254b62db155321e6a139bda9600722c948f946d3
2021-11-18 11:26:14 -08:00
748d9d2494 Revert D32187063: [static runtime] dequantize out variant
Test Plan: revert-hammer

Differential Revision:
D32187063 (f120335643)

Original commit changeset: 1fec6b74c7d3

fbshipit-source-id: 9770f8379e9ddba9e537fef0e66cc93c2caaf860
2021-11-18 10:12:31 -08:00
0706607abc Add linalg.solve_triangular (#63568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568

This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve` preparing these for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the inputs and the result, but it's exposed for convenience (see the example after this list).
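
A small example of the new API (a sketch; exact keyword defaults per the docs added in this PR):

```
import torch

A = torch.randn(3, 3).triu()  # upper-triangular coefficient matrix
B = torch.randn(3, 2)
X = torch.linalg.solve_triangular(A, B, upper=True)  # solves A @ X == B
# left=False would instead solve X @ A == B
```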

This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.

This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.

Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.

We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.

Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: zou3519, JacobSzwejbka

Differential Revision: D32283178

Pulled By: mruberry

fbshipit-source-id: deb672e6e52f58b76536ab4158073927a35e43a8
2021-11-18 09:45:51 -08:00
f120335643 [static runtime] dequantize out variant (#67873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67873

Add out variant for aten::dequantize

Test Plan:
Test on inline_cvr model
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/294738512/294738512_0.predictor.disagg.local --recordio_inputs=/data/users/ansha/tmp/adfinder/294738512/294738512_0_local.inputs.recordio --pt_enable_static_runtime=1 --compare_results=1 --iters=5 --warmup_iters=5 --num_threads=1 --do_profile=1 --method_name=local.forward --set_compatibility --do_benchmark=1 --recordio_use_ivalue_format=1
```

Before:
0.047472 ms.   0.409729%. aten::dequantize (9 nodes)

After
0.0307179 ms.   0.267204%. static_runtime::dequantize_copy (9 nodes, out variant)

Reviewed By: hlu1

Differential Revision: D32187063

fbshipit-source-id: 1fec6b74c7d3f25d0f445775c4558d30c55dcece
2021-11-18 09:31:27 -08:00
7d38768d84 Rename splitter result (#68303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68303

The result of the splitter is run either on the accelerator or directly on the GPU; rename the GPU part of the graph to run_on_gpu.

Test Plan: buck test mode/opt caffe2/test:trt_tools_test

Reviewed By: 842974287

Differential Revision: D32392492

fbshipit-source-id: b085376c00c1097752e856e22c631d74a0fbc38f
2021-11-18 09:04:30 -08:00
533e72e0a4 Fix DLPack CUDA stream convention (#67618)
Summary:
Apparently for the array API, the CUDA default stream and the per-thread stream should be 1 and 2 instead of 0 and 1:

https://data-apis.org/array-api/latest/API_specification/array_object.html?dlpack-self-stream-none#dlpack-self-stream-none.

This caused a problem in the interop with CuPy https://github.com/cupy/cupy/pull/5970#discussion_r739912926.
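
On a CUDA build, the convention shows up in the `__dlpack__` protocol, roughly like this (a sketch):

```
import torch

x = torch.randn(4, device="cuda")
# array API convention: 1 = CUDA legacy default stream, 2 = per-thread default
capsule = x.__dlpack__(stream=1)
```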

cc rgommers leofang mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67618

Reviewed By: albanD

Differential Revision: D32521805

Pulled By: mruberry

fbshipit-source-id: 95777e4014e5edf1f88ba10adc03c6e34c13248d
2021-11-18 08:36:05 -08:00
d5d2096dab [testing] make @dtypes mandatory when using @dtypesIf (#68186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53647

With this, if a test forgets to add `dtypes` while using `dtypesIf`, the following error is raised
```
AssertionError: dtypes is mandatory when using dtypesIf however 'test_exponential_no_zero' didn't specify it
```
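
A hedged sketch of a compliant test declaration (decorator names as found in `torch.testing._internal.common_device_type`; the test class here is illustrative):

```
import torch
from torch.testing._internal.common_device_type import dtypes, dtypesIf

class TestDistributions:
    @dtypes(torch.float)                         # now mandatory
    @dtypesIf('cuda', torch.float, torch.half)   # device-specific override
    def test_exponential_no_zero(self, device, dtype):
        ...
```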

**Tested Locally**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68186

Reviewed By: VitalyFedyunin

Differential Revision: D32468581

Pulled By: mruberry

fbshipit-source-id: 805e0855f988b77a5d8d4cd52b31426c04c2200b
2021-11-18 08:29:31 -08:00
857fed1f42 torch.linalg.qr: forward AD support (#67268)
Summary:
As per title.
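
A quick sketch of what this enables, using the public forward-mode AD helpers:

```
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3)
with fwAD.dual_level():
    dual_A = fwAD.make_dual(A, torch.randn(3, 3))
    Q, R = torch.linalg.qr(dual_A)    # JVP now propagates through qr
    dR = fwAD.unpack_dual(R).tangent  # tangent of R
```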

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67268

Reviewed By: ngimel

Differential Revision: D31960517

Pulled By: albanD

fbshipit-source-id: bfd1028a8d352f550efb420f9ca609c09f4a7484
2021-11-18 08:11:54 -08:00
a2d187a672 [BE] MapAllocator: report map error on Linux (#68545)
Summary:
Add `, strerror(errno), " (", errno, ")"`  suffix to TORCH_CHECK messages that report failures from POSIX calls

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68545

Reviewed By: ngimel

Differential Revision: D32509300

Pulled By: malfet

fbshipit-source-id: 1d7792d07e3a1184d2d54d137e6a9105dbab7d4c
2021-11-18 08:04:09 -08:00
b1aa45a8a7 Fix _make_wrapper_subclass's storage_offset handling (#68268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68268

Previously, `_make_wrapper_subclass` ignored the storage offset it was
passed. This PR fixes that by updating TensorMaker::computeStorageSize()
and TensorMaker::make_tensor() to take into account storage_offset.

Test Plan: - added test

Reviewed By: albanD, bdhirsh

Differential Revision: D32396330

Pulled By: zou3519

fbshipit-source-id: 2c85bc4066044fe6cb5ab0fc192de6c9069855fd
2021-11-18 07:07:42 -08:00
f0e2ad5037 Stop warning spamming about vmap in gradcheck (#68586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68586

We updated the vmap warnings to be more descriptive in
https://github.com/pytorch/pytorch/pull/67347 . However, gradcheck does
some warning squashing that matches on the warning message and we didn't
update that. This PR updates the warning squashing in gradcheck.

Test Plan: - check logs

Reviewed By: albanD

Differential Revision: D32530259

Pulled By: zou3519

fbshipit-source-id: 9db380b57c38b3b72cbdb29574f71dbfe71e90d1
2021-11-18 07:00:36 -08:00
f9ef807f4d Replace empty with new_empty in nn.functional.pad (#68565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68565

This makes it so that we can now vmap over nn.functional.pad (circular
variant). Previously we could not because we were effectively doing
`out.copy_(input)` where the out was created with empty.

This also has the added side effect of cleaning up the code.
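
For illustration, the op itself is unchanged from the caller's perspective:

```
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8)
# output is now allocated with input.new_empty(...) instead of torch.empty(...),
# which keeps circular padding compatible with batching transforms like vmap
y = F.pad(x, (1, 1), mode="circular")
```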

Test Plan:
- I tested this using functorch.vmap and can confirm that vmap now
works.
- Unfortunately this doesn't work with the vmap in core so I cannot add
a test for this here.

Reviewed By: albanD

Differential Revision: D32520188

Pulled By: zou3519

fbshipit-source-id: 780a7e8207d7c45fcba645730a5803733ebfd7be
2021-11-18 06:06:50 -08:00
6c9cf5e6ea [quant][embedding qat] eager mode QAT for Embeddings (#66429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66429

Test Plan: Imported from OSS

Reviewed By: HDCharles, supriyar

Differential Revision: D31618284

Pulled By: b-koopman

fbshipit-source-id: 0c0e2e86b98da9f29e9b2fc2a35c59424f94cbba
2021-11-18 05:57:11 -08:00
dbbb02474b [GPU host alloc] Fast path for size 0 malloc (#68532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68532

Diff to better handle size 0 pinned memory allocation requests.
----
### Behavior before fix
The very first size 0 malloc comes in. It will create a block with `{key: 0, value: Block(0, 0, true)}`.

Another size 0 malloc comes in.
It will either 1) get a block with size > 0 (which is a waste of pinned memory) or 2) call `cudaHostAlloc()` with size 0 to eventually get *ptr=0.
Note that this block is *not registered* in the block pool because we have a duplicate entry (and that's why we keep wasting size > 0 pinned memory blocks whenever `available.empty() == false`).

----
### Behavior after fix

Let `malloc()` simply return a nullptr (0).
This avoids wasting valid size > 0 blocks as well as save the calls to `cudaHostAlloc()` which is expensive.
This is also safe since `free()` simply returns success for nullptrs.
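
A hedged sketch of the fast path (the real implementation is C++; `allocate_pinned_block` is a hypothetical helper):

```
def pinned_malloc(size):
    if size == 0:
        # return nullptr immediately: no block-pool entry, no cudaHostAlloc,
        # and free(nullptr) already succeeds trivially
        return 0
    return allocate_pinned_block(size)  # hypothetical: normal allocation path
```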

-----

Test Plan: Unit tests.

Reviewed By: yinghai

Differential Revision: D32487522

fbshipit-source-id: 6140cab54ff5a34ace7d046f218fb32805c692c0
2021-11-18 02:39:36 -08:00
4635f5711f [static runtime][dper] multi_env tests for static runtime: selective enable (#67467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67467

Unit tests for static runtime in the dper multi-env tests for CPU and scripted (including fx-traced + scripted) models. For now, only turn it on for the single_operators_tests that are in the inline_cvr local/local_ro/remote_ro models.

A follow-up diff will turn this on by default and explicitly disable it for certain tests.

Test Plan: buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test

Reviewed By: hlu1, houseroad

Differential Revision: D30870488

fbshipit-source-id: 382daec8dbcb95135cdd43e7b84a1d23b445d27c
2021-11-18 01:04:12 -08:00
35712a8eb4 [reland] simplify init_from_local_shards API (#68021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68021

Reland of https://github.com/pytorch/pytorch/pull/64481, as the previous one had some internal failures that weren't captured when it first landed.

This simplifies the `init_from_local_shards` API in sharded tensor to only require the user to pass in a list of `Shard`s and an `overall_size`, instead of a ShardedTensorMetadata. We do an all_gather inside to form a valid ShardedTensorMetadata.

TODO: add more test cases to improve coverage.
ghstack-source-id: 143661119
ghstack-source-id: 143661119

Test Plan: TestShardedTensorFromLocalShards

Reviewed By: pritamdamania87

Differential Revision: D32147888

fbshipit-source-id: 897128b75224f4b9644471a04a64079f51e0d5fe
2021-11-17 23:20:37 -08:00
952ca25daa Sparse CSR: add convert_indices_from_csr_to_coo (#66774)
Summary:
This PR adds conversion from CSR to COO.

Fixes https://github.com/pytorch/pytorch/issues/56959

cc nikitaved pearu cpuhrsch IvanYashchuk gchanan mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66774

Reviewed By: zou3519

Differential Revision: D32288415

Pulled By: cpuhrsch

fbshipit-source-id: 683ba658dc46835fdf3c0e24645c0c2bb243b968
2021-11-17 22:28:30 -08:00
96ba2099d1 Fix c10d TCP store with mutex (#68499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68499

TCP store is actually accessed by multiple threads (the NCCL watchdog thread), but it has no mutex protection, while FileStore and HashStore do. As enabling desync root cause analysis makes store calls more frequent, the race condition in TCP store was reliably triggered when creating another process group such as gloo. Add a mutex to TCP store, matching FileStore and HashStore.

Test Plan:
DDP benchmark with desync debug enabled, no perf regression

https://www.internalfb.com/intern/fblearner/details/309398285?tab=Outputs

W/o this diff

https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs

Reviewed By: mingzhe09088

Differential Revision: D32482254

fbshipit-source-id: e8f466e1c6fdcab6cfa170f44b9be70395935fb8
2021-11-17 20:30:10 -08:00
146a7f68e2 Enable desync root cause analysis for NCCL (#68310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68310

Enable desync root cause analysis by recording the last footprint of collective calls. When a timeout occurs, we parse the store trace and figure out the root cause of the desync issue. This feature is built on top of async error handling.

Test Plan:
Standalone test
* Typical desync - P467288969
* Mismatched collectives - P467288916
* Mismatched broadcast size - P467288873

DDP benchmark
* DDP benchmark desync - P467433483, P467520195

No perf regression:
* w/o this diff https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs
* w/ this diff https://www.internalfb.com/intern/fblearner/details/308534088?tab=Outputs

Reviewed By: mingzhe09088

Differential Revision: D32348647

fbshipit-source-id: 43e7e96e3fa2be0ac66c1325bceb639b461a8b3a
2021-11-17 20:29:03 -08:00
9807787135 scatter_reduce (#68115)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63780

Basic functionality of a `scatter_reduce` algorithm with `reduce="sum"`:

* `scatter_reduce` is named `scatter_reduce2` due to compilation issues (see the error log below)
* It currently re-uses functionality from `scatter_add` (see the sketch after this list)
* Tests are missing: WIP
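
Since the reduce="sum" path re-uses `scatter_add`, its semantics can be expressed with existing ops (a sketch, not the new API):

```
import torch

x = torch.tensor([1., 2., 3., 4.])
index = torch.tensor([0, 1, 0, 1])
out = torch.zeros(2).scatter_add(0, index, x)  # -> tensor([4., 6.])
```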

The error when the `scatter_reduce` naming is used:
```
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13949:18: error: redefinition of ‘struct at::_ops::scatter_reduce’
13949 | struct TORCH_API scatter_reduce {
      |                  ^~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13817:18: note: previous definition of ‘struct at::_ops::scatter_reduce’
13817 | struct TORCH_API scatter_reduce {
      |                  ^~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13960:18: error: redefinition of ‘struct at::_ops::scatter_reduce_out’
13960 | struct TORCH_API scatter_reduce_out {
      |                  ^~~~~~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13839:18: note: previous definition of ‘struct at::_ops::scatter_reduce_out’
13839 | struct TORCH_API scatter_reduce_out {
      |                  ^~~~~~~~~~~~~~~~~~
In file included from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/core/TensorBody.h: In member function ‘at::Tensor at::Tensor::scatter_reduce(int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>) const’:
aten/src/ATen/core/TensorBody.h:3976:83: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 3976 |     return at::_ops::scatter_reduce::call(const_cast<Tensor&>(*this), dim, index, reduce, output_size);
      |                                                                                   ^~~~~~
      |                                                                                   |
      |                                                                                   c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13824:109: note:   initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’
13824 |   static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce);
      |                                                                                          ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor at::scatter_reduce(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’:
aten/src/ATen/Functions.h:7119:61: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7119 |     return at::_ops::scatter_reduce::call(self, dim, index, reduce, output_size);
      |                                                             ^~~~~~
      |                                                             |
      |                                                             c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13824:109: note:   initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’
13824 |   static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce);
      |                                                                                          ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_out(at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’:
aten/src/ATen/Functions.h:7124:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7124 |     return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out);
      |                                                                 ^~~~~~
      |                                                                 |
      |                                                                 c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13846:111: note:   initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’
13846 |   static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out);
      |                                                                                            ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_outf(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>, at::Tensor&)’:
aten/src/ATen/Functions.h:7129:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7129 |     return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out);
      |                                                                 ^~~~~~
      |                                                                 |
      |                                                                 c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13846:111: note:   initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’
13846 |   static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out);
      |                                                                                            ~~~~~~~~~~~~~~~~~~~^~~
In file included from aten/src/ATen/NativeFunctions.h:6,
                 from ../aten/src/ATen/TensorIndexing.h:12,
                 from ../aten/src/ATen/ATen.h:20,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/NativeMetaFunctions.h: At global scope:
aten/src/ATen/NativeMetaFunctions.h:496:18: error: redefinition of ‘struct at::meta::structured_scatter_reduce’
  496 | struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase {
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~
aten/src/ATen/NativeMetaFunctions.h:481:18: note: previous definition of ‘struct at::meta::structured_scatter_reduce’
  481 | struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase {
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68115

Reviewed By: albanD

Differential Revision: D32488450

Pulled By: cpuhrsch

fbshipit-source-id: 65e79c6d0555c0d5715535bb52aade8d5fcd9722
2021-11-17 19:53:12 -08:00
e72b9db48e [fx2trt] add converter for acc_ops.hardtanh (#68550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68550

Missing ops in https://fburl.com/gsheet/q06f1vrc

Test Plan: unit tests

Reviewed By: wushirong

Differential Revision: D32500303

fbshipit-source-id: 9266210ae229263f6bb2a60486c279ceb766ffdf
2021-11-17 17:59:37 -08:00
9d9ca88f5c [predictor][trt] Expose more CUDA/CuDNN info to at::Context and BC stage 1 (#68146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68146

Expose more CUDA/CuDNN info to at::Context

Test Plan: CI; lint;

Reviewed By: houseroad

Differential Revision: D32264935

fbshipit-source-id: ad43d5d245dba4a054e09346240414159832585e
2021-11-17 17:16:19 -08:00
d71092f668 [android][fbjni] Update fbjni to 0.2.2 (#68400)
Summary:
ghstack-source-id: caeb8df3a18a6fa48d591af126ac59d8e41494b5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68400

CI-all check:
https://github.com/pytorch/pytorch/pull/68497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68495

Reviewed By: linbinyu

Differential Revision: D32481451

Pulled By: IvanKobzarev

fbshipit-source-id: b19ce05ff9d63b3f701d718eefbf1e9d66e11639
2021-11-17 16:54:22 -08:00
53bfb00ee1 [bugfix] TensorList args in functionalization pass (#68395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68395

At the time that I wrote the pass, I thought that `c10::TensorList` and `c10::List<Tensor>` were the same thing. But it looks like a `TensorList` is actually an `ArrayRef<Tensor>`. This led to a nasty bug when I tried to add conditional functionalization to `block_diag`, where in the boxed kernel, I would:

(1) unwrap the first `IValue` by calling `.toTensorList()` (this actually returns a `List<Tensor>`, not a `TensorList`).
(2) call `TensorList to_functional_tensor(List<Tensor>)` to get out a `TensorList` with the functionalized tensors
(3) wrap that back into an `IValue` and put in on the stack.

Somewhere in that sequence of operations, something bad happens and we segfault. Fixing up the signature of `to_functional_tensor` to be `List<Tensor> to_functional_tensor(List<Tensor>)` fixes the bug. I have a feeling that there's a latent TensorList-related bug in the boxing/unboxing logic that made this worse, but I'm okay to stick with my narrow fix for now.

Additionally tested by running `pytest test/test_ops.py test/test_vmap.py -v -k block_diag` on top of this PR: https://github.com/pytorch/functorch/pull/235

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32448258

Pulled By: bdhirsh

fbshipit-source-id: 3b2b6c7cd5e4c29533d0502f24272d826bfe03c1
2021-11-17 15:50:30 -08:00
b0bdf588ea [ONNX] Release values cached in global object (#68210)
Summary:
To release constants computed and stored by `ConstantValueMap::SetValue(...)` during ONNX exporting, `ConstantValueMap::Clear()` needs to be called explicitly. Otherwise, it's a memory leak.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68210

Reviewed By: jansel

Differential Revision: D32465670

Pulled By: msaroufim

fbshipit-source-id: 521e474071b94c5d2cd4f353ee062cee78be1bd4
2021-11-17 12:47:59 -08:00
4eb772fde6 Refactor saving jit::Module to mobile .pt in 2 steps: (#66494)
Summary:
1. Convert Function -> mobile::Function
2. Serialize mobile::Function

This also opens up the opportunity to create a mobile::Module without saving/reloading.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494

Reviewed By: zhxchen17

Differential Revision: D32293022

Pulled By: qihqi

fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d
2021-11-17 12:02:20 -08:00
e2aeb4a7af Improve native layer norm backward perf (#68238)
Summary:
Benchmarks
At this PR
```
[------------------------------------------------------ ln ------------------------------------------------------]
                  |  fwd, torch.float32  |  fwdbwd, torch.float32  |  fwd, torch.float16  |  fwdbwd, torch.float16
1 threads: -------------------------------------------------------------------------------------------------------
      200, 256    |         17.5         |          106.6          |         18.1         |           94.7
      1000, 256   |         18.7         |          116.6          |         18.7         |          110.7
      6000, 256   |         28.1         |          111.8          |         19.4         |           92.3
      6272, 256   |         29.3         |          108.5          |         20.1         |           92.7
      200, 512    |         19.3         |           83.8          |         19.1         |          116.3
      1000, 512   |         17.9         |           88.0          |         17.9         |           93.0
      6000, 512   |         36.9         |          141.2          |         27.4         |          103.3
      6272, 512   |         38.2         |          146.5          |         28.1         |          107.9
      200, 1024   |         18.1         |           89.5          |         21.1         |          102.7
      1000, 1024  |         17.9         |           88.7          |         18.5         |           92.5
      6000, 1024  |         77.6         |          277.5          |         40.3         |          148.5
      6272, 1024  |         80.7         |          288.1          |         42.0         |          154.0
      200, 1536   |         17.9         |          117.3          |         18.1         |           88.1
      1000, 1536  |         22.9         |           92.0          |         19.4         |           89.0
      6000, 1536  |        123.4         |          436.3          |         61.7         |          228.5
      6272, 1536  |        129.1         |          457.3          |         64.3         |          238.5
      200, 2048   |         18.0         |           90.5          |         19.1         |          101.6
      1000, 2048  |         31.1         |          109.8          |         25.3         |          107.9
      6000, 2048  |        174.5         |          589.8          |         87.1         |          310.5
      6272, 2048  |        182.2         |          617.0          |         91.2         |          316.7
      200, 3072   |         19.8         |           96.4          |         19.4         |           89.3
      1000, 3072  |         48.1         |          168.7          |         23.5         |          100.9
      6000, 3072  |        267.1         |          930.0          |        134.8         |          519.2
      6272, 3072  |        278.2         |          971.2          |        140.7         |          540.2
```
Pre-https://github.com/pytorch/pytorch/issues/67977
```
[------------------------------------------------------- ln -------------------------------------------------------]
                    |  fwd, torch.float32  |  fwdbwd, torch.float32  |  fwd, torch.float16  |  fwdbwd, torch.float16
1 threads: ---------------------------------------------------------------------------------------------------------
        200,   256  |         20.9         |            92.6         |         21.3         |          110.1
       1000,   256  |         20.3         |            91.8         |         28.1         |          115.6
       6000,   256  |         93.0         |           310.7         |         86.3         |          299.8
       6272,   256  |         97.3         |           323.5         |         90.0         |          314.1
        200,   512  |         20.9         |           110.2         |         21.1         |           95.0
       1000,   512  |         24.0         |           102.8         |         22.2         |           95.9
       6000,   512  |        121.7         |           367.2         |        105.6         |          337.4
       6272,   512  |        127.0         |           382.3         |        111.3         |          352.0
        200,  1024  |         21.0         |           131.8         |         20.4         |           93.3
       1000,  1024  |         35.5         |           108.7         |         27.7         |           99.4
       6000,  1024  |        170.4         |           495.5         |        137.7         |          411.4
       6272,  1024  |        177.5         |           517.6         |        143.6         |          428.6
        200,  1536  |         21.9         |            97.6         |         20.8         |           92.7
       1000,  1536  |         44.3         |           129.7         |         33.9         |          100.1
       6000,  1536  |        215.8         |           619.2         |        167.2         |          480.9
       6272,  1536  |        225.0         |           646.9         |        174.8         |          505.9
        200,  2048  |         21.8         |           100.8         |         20.7         |           96.7
       1000,  2048  |         53.7         |           152.4         |         41.4         |          118.3
       6000,  2048  |        267.0         |           753.6         |        220.4         |          571.5
       6272,  2048  |        278.6         |           785.8         |        211.4         |          589.2
        200,  3072  |         20.9         |           103.7         |         21.9         |          104.6
       1000,  3072  |         71.4         |           201.1         |         53.1         |          148.3
       6000,  3072  |        365.7         |          1040.3         |        262.0         |          731.5
       6272,  3072  |        382.0         |          1084.4         |        273.3         |          766.3
```
Benchmarking script
```
import torch
from torch.utils.benchmark import Timer, Compare

results = []
for dtype in (torch.float, torch.half):
    for fs in (256, 512, 1024, 1536, 2048, 3072):
        for bs in (200, 1000, 6000, 196*32):
            ln = torch.nn.LayerNorm((fs,), device="cuda", dtype=dtype)
            X = torch.randn(bs, fs, device="cuda", dtype=dtype, requires_grad=True)
            gO = torch.rand_like(X)
            stmtfwd = "ln(X)"
            stmtfwdbwd = "X.grad=None; ln.zero_grad(set_to_none=True); out = ln(X); out.backward(gO)"
            tfwd = Timer(stmt=stmtfwd, label="ln", sub_label=f"{bs:5}, {fs:5}", description=f"fwd, {dtype}", globals=globals())
            tfwdbwd = Timer(stmt=stmtfwdbwd, label="ln", sub_label=f"{bs:5}, {fs:5}", description=f"fwdbwd, {dtype}", globals=globals())
            for t in (tfwd, tfwdbwd):
                results.append(t.blocked_autorange())
        print(fs, end='\r')
c = Compare(results)
c.print()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68238

Reviewed By: mruberry

Differential Revision: D32469450

Pulled By: ngimel

fbshipit-source-id: 08fe755c156d3d5c366c966cb808bf0f3e74c050
2021-11-17 12:00:07 -08:00
f3e2fefe09 Actually enable PYTORCH_RETRY_TEST_CASES for linux tests (#68486)
Summary:
After noticing that CUDA mem leak checks were not rerun, I realized I had forgotten to pass the env var as a Docker variable.

What a noob mistake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68486

Reviewed By: seemethere

Differential Revision: D32501718

Pulled By: janeyx99

fbshipit-source-id: 9918d626e90bea1562a3094c6eb12cb7d86dbf6a
2021-11-17 11:50:48 -08:00
2f37a39a5c [quant][graphmode][fx] Refactor node_name_to_target_dtype to make it more clear (#68317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68317

We use node_name_to_target_dtype to store the target dtype of the output activation for each node, computed from the node's qconfig.
There are two problems with node_name_to_target_dtype that make it hard to work with:
1. we mutate node_name_to_target_dtype when we insert observers; this makes the data structure confusing because it's typically unexpected
to change a data structure that stores the "target" dtype
2. currently it only stores the target dtype of output activations, while we also need target dtypes for input activations, weights, and biases

This PR fixes both problems by removing mutation from node_name_to_target_dtype and expanding each node's target dtype info to include
the missing target dtypes for input activation, weight, and bias. We will have another refactor to simplify the observation of weight and bias dtypes
in the future.

Please see comments for the updated structure of node_name_to_target_dtype

TODO: we may want to rename node_name_to_target_dtype to node_name_to_target_dtype_info in a separate PR.
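
For illustration, a hypothetical sketch of what the expanded per-node structure could look like (the field names here are illustrative, not necessarily the exact keys used in the PR):
```
import torch

# Hypothetical shape of the expanded structure: each node maps to the
# target dtypes of its input activation, weight, bias, and output activation.
node_name_to_target_dtype_info = {
    "linear1": {
        "input_activation_dtype": torch.quint8,
        "weight_dtype": torch.qint8,
        "bias_dtype": torch.float,
        "output_activation_dtype": torch.quint8,
    },
}
```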

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32411858

fbshipit-source-id: 3d76dd65056920ff8642899517bc1b95d43fc1de
2021-11-17 11:21:25 -08:00
3b4f072383 Remove TH/THC Storage data and copy functions (#68127)
Summary:
Part of https://github.com/pytorch/pytorch/issues/67852

cc ezyang bhosmer smessmer ljk53 bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68127

Reviewed By: mrshenli

Differential Revision: D32441885

Pulled By: ngimel

fbshipit-source-id: 1bbe7c8bed30bfe1737511a4f347fd9a8024dd99
2021-11-17 11:19:54 -08:00
4e21d77dbb Use TORCH_CHECK in MapAllocator (#68424)
Summary:
When porting `THAllocator` to ATen I changed `AT_ERROR` to `TORCH_INTERNAL_ASSERT` but the direct translation should have been `TORCH_CHECK`.

33e9a0b5f6/c10/util/Exception.h (L619-L623)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68424

Reviewed By: VitalyFedyunin

Differential Revision: D32465548

Pulled By: ngimel

fbshipit-source-id: 7fa9c1fe27e4849b76248badb681d7b6877ce9e8
2021-11-17 10:33:22 -08:00
693fe2fd9b docs: Added Union to supported types in documentation (#68435)
Summary:
This PR updates the documentation as a follow-up to https://github.com/pytorch/pytorch/pull/64234, adding `Union` as a supported type.
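
A minimal usage sketch (TorchScript refines the `Union` through `isinstance` checks):
```
import torch
from typing import Union

@torch.jit.script
def describe(x: Union[int, str]) -> str:
    # TorchScript narrows the Union type inside the isinstance branch
    if isinstance(x, int):
        return str(x + 1)
    return x

print(describe(1))     # "2"
print(describe("hi"))  # "hi"
```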

Any feedback is welcome!

cc ansley albanD gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68435

Reviewed By: davidberard98

Differential Revision: D32494271

Pulled By: ansley

fbshipit-source-id: c3e4806d8632e1513257f0295568a20f92dea297
2021-11-17 10:26:31 -08:00
61206ba4db [SR] Add StorageGroup abstraction (#68279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68279

While reworking the liveness analysis, I noticed that using `std::pair<size_t, std::vector<Tensor*>>` to represent storage groups made things quite unreadable.

Add a simple class to wrap a `std::vector<at::Tensor*>` and store a `size` attribute

Test Plan:
`buck test caffe2/benchmarks/static_runtime/...`

Also ran inline_cvr benchmarks, did not see any errors

Reviewed By: swolchok

Differential Revision: D32369447

fbshipit-source-id: e0b562aa7eefd738b1a34f1f37eb7bc95d71a257
2021-11-17 09:29:08 -08:00
cac3cd1433 add torch.diff support for n greater than 1 (#67260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67260

Addressing #54853
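
For example, `n > 1` composes repeated first-order differences:
```
import torch

x = torch.tensor([1., 3., 6., 10.])
print(torch.diff(x))       # tensor([2., 3., 4.])
print(torch.diff(x, n=2))  # tensor([1., 1.]) -- diff applied twice
```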

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31930294

Pulled By: mikaylagawarecki

fbshipit-source-id: 97c7a27e9200c6688242680ff96b73dfff828479
2021-11-17 09:16:33 -08:00
3da2e09c9b Added antialias flag to interpolate (CPU only, bilinear) (#65142)
Summary:
Description:
- Added antialias flag to interpolate (CPU only)
  - forward and backward for bilinear mode
  - added tests
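
A usage sketch of the new flag:
```
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 906, 438)
# antialias=True gives PIL-style filtering when downsampling (CPU, bilinear)
y = F.interpolate(x, size=(320, 196), mode="bilinear",
                  align_corners=False, antialias=True)
```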

### Benchmarks

<details>
<summary>
Forward pass, CPU. PTH interpolation vs PIL
</summary>

Cases:
- PTH RGB 3 Channels, float32 vs PIL RGB uint8 (apples vs pears)
- PTH 1 Channel, float32 vs PIL 1 Channel Float

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112

```
# OMP_NUM_THREADS=1 python bench_interp_aa_vs_pillow.py

Torch config: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75
  - CuDNN 8.0.5
  - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON,

Num threads: 1
[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (320, 196) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                2.9                |          3.1
      channels_last non-contiguous torch.float32  |                2.6                |          3.6

Times are in milliseconds (ms).

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (460, 220) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                3.4                |          4.0
      channels_last non-contiguous torch.float32  |                3.4                |          4.8

Times are in milliseconds (ms).

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (120, 96) -------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                1.6                |          1.8
      channels_last non-contiguous torch.float32  |                1.6                |          1.9

Times are in milliseconds (ms).

[----------------------- Downsampling: torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                9.0                |          11.3
      channels_last non-contiguous torch.float32  |                8.9                |          12.5

Times are in milliseconds (ms).

[----------------------- Downsampling: torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                2.1                |          1.8
      channels_last non-contiguous torch.float32  |                2.1                |          3.4

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (320, 196) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.2               |          1.0

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (460, 220) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.4               |          1.3

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (120, 96) ---------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |              719.9              |         599.9

Times are in microseconds (us).

[-------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (1200, 196) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               3.7               |          3.5

Times are in milliseconds (ms).

[-------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (120, 1200) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |              834.4              |         605.7

Times are in microseconds (us).

```

</details>

Code is moved from torchvision: https://github.com/pytorch/vision/pull/4208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65142

Reviewed By: mrshenli

Differential Revision: D32432405

Pulled By: jbschlosser

fbshipit-source-id: b66c548347f257c522c36105868532e8bc1d4c6d
2021-11-17 09:10:15 -08:00
143491e0ad [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D32484422

fbshipit-source-id: 5c836dc7d06f12e64cc4bb1e85d8fa4b62a29b85
2021-11-17 07:27:04 -08:00
3e3bf40b0a Revert D32452012: [pytorch][PR] Fix flaky test_nccl_timeout
Test Plan: revert-hammer

Differential Revision:
D32452012 (faa1e8b7cf)

Original commit changeset: c959b25957f2

fbshipit-source-id: a2786744b12ceed350eec0ca2834f5176a4e21ee
2021-11-17 06:08:53 -08:00
54ac64f035 Revert D32477989: [pytorch][PR] Actually enable PYTORCH_RETRY_TEST_CASES for linux tests
Test Plan: revert-hammer

Differential Revision:
D32477989 (173c0f8a98)

Original commit changeset: e28d095773f5

fbshipit-source-id: 2de5fac08f7f322a3aeb92a67b5fdfa0a6518bf1
2021-11-17 06:04:14 -08:00
0dc3f829d9 Nvfuser code bump 11 5 (#67943)
Summary:
nvfuser code update:
1. Tuning heuristics on schedulers for reduction/normalization kernels;
2. bfloat16 on IO tensor support;
3. Refactored memory format support; we can now support dimension collapsing for non-coherent input tensors with different memory formats, e.g. a channels-last tensor input to batch normalization. Note that we are currently limiting memory format to only Contiguous and Channels last;
4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`.

Things that are reverted from our local branch:
1. changes on some entries in autodiff
2. aten::gelu with approximation
3. native_dropout(_backward)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943

Reviewed By: ngimel

Differential Revision: D32288709

Pulled By: dzhulgakov

fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1
2021-11-17 01:22:17 -08:00
01b30922dd [static runtime] fuse gather+to+lengths_to_offsets (#64075)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64075

Test Plan:
Before:
`I0826 17:17:54.165174 1064079 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.66724. Iters per second: 149.987`

After:
`I0826 17:13:07.464485 1040300 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.46362. Iters per second: 154.712`

Profile after: P453143683

Accuracy tested comparing with jit interpreter for no differences under 1e-3 (nnc ops turned on) https://www.internalfb.com/intern/diff/view-version/136824794/

======

With 100-request recordio inputs (211 inputs)

Before:
`I1101 12:43:13.558375 742187 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 11.7882. Iters per second: 84.8309`
After:
`I1101 13:50:41.087644 1126186 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 11.6763. Iters per second: 85.6438`

Profile after: P465977010
Constituent ops before (total is 0.5646):
```
       0.187392 ms.    1.61737%. fb::clip_ranges_gather (309 nodes, out variant)
       0.174101 ms.    1.50266%. fb::lengths_to_offsets (464 nodes, out variant)
       0.203126 ms.    1.75317%. static_runtime::to_copy (805 nodes, out variant)
```
Constituent ops after (total is 0.4985):
```
       0.376559 ms.    3.25614%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
      0.0614349 ms.   0.531235%. fb::lengths_to_offsets (159 nodes, out variant)
      0.0573315 ms.   0.495751%. static_runtime::to_copy (195 nodes, out variant)
     0.00325543 ms.  0.0281501%. fb::gather_ranges (4 nodes, out variant)
```

Compare with jit interpreter inside benchmark:
`I1101 13:55:53.013602 1149446 PtVsBlackBoxPredictorBenchLib.cpp:175] Finished comparing PT static runtime and jit interpreter results`

======

Casting on the fly:

a. Static runtime off
```
Static runtime ms per iter: 11.4658. Iters per second: 87.2159
0.220367 ms.    1.94726%. static_runtime::to_copy (805 nodes, out variant)
0.172585 ms.    1.52504%. fb::clip_ranges_gather (309 nodes, out variant)
0.157836 ms.    1.39471%. fb::lengths_to_offsets (464 nodes, out variant)
```

b. Casting on the fly, using explicit allocation+to_copy (which has the fast pass for certain cases, but we'll always call empty):
```
I1115 09:08:35.711972 1925508 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 11.6732. Iters per second: 85.6662

0.599439 ms.    5.25098%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.0552475 ms.   0.483958%. fb::lengths_to_offsets (159 nodes, out variant)
0.0576032 ms.   0.504593%. static_runtime::to_copy (195 nodes, out variant)
0.00299026 ms.  0.0261941%. fb::gather_ranges (4 nodes, out variant)
```

c. Casting on the fly with native::to (no explicit allocation, but no fast pass):
```
Static runtime ms per iter: 11.5627. Iters per second: 86.4849
0.454356 ms.     3.9652%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.06315 ms.   0.551115%. static_runtime::to_copy (195 nodes, out variant)
0.0590741 ms.   0.515544%. fb::lengths_to_offsets (159 nodes, out variant)
0.00359182 ms.   0.031346%. fb::clip_ranges_gather (4 nodes, out variant)
```

d. Removal of the to() call in question from the fusion pattern:
```
Static runtime ms per iter: 11.3658. Iters per second: 87.9836
 0.29591 ms.     2.6479%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
 0.154612 ms.    1.38352%. static_runtime::to_copy (500 nodes, out variant)
0.0567151 ms.   0.507505%. fb::lengths_to_offsets (159 nodes, out variant)
0.0051115 ms.  0.0457394%. fb::clip_ranges_gather (4 nodes, out variant)
```

Reviewed By: hlu1

Differential Revision: D30515441

fbshipit-source-id: 53acee10619ac2be7dc8982e929e3210c4bb6d21
2021-11-17 00:49:31 -08:00
faa1e8b7cf Fix flaky test_nccl_timeout (#68403)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66882

- Remove time.sleep call
- Use gloo barrier to enforce rank synchronization
- Reduce timeouts for allreduce
- Pass in timeout and call wait() in _check_for_nccl_abort()

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68403

Reviewed By: H-Huang

Differential Revision: D32452012

Pulled By: rohan-varma

fbshipit-source-id: c959b25957f2eb8d59c506075da6023d25bbcfd9
2021-11-16 23:43:23 -08:00
6186b90c53 [Contrib][Fakelowp] Change Lut Size for Tanh (#68334)
Summary:
The reference code's LUT size increased, and the minimum
now starts from 0 instead of 7000.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68334

Reviewed By: jiecaoyu

Differential Revision: D32467332

Pulled By: hl475

fbshipit-source-id: 3e4510e09374519aebe657a31f0b1ccde117e761
2021-11-16 23:39:02 -08:00
f6696c5a85 export CPUOffload in _fsdp package (#68308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68308

Export CPUOffload in the _fsdp package, as the cpu_offload config in the FSDP API needs to import this class
ghstack-source-id: 143560608
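
A minimal sketch of the resulting usage, assuming a distributed process group is already initialized (the `_fsdp` package is private, so treat the exact import path as an assumption):
```
import torch.nn as nn
from torch.distributed._fsdp import CPUOffload
from torch.distributed._fsdp import FullyShardedDataParallel as FSDP

# assumes torch.distributed process group is already initialized;
# offload_params=True keeps sharded parameters on CPU between uses
model = FSDP(nn.Linear(8, 8), cpu_offload=CPUOffload(offload_params=True))
```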

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D32408719

fbshipit-source-id: ee5c40ec91a423fbd58872fbdeb5f2dda8a3d89e
2021-11-16 22:56:12 -08:00
9c15523793 Attach unused parameter info to static graph error message (#68413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68413

attach unused parameter info to static graph error message
ghstack-source-id: 143560766

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D32457112

fbshipit-source-id: 31de859bf5289aa6044279014f0e76be9bcb9e54
2021-11-16 22:55:08 -08:00
9de730ebba q_avgpool: Loop over batch dimension inside operators (#66819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66819

This has a number of different advantages:
- For channels last tensors, DispatchStub overhead is only incurred once.
- For contiguous tensors, parallelization now happens over batch and
  channels, enabling better load balancing between threads.
- `q_scale()` and `q_zero_point()` are no longer called inside of a
  parallel region, which is not allowed (see gh-56794)

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32445352

Pulled By: ngimel

fbshipit-source-id: cd938e886cd5696855eb56a649eaf3bccce35e54
2021-11-16 22:29:42 -08:00
1cade067e3 [Vulkan] Vulkan backend is now thread-safe (#67733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67733

Vulkan backend is now thread-safe:
* `ThreadContext` class holds onto all per-thread Vulkan states such as Command, Descriptor and Resource objects.
* `ThreadContext::SingletonThreadLocalObject<T>` is a very light version of `folly::SingletonThreadLocal` (https://github.com/facebook/folly/blob/main/folly/SingletonThreadLocal.h). It holds a static object with the `thread_local` modifier. It is tied to a `GPU` object, which allows us to expand multi-threaded GPU backend and multi-GPU capability in the future. The lifetime of a `SingletonThreadLocalObject<T>` object runs from the first call (instantiation) to the termination of the thread.
* The `MAKE_VULKAN_THREADSAFE` preprocessor flag is used for BUCK and the implementation of the thread-safe Vulkan backend. We can quickly exclude it from BUCK if any unexpected issue gets uncovered in the future. Once we are confident it's stable, we can remove the preprocessor flag from the code.
* A new perf test is added with shape `{3,40,221,193}` and 3 threads.
* `vkQueueSubmit` is not thread-safe, only one thread can push the commands at a time (See https://vkguide.dev/docs/chapter-1/vulkan_command_flow/#vulkan-command-execution). The number of available queues depends on GPU. It could be 1 and we cannot assume we can create multiple queues. Thus, we need to avoid calling `vkQueueSubmit` from multiple threads at the same time. When running Vulkan backend in different threads without any locking mechanism, `vkQueueSubmit` will get the `VK_ERROR_INITIALIZATION_FAILED(-3)` error.
* In the `Context::~Context()`, we should not call `flush()` since all per-thread objects will be destroyed as each thread exits. From the following logs, you can verify all per-thread objects are getting destroyed as their threads are terminated. The logs captured all ctor/dtor calls when running Vulkan backend with 3 different threads:
```
ThreadContext::ThreadContext() -> thread[0x1207d5e00] this[0x0x7f9489981e28]
Context::Context() -> thread[0x1207d5e00] this[0x7f9489981800] device_[1]
Resource::Pool::Pool() -> thread[0x7000095ab000] this[0x7f9489965258] device_[0x7f94998cf218] allocator_[0x7f947980ee00]
Command::Pool::Pool() -> thread[0x7000095ab000] this[0x7f9489965068] device_[0x7f94998cf218] command_pool_[0xfa21a40000000003]
Resource::Pool::Pool() -> thread[0x70000962e000] this[0x7f947980d458] device_[0x7f94998cf218] allocator_[0x7f949b119c00]
Command::Pool::Pool() -> thread[0x70000962e000] this[0x7f947980d268] device_[0x7f94998cf218] command_pool_[0xead9370000000008]
Resource::Pool::Pool() -> thread[0x1207d5e00] this[0x7f949a0ee858] device_[0x7f94998cf218] allocator_[0x7f9499901c00]
Command::Pool::Pool() -> thread[0x1207d5e00] this[0x7f949a0ee668] device_[0x7f94998cf218] command_pool_[0xcad092000000000d]
Descriptor::Pool::Pool() -> thread[0x1207d5e00] this[0x7f949a0ee910] device_[0x7f94998cf218] descriptor_pool_[0xa43473000000002d]
Descriptor::Pool::Pool() -> thread[0x70000962e000] this[0x7f947980d510] device_[0x7f94998cf218] descriptor_pool_[0x980b0000000002e]
Descriptor::Pool::Pool() -> thread[0x7000095ab000] this[0x7f9489965310] device_[0x7f94998cf218] descriptor_pool_[0x4b7df1000000002f]
Descriptor::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965310] device_[0x7f94998cf218] descriptor_pool_[0x4b7df1000000002f] -> enter
Descriptor::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965310] device_[0x7f94998cf218] descriptor_pool_[0x4b7df1000000002f] -> leave
Command::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965068] device_[0x7f94998cf218] command_pool_[0xfa21a40000000003] -> enter
Command::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965068] device_[0x7f94998cf218] command_pool_[0xfa21a40000000003] -> leave
Resource::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965258] device_[0x7f94998cf218] allocator_[0x7f947980ee00] -> enter
Descriptor::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d510] device_[0x7f94998cf218] descriptor_pool_[0x980b0000000002e] -> enter
Descriptor::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d510] device_[0x7f94998cf218] descriptor_pool_[0x980b0000000002e] -> leave
Command::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d268] device_[0x7f94998cf218] command_pool_[0xead9370000000008] -> enter
Command::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d268] device_[0x7f94998cf218] command_pool_[0xead9370000000008] -> leave
Resource::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d458] device_[0x7f94998cf218] allocator_[0x7f949b119c00] -> enter
Resource::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965258] device_[0x7f94998cf218] allocator_[0x7f947980ee00] -> leave
Resource::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d458] device_[0x7f94998cf218] allocator_[0x7f949b119c00] -> leave
Descriptor::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee910] device_[0x7f94998cf218] descriptor_pool_[0xa43473000000002d] -> enter
Descriptor::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee910] device_[0x7f94998cf218] descriptor_pool_[0xa43473000000002d] -> leave
Command::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee668] device_[0x7f94998cf218] command_pool_[0xcad092000000000d] -> enter
Command::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee668] device_[0x7f94998cf218] command_pool_[0xcad092000000000d] -> leave
Resource::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee858] device_[0x7f94998cf218] allocator_[0x7f9499901c00] -> enter
Resource::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee858] device_[0x7f94998cf218] allocator_[0x7f9499901c00] -> leave
Context::~Context() -> thread[0x1207d5e00] this[0x7f9489981800] device_[1] -> enter
Context::~Context() -> thread[0x1207d5e00] this[0x7f9489981800] device_[1] -> leave
ThreadContext::~ThreadContext() -> thread[0x1207d5e00] this[0x0x7f9489981e28] -> enter
ThreadContext::~ThreadContext() -> thread[0x1207d5e00] this[0x0x7f9489981e28] -> leave
```
Some notes on unexpected behaviors by `VkQueue`:
* We need to make sure only one thread accesses the `VkQueue` at a time when multi-threaded, or we need a locking mechanism to protect the `VkQueue` from multiple threads. The locking approach is used for this change.
* To avoid lock overhead, we tried a per-thread `VkQueue` (a separate object per thread), but that didn't fix the `VK_ERROR_INITIALIZATION_FAILED` error from the `vkQueueSubmit` call. This was not expected. Interestingly, macOS doesn't crash with the per-thread approach, but that is no surprise since its behavior has not been that reliable. We are not sure whether this is an Android Vulkan driver issue.
* Making the entire `Context` `thread_local` without any lock actually fixes the same error.

Test Plan:
**Test build on Android**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
**Test build on MacOS**
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```

**Test result on Google Pixel 5**
```
//xplat/caffe2:pt_vulkan_perf_test_binAndroid#android-arm64 buck-out/gen/fe3a39b8/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64
buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64: 1 file pushed, 0 skipped. 145.4 MB/s (826929592 bytes in 5.426s)
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
=============================================================================================================
Thread-safe Vulkan backend on Google Pixel 5
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       55.8 ms         15.1 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       25.6 ms         4.08 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       60.6 ms         14.3 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        4.52 ms        0.757 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        7.16 ms        0.770 ms         5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3       35.9 ms         38.8 ms         3000
=============================================================================================================
Non thread-safe Vulkan backend on Google Pixel 5
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       55.0 ms         14.5 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       25.8 ms         4.30 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       60.6 ms         14.5 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        4.52 ms        0.761 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        7.15 ms        0.765 ms         5000
```
For the single-thread scenario, the difference between the thread-safe and non-thread-safe versions is less than 2%, which is acceptable. In other words, there is no considerable performance degradation in the thread-safe Vulkan backend from using:
* singleton thread local objects for `Command`, `Descriptor` and `Resource` pools
* mutex lock for `VkQueueCommit` call

**Test result on MacOS**
```
Running ./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac#macosx-x86_64
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 11.96, 7.17, 5.45
***WARNING*** Library was built as DEBUG. Timings may be affected.
=============================================================================================================
Thread-safe Vulkan backend on MacOS
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       58.4 ms         42.8 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       12.3 ms         5.43 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       56.0 ms         41.2 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        3.00 ms         1.52 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        2.56 ms         1.34 ms         5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3       42.8 ms         42.8 ms         3000
=============================================================================================================
Non thread-safe Vulkan backend on MacOS
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       58.6 ms         42.6 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       11.3 ms         4.67 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       57.6 ms         42.4 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        2.89 ms         1.45 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        2.47 ms         1.27 ms         5000
```
The non-thread-safe version is slightly faster than the thread-safe one. This test result is only for reference, since we cannot fully trust macOS, which has an extra layer, [MoltenVK](https://github.com/KhronosGroup/MoltenVK), on top of `Metal`.

Reviewed By: SS-JIA

Differential Revision: D32093974

fbshipit-source-id: 9eab7f0db976eff717540a5b32f94ed17a00b662
2021-11-16 22:09:32 -08:00
2317e28e9e Enable complex autograd for col2im / im2col (#68199)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68199

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32467043

Pulled By: mruberry

fbshipit-source-id: 9094aff036f75b280422e210f7089140ea61fc71
2021-11-16 21:11:44 -08:00
fea2bb64c8 OpInfos for stft, istft, fftshift, ifftshift (#68198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68198

This unearths some bugs in istft backward, so I've disabled
backward tests but it's fixed in the next PR in the stack.

cc mruberry peterbell10

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32467044

Pulled By: mruberry

fbshipit-source-id: 5cf49560cbeb0263a66aafb48ed1bcc8884b75f1
2021-11-16 21:09:54 -08:00
6e640a0acf Revise the socket implementation of c10d (#68226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68226

**Note that this PR is unusually big due to the urgency of the changes. Please reach out to me in case you wish to have a "pair" review.**

This PR introduces a major refactoring of the socket implementation of the C10d library. A big portion of the logic is now contained in the `Socket` class and a follow-up PR will further consolidate the remaining parts. As of today the changes in this PR offer:

 - significantly better error handling and much more verbose logging (see the example output below)
 - explicit support for IPv6 and dual-stack sockets
 - correct handling of signal interrupts
 - better Windows support

A follow-up PR will consolidate `send`/`recv` logic into `Socket` and fully migrate to non-blocking sockets.

## Example Output

```
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[W logging.h:28] The server socket on [localhost]:29501 is not yet listening (Error: 111 - Connection refused), retrying...
[I logging.h:21] The server socket will attempt to listen on an IPv6 address.
[I logging.h:21] The server socket is attempting to listen on [::]:29501.
[I logging.h:21] The server socket has started to listen on [::]:29501.
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42650.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42650.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42722.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42722.
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42724.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42724.
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42726.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42726.
```
ghstack-source-id: 143501987

Test Plan: Run existing unit and integration tests on devserver, Fedora, Ubuntu, macOS Big Sur, Windows 10.

Reviewed By: Babar, wilson100hong, mrshenli

Differential Revision: D32372333

fbshipit-source-id: 2204ffa28ed0d3683a9cb3ebe1ea8d92a831325a
2021-11-16 20:49:25 -08:00
4c346bd073 Added forward derivatives for neg, diag, inverse, linalg_eig (#67837)
Summary:
Recreated due to CI failures as per comment https://github.com/pytorch/pytorch/pull/67339#issuecomment-959893293

===

See also discussion in https://github.com/pytorch/pytorch/issues/10223, starting from [this](https://github.com/pytorch/pytorch/issues/10223#issuecomment-949499666) comment

The formulas for the derivatives are taken from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf.

As indicated, the method linalg_eig_jvp should be used instead of linalg_eig_jvp_eigenvalues and linalg_eig_jvp_eigenvectors in the future. Due to a codegen limitation, this is not yet possible.
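
A sketch of checking one of these rules through the public forward-mode AD API, assuming a build where these forward derivatives are available; for `inverse`, the tangent of A^-1 is -A^-1 dA A^-1:
```
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3) + 3 * torch.eye(3)  # keep A comfortably invertible
dA = torch.randn(3, 3)                    # tangent (perturbation direction)
with fwAD.dual_level():
    dual_out = fwAD.unpack_dual(torch.inverse(fwAD.make_dual(A, dA)))
    expected = -torch.inverse(A) @ dA @ torch.inverse(A)
    print(torch.allclose(dual_out.tangent, expected, atol=1e-5))  # True
```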

CC albanD Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67837

Reviewed By: mrshenli

Differential Revision: D32403662

Pulled By: soulitzer

fbshipit-source-id: 529cb93f865ce4cc2e24fa6f672d4234e7abe2b1
2021-11-16 20:32:47 -08:00
aa9ee8d02a [Static Runtime] Avoid copying function objects per StaticRuntime instance (#68368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68368

Currently, each instance of `StaticRuntime` has its own copy of a `std::function` object, wrapped in a `ProcessedNode::Function` object, in order to invoke the actual operation implementation.

However, all instances of `StaticRuntime` derived from the same `StaticModule` object invoke exactly the same op implementations, so this duplication is avoidable.

This change adds a `StaticModule::functions_` member variable to keep a list of unique `ProcessedFunction` instances. A newly constructed `StaticRuntime` takes pointers to these `ProcessedFunction`s instead of copies of the whole function objects. This can save a substantial amount of memory per `StaticRuntime` instance.

This comes with a potential cost in execution time: since a `ProcessedNode` instance now keeps a pointer to the function object, executing a node involves an extra pointer dereference. However, this cost proved negligible in local performance tests.

Thanks to hlu1 for proposing this non-intrusive improvement idea :D

Test Plan:
This change reduces the size of a StaticRuntime instance by 14.41% (459KB -> 393KB) for CMF/local, and by 8% for CMF/local_ro (patched D32181666 to print the memory turnover from instantiating a StaticRuntime instance). No noticeable latency regression was observed.

==AFTER

* CMF/local
memory turnover: 393608
latency: PyTorch run finished. Milliseconds per iter: 15.6965. Iters per second: 63.7087

* CMF/local_ro
memory turnover: 387288
latency: PyTorch run finished. Milliseconds per iter: 7.51308. Iters per second: 133.101

==BEFORE

* CMF/local
memory turnover: 459888
latency: PyTorch run finished. Milliseconds per iter: 15.8278. Iters per second: 63.18

* CMF/local_ro
memory turnover: 420832
latency: PyTorch run finished. Milliseconds per iter: 7.43756. Iters per second: 134.453

==Confirmation that ptvsc2_predictor_bench reports the same memory management stats for inline_cvr:

==AFTER

Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)

Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1937 (99.5354%)

Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)

==BEFORE

Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)

Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1937 (99.5354%)

Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)

Reviewed By: swolchok

Differential Revision: D32337548

fbshipit-source-id: e714e735399c93fde337b0f70e203a2de632057a
2021-11-16 20:28:48 -08:00
fd85d925b0 Fix some sign issues (#68361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68361

Fixes
```
caffe2/aten/src/ATen/FunctionalizeFallbackKernel.cpp:36:31: error: comparison of integers of different signs: 'int64_t' (aka 'long') and 'const unsigned long' [-Werror,-Wsign-compare]
    for (int64_t idx = 0; idx < num_returns; ++idx) {
                          ~~~ ^ ~~~~~~~~~~~
caffe2/aten/src/ATen/native/cuda/Sorting.cpp:87:16: error: comparison of integers of different signs: 'int64_t' (aka 'long') and 'std::vector::size_type' (aka 'unsigned long') [-Werror,-Wsign-compare]
    assert(dim < out_shape.size());
           ~~~ ^ ~~~~~~~~~~~~~~~~
```

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32433063

fbshipit-source-id: b896dbab81861f3f074e00db73d20d9523037dd1
2021-11-16 20:18:58 -08:00
173c0f8a98 Actually enable PYTORCH_RETRY_TEST_CASES for linux tests (#68486)
Summary:
After noticing that CUDA mem leak checks were not rerun, I realized I had forgotten to pass the env var as a Docker variable.

What a noob mistake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68486

Reviewed By: malfet, seemethere

Differential Revision: D32477989

Pulled By: janeyx99

fbshipit-source-id: e28d095773f50864ab49229e434187a9ecb004cc
2021-11-16 19:02:03 -08:00
affa3f846c Sparse CSR CPU: add torch.addmm (#65606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65606

This PR adds a `torch.addmm(c, a, b, alpha=1.0, beta=0.0, out=out)` variant with `a, b, c, out` all being sparse CSR tensors on CPU.
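
A minimal sketch with illustrative values (diagonal 2x2 CSR matrices):
```
import torch

crow = torch.tensor([0, 1, 2])
col = torch.tensor([0, 1])
val = torch.tensor([1.0, 2.0])
a = torch.sparse_csr_tensor(crow, col, val, size=(2, 2))
b = torch.sparse_csr_tensor(crow, col, val, size=(2, 2))
c = torch.sparse_csr_tensor(crow, col, val, size=(2, 2))

# computes beta * c + alpha * (a @ b) with all operands sparse CSR on CPU
out = torch.addmm(c, a, b, beta=1.0, alpha=1.0)
```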

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32366236

Pulled By: cpuhrsch

fbshipit-source-id: e910bcc96eee99d624b80ee881df3887ab3ba5ac
2021-11-16 17:22:46 -08:00
5cfca5524c [JIT] clear GraphFunction.optimized_graphs_ after freezing a module (#68316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68316

Consider the following:
```
class Mod(nn.Module):
    def __init__(self, val):
        super().__init__()
        self.param = nn.Parameter(val)

    def forward(self, x):
        # this method will change during freezing
        return x + self.param

    @torch.jit.export
    def make_prediction(self, x):
        y = x + x
        return self.forward(y)

param = torch.rand([2, 2])

unscripted_mod = Mod(param)
mod = torch.jit.script(unscripted_mod)
mod.eval()
mod = torch.jit.freeze(mod, preserved_attrs=["make_prediction"])
```

During freezing the following will occur:
1. do some pre-freezing, including inlining; in particular, forward will be inlined into make_prediction. During inlining, forward.optimized_graph() is called, and the result is cached
2. freeze some methods. While freezing forward, the graph associated with the function will get updated. The cached optimized_graphs_ are not updated.

Previously, a call to `mod.forward(x)` would return an executor that would run on the old cached optimized_graph(). This would mean that the freezing optimizations would not apply, and potentially that the execution would fail because of parameters removed from the module.

This change clears the optimized_graphs_ cache after running freezing to prevent executing an old version of the graph.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32410862

Pulled By: davidberard98

fbshipit-source-id: dd8bfe86ec2898b7c72813ab32c08f25c38e4cea
2021-11-16 17:15:29 -08:00
75ccb07b26 [SR] LOG->VLOG (#68477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68477

We're printing a lot of unnecessary logs in prod. Change these from LOG(INFO) to VLOG(1) so you can easily flip them back for testing.

Test Plan: CI

Reviewed By: ajyu, d1jang

Differential Revision: D32439776

fbshipit-source-id: 40fa57f4eeb6ca0b610008062cc94aed62fb6981
2021-11-16 17:09:52 -08:00
515d9fb2a9 Add OpInfo for torch.histc (#67452)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67452
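
For reference, the op gaining coverage (this is the example from the `torch.histc` documentation):
```
import torch

# 4 equal-width bins over [0, 3]; elements outside [min, max] are ignored
print(torch.histc(torch.tensor([1., 2., 1.]), bins=4, min=0, max=3))
# tensor([0., 2., 1., 0.])
```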

Reviewed By: davidberard98

Differential Revision: D32453690

Pulled By: saketh-are

fbshipit-source-id: 6311519dc1b2e92a200d0455d32a9c7301a45d51
2021-11-16 13:55:30 -08:00
a8bcfc90f5 fix fsdp overlap flaky test (#68415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68415

remove e4["cpu_iter"] from short list as cpu may take some time to queue both compute and all-gather.
close #68391
ghstack-source-id: 143478769

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D32457334

fbshipit-source-id: baeedfb628ce4554a1ef365c3a2de27b8884f6d4
2021-11-16 13:52:13 -08:00
27eca2c6fd Revert D32467139: [pytorch][PR] [android][fbjni] Update fbjni to 0.2.2
Test Plan: revert-hammer

Differential Revision:
D32467139 (04056df475)

Original commit changeset: 49e155989d2d

fbshipit-source-id: ce03be3c6f209a6e9969660bd823d5343a7f0615
2021-11-16 13:50:50 -08:00
284758b585 correct NLLLoss parameters default value (#68426)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17577

Previous
`size_average by default:  True`
`reduce by default: True`
Present
`size_average by default:  None`
`reduce by default: None`
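
Both arguments are deprecated in favor of `reduction`; a minimal sketch:
```
import torch
import torch.nn as nn

# size_average/reduce now document their real default (None, i.e. deprecated);
# prefer the reduction= argument instead
loss_fn = nn.NLLLoss(reduction="mean")
log_probs = torch.log_softmax(torch.randn(3, 5), dim=1)
target = torch.tensor([1, 0, 4])
print(loss_fn(log_probs, target))
```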

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68426

Reviewed By: VitalyFedyunin

Differential Revision: D32463324

Pulled By: jbschlosser

fbshipit-source-id: 7ba9cd03c9fb6b2f19301e7e39c3c490de17202b
2021-11-16 13:45:52 -08:00
76e9dbb0f4 [torch.fx] add code-gen customizability and support for setting breakpoint in code-gen'd forward() call (#67139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67139

This diff enables setting a breakpoint in the graph module's generated Python code. See the test plan for usage.

In order to support this functionality, and other similar ways of customizing the generated code, a code transformer hook is added to `fx.Graph`. This allows flexible customization of `fx.Graph`'s code-gen behavior in composable and functional ways. See the test plan for its usage.

Test Plan:
### Use of `fx.experimental.debug.set_trace`

```
In [2]: from torch.fx.experimental.debug import set_trace

In [3]: set_trace(ttop)
Out[3]:
top(
  (a): Sub()
)

In [4]: ttop(1)
> /data/users/kefeilu/fbsource33/fbcode/buck-out/dev/gen/caffe2/torch/fb/fx2trt/<eval_with_key>.10(6)forward()
(Pdb) l
  1
  2
  3
  4     def forward(self, x):
  5         import pdb; pdb.set_trace()
  6  ->     a = self.a(x);  x = None
  7         getitem = a[0]
  8         getitem_1 = a[0];  a = None
  9         add = getitem + getitem_1;  getitem = getitem_1 = None
 10         return add
 11
(Pdb)
```

### Use of `on_generate_code`

```
In [1]: def insert_pdb(body):
   ...:     return ['import pdb; pdb.set_trace()\n', *body]
   ...:

In [8]: type(ttop)
Out[8]: torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl

In [10]: with ttop.graph.on_generate_code(lambda _: insert_pdb):
    ...:     ttop.recompile()
    ...:     print(f"== _on_generate_code should not be None: { ttop.graph._on_generate_code }")
    ...:     print(ttop.code)
    ...:

== _on_generate_code should not be None: <function insert_pdb at 0x7fc9895ddd30>

def forward(self, x):
    import pdb; pdb.set_trace()
    a = self.a(x);  x = None
    getitem = a[0]
    getitem_1 = a[0];  a = None
    add = getitem + getitem_1;  getitem = getitem_1 = None
    return add

In [11]: ttop.graph._on_generate_code  # restored to None

In [12]: ttop(1) # this should drop into pdb
> /data/users/kefeilu/fbsource33/fbcode/buck-out/dev/gen/caffe2/torch/fb/fx2trt/<eval_with_key>.6(6)forward()
(Pdb) l
  1
  2
  3
  4     def forward(self, x):
  5         import pdb; pdb.set_trace()
  6  ->     a = self.a(x);  x = None
  7         getitem = a[0]
  8         getitem_1 = a[0];  a = None
  9         add = getitem + getitem_1;  getitem = getitem_1 = None
 10         return add
 11
```

Reviewed By: jamesr66a

Differential Revision: D30736160

fbshipit-source-id: 9646867aae0461b5131dfd4ba9ee77a8c2ea9c93
2021-11-16 13:28:11 -08:00
8954c92529 [PyTorch][Static Runtime] Borrow outputs in static_runtime::VarTupleUnpack (#68161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68161

Continuing rollout of borrowing outputs for native ops.
ghstack-source-id: 143424920

Test Plan:
Compare CMF local_ro perf again.

Previous diff:
```
I1110 22:05:23.245435 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03272. Iters per second: 968.313
I1110 22:05:23.822196 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.06478. Iters per second: 939.163
I1110 22:05:24.395256 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.035. Iters per second: 966.186
I1110 22:05:24.964169 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.02786. Iters per second: 972.898
I1110 22:05:25.536558 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03205. Iters per second: 968.946
I1110 22:05:26.109027 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.04256. Iters per second: 959.174
I1110 22:05:26.679611 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03245. Iters per second: 968.567
I1110 22:05:27.253048 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.04493. Iters per second: 957.005
I1110 22:05:27.822629 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.0299. Iters per second: 970.971
I1110 22:05:28.393326 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03039. Iters per second: 970.509
I1110 22:05:28.393368 113949 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.03726, standard deviation: 0.0111053
```

This diff:
```
I1110 22:18:48.453075 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.931188. Iters per second: 1073.9
I1110 22:18:48.967614 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.933196. Iters per second: 1071.59
I1110 22:18:49.483338 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.932087. Iters per second: 1072.86
I1110 22:18:49.997144 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.930877. Iters per second: 1074.26
I1110 22:18:50.529383 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.936981. Iters per second: 1067.26
I1110 22:18:51.085038 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.953214. Iters per second: 1049.08
I1110 22:18:51.607192 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.940719. Iters per second: 1063.02
I1110 22:18:52.126169 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.942638. Iters per second: 1060.85
I1110 22:18:52.644445 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.937574. Iters per second: 1066.58
I1110 22:18:53.163486 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.941636. Iters per second: 1061.98
I1110 22:18:53.163537 191647 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 0.938011, standard deviation: 0.00691196
```

0.099 (9.5%!) usec/iter improvement over previous diff

Reviewed By: hlu1

Differential Revision: D32347900

fbshipit-source-id: 8169ebcadf1248e555a18bbffa99eef6cac1ba85
2021-11-16 12:32:15 -08:00
755be54c77 [PyTorch][Static Runtime] Borrow outputs in static_runtime::dict_unpack (#68160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68160

This generalizes the mechanism D32318674 added for letting native ops borrow their outputs and uses it in dict_unpack.
ghstack-source-id: 143424919

Test Plan:
4.5% in CMF local_ro compared to D32318674 (previous two diffs were necessary steps but didn't get the full win yet):

```
FastAliasingInSelectTensor, local_ro
========================================
I1110 22:06:37.549811 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08488. Iters per second: 921.76
I1110 22:06:38.147949 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08675. Iters per second: 920.171
I1110 22:06:38.766340 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08626. Iters per second: 920.592
I1110 22:06:39.366608 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08376. Iters per second: 922.717
I1110 22:06:39.964979 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08362. Iters per second: 922.833
I1110 22:06:40.565248 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08423. Iters per second: 922.312
I1110 22:06:41.167326 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.0945. Iters per second: 913.659
I1110 22:06:41.766187 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08373. Iters per second: 922.742
I1110 22:06:42.367816 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08995. Iters per second: 917.475
I1110 22:06:42.968391 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08854. Iters per second: 918.665
I1110 22:06:42.968446 119627 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.08662, standard deviation: 0.00351662

BorrowDictUnpackOutputs, local_ro
========================================

I1110 22:05:23.245435 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03272. Iters per second: 968.313
I1110 22:05:23.822196 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.06478. Iters per second: 939.163
I1110 22:05:24.395256 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.035. Iters per second: 966.186
I1110 22:05:24.964169 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.02786. Iters per second: 972.898
I1110 22:05:25.536558 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03205. Iters per second: 968.946
I1110 22:05:26.109027 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.04256. Iters per second: 959.174
I1110 22:05:26.679611 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03245. Iters per second: 968.567
I1110 22:05:27.253048 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.04493. Iters per second: 957.005
I1110 22:05:27.822629 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.0299. Iters per second: 970.971
I1110 22:05:28.393326 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03039. Iters per second: 970.509
I1110 22:05:28.393368 113949 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.03726, standard deviation: 0.0111053
```

0.04936 (4.5%) usec/iter improvement

Reviewed By: hlu1

Differential Revision: D32347390

fbshipit-source-id: e636ddafacf30ed2a2d84a6e15fff97481342fdb
2021-11-16 12:31:03 -08:00
bbc24222d2 [PyTorch][Static Runtime] Refcount bump pass in native_ops (#68159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68159

These all look like they'll cause unnecessary refcount bumps to me.
ghstack-source-id: 143424917

Test Plan:
CI

TODO profile local_ro?

Reviewed By: hlu1

Differential Revision: D32347392

fbshipit-source-id: d8ed91b5855b86765db00c61ad3650273302c7b6
2021-11-16 12:27:12 -08:00
86399d8e0c Add histogramdd to torch.rst (#68273)
Summary:
The `torch.histogramdd` operator is documented in `torch/functional.py` but does not appear in the generated docs because it is missing from `docs/source/torch.rst`.
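
A minimal usage sketch of the operator whose docs this surfaces (shapes here are illustrative):
```
import torch

points = torch.rand(100, 3)                       # 100 samples in 3-D
hist, bin_edges = torch.histogramdd(points, bins=(4, 4, 4))
print(hist.shape)                                 # torch.Size([4, 4, 4])
print([e.numel() for e in bin_edges])             # [5, 5, 5] edges per dimension
```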

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68273

Reviewed By: cpuhrsch

Differential Revision: D32470522

Pulled By: saketh-are

fbshipit-source-id: a23e73ba336415457a30bae568bda80afa4ae3ed
2021-11-16 11:55:40 -08:00
ed00a763a2 [PyTorch] Don't force refcount bump when accessing DictEntryRef key/value (#68158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68158

to() sometimes returns a reference; let's forward that through.
ghstack-source-id: 143424916

Test Plan: Combined with following diff, seeing a huge drop in dict_unpack self time in ctr_mobile_feed local_ro net. Following diff by itself didn't work.

Reviewed By: suo

Differential Revision: D32347391

fbshipit-source-id: da96295bf83ea30867a2e3fceedc9b4e0a33ffa3
2021-11-16 11:44:08 -08:00
04056df475 [android][fbjni] Update fbjni to 0.2.2 (#68400)
Summary:
ghstack-source-id: caeb8df3a18a6fa48d591af126ac59d8e41494b5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68400

Fixes #{issue number}

Updates fbjni version to 0.2.2

ci-all PR: https://github.com/pytorch/pytorch/pull/68401

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68402

Reviewed By: linbinyu

Differential Revision: D32467139

Pulled By: IvanKobzarev

fbshipit-source-id: 49e155989d2dbafedd5b2df77e089e25e8b4f8f8
2021-11-16 11:34:46 -08:00
df129fa8d6 [PyTorch] Support MaybeOwned<IValue> (#68157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68157

Does what it says on the tin. I don't have a use for `MaybeOwned<IValue>` itself right now, but following diffs will use `MaybeOwnedTraits<IValue>::{create,destroy}Borrow` and I thought it best to just provide the full thing.
ghstack-source-id: 143424915

Test Plan: Extended MaybeOwned tests to cover this.

Reviewed By: hlu1

Differential Revision: D32347393

fbshipit-source-id: 219658cb69b951d36dee80c2ae51387328224866
2021-11-16 11:24:32 -08:00
030ee34216 Add OpInfo for torch.nonzero (#67459)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67459

Reviewed By: davidberard98

Differential Revision: D32453687

Pulled By: saketh-are

fbshipit-source-id: e7ed5601686d88407bf67bca0f75304b30fa7ac5
2021-11-16 11:10:43 -08:00
10e9d80ad1 [PyTorch][Static Runtime] Don't track scalar ivalues (#67702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67702

This isn't a particularly large optimization and it does
nothing before select_tensor is introduced (I'm surprised that no
operators have optimizable outputs!), but it seems like we should probably get the savings.
ghstack-source-id: 143424918

Test Plan: CI; checked `--do_profile=1` output with the following diff, and we save tracking hundreds of values, as expected.

Reviewed By: hlu1

Differential Revision: D32112522

fbshipit-source-id: 1804b77992a73670bfc1e36af608b852b8261bd2
2021-11-16 11:05:42 -08:00
391be39575 Use reduced precision switch in test_addmm_baddbmm_overflow (#68399)
Summary:
https://github.com/pytorch/pytorch/issues/68125
Checking to see if actually using the switch fixes the test...

CC mruberry ngimel ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68399

Reviewed By: VitalyFedyunin

Differential Revision: D32466974

Pulled By: ngimel

fbshipit-source-id: aa8643ed913b344977fd103974625c527d20dbb8
2021-11-16 10:50:17 -08:00
5c3529a86d [lint] small pass to make lint clean (#68367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68367

- bmm_test.py was using syntax not allowed in 3.6
- Some suppressions were not placed on the correct line.

With this file,
```
lintrunner --paths-cmd='git grep -Il .'
```
passes successfully.

Test Plan: Imported from OSS

Reviewed By: janeyx99, mrshenli

Differential Revision: D32436644

Pulled By: suo

fbshipit-source-id: ae9300c6593d8564fb326822de157d00f4aaa3c2
2021-11-16 10:27:00 -08:00
639258499f [PyTorch][Static Runtime] Add & use "small array" for ProcessedNodeInputs (#67935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67935

Rationale should be documented in code comments. In short, we
can avoid heap-allocating arrays of input indexes for operators with 5
or fewer inputs, at the cost of a tag bit check on access.
ghstack-source-id: 143429112

Test Plan:
Patched d1jang's D32181666, which prints static runtime memory usage.

Previous diff, local:

```
I1105 12:17:36.459688 866763 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 354208
```

This diff, local:

```
I1105 12:48:35.820663 1066520 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 338064
```
4.5% savings (16144 bytes)

Ran 10 repetitions of CMF local_ro with core pinning: P467095603. This diff is perf neutral compared to the previous diff.

Reviewed By: hlu1

Differential Revision: D32216573

fbshipit-source-id: d18483db255f75f1d90e610ecded7727c6ffe65c
2021-11-16 10:21:12 -08:00
6acde23bec [PyTorch][Static Runtime] Switch input/output repr to 2-byte offsets (#67934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67934

This reduces the memory requirements of ProcessedNode: by allocating outputs sequentially into a shared array and supporting at most 2**16 - 1 values (current models seem to have 10-20x less than that), we only need to store the 2-byte offset into that array and 2-byte number of outputs in ProcessedNode.
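
As a toy Python illustration of this layout (the names and structure here are assumptions; the real implementation is C++ inside ProcessedNode):
```
import array

outputs = array.array("H")            # one shared pool of 2-byte value indices
nodes = []                            # per-node (offset, num_outputs), 2 bytes each
for node_outputs in ([3, 7], [1], [4, 5, 6]):
    offset = len(outputs)
    assert offset + len(node_outputs) <= 2**16 - 1   # the repr's capacity limit
    outputs.extend(node_outputs)
    nodes.append((offset, len(node_outputs)))

off, n = nodes[2]
print(list(outputs[off:off + n]))     # [4, 5, 6]
```
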
ghstack-source-id: 143429113

Test Plan:
Patched d1jang's diff to measure memory turnover around SR startup.

Previous diff, CMF local:

```
I1104 12:19:39.900211 597593 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 427120
```

This diff, CMF local:

```
I1105 12:17:36.459688 866763 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 354208
```

72912 bytes (17%) savings

Perf looks neutral; see next diff (D32216573) test plan for details.

Reviewed By: hlu1

Differential Revision: D32190751

fbshipit-source-id: 30c1e2caa9460f0d83b2d9bb24c68ccfcef757cc
2021-11-16 10:19:50 -08:00
8678472ec8 [PyTorch][Static Runtime] Save 2 pointers in ProcessedNode (#67860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67860

We don't need 8-byte sizes for inputs and outputs, and we only need op names if profiling isn't disabled.
ghstack-source-id: 143429111

Test Plan:
Ran CMF local & local_ro with recordio inputs. I'm calling
the result inconclusive/neutral because I saw some noise (as you'll
see below), but that's fine with me since this is a clear memory win.

```
Nov4Stable, local_ro
========================================
I1104 09:53:08.875444 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.19925. Iters per second: 833.851
I1104 09:53:10.200443 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.1996. Iters per second: 833.608
I1104 09:53:11.524045 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.19746. Iters per second: 835.103
I1104 09:53:12.851861 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.20479. Iters per second: 830.019
I1104 09:53:14.183387 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.20487. Iters per second: 829.964
I1104 09:53:14.183427 505783 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.2012, standard deviation: 0.00341762

re-ran stable in light of the baffling regression (see next entry), and sure enough we still have some significant run-to-run variation:

I1104 09:56:15.244969 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24956. Iters per second: 800.28
I1104 09:56:16.621292 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24776. Iters per second: 801.437
I1104 09:56:18.018808 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.25247. Iters per second: 798.42
I1104 09:56:19.399660 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.25054. Iters per second: 799.656
I1104 09:56:20.781828 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.25052. Iters per second: 799.664
I1104 09:56:20.781878 524012 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.25017, standard deviation: 0.00171396

Nov4SaveTwoWordsInProcessedNode, local_ro
========================================
I1104 09:53:42.070139 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.2411. Iters per second: 805.736
I1104 09:53:43.438390 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24102. Iters per second: 805.788
I1104 09:53:44.773303 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.20682. Iters per second: 828.621
I1104 09:53:46.110538 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.21216. Iters per second: 824.973
I1104 09:53:47.448279 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.21265. Iters per second: 824.639
I1104 09:53:47.448334 508309 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.22275, standard deviation: 0.0168698

early runs look like a glitch, rerunning

I1104 09:54:20.999117 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24558. Iters per second: 802.841
I1104 09:54:22.376780 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24436. Iters per second: 803.623
I1104 09:54:23.738584 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23176. Iters per second: 811.845
I1104 09:54:25.113063 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24938. Iters per second: 800.395
I1104 09:54:26.476349 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23552. Iters per second: 809.377
I1104 09:54:26.476395 511022 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.24132, standard deviation: 0.00737197

Nov4Stable, local
========================================

I1104 09:57:56.854537 533814 PyTorchPredictorBenchLib.cpp:346] memory turnover after getPredictor: 177885632
I1104 09:58:02.829813 533814 PrepareModelInputs.cpp:190] Loaded 696 records.
I1104 09:58:03.010681 533814 PyTorchPredictorBenchLib.cpp:353] memory turnover before benchmarking: 4590507056
I1104 09:58:03.010710 533814 PyTorchPredictorBenchLib.cpp:154] PyTorch predictor: number of prediction threads 1
I1104 09:58:58.839010 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0567. Iters per second: 49.8586
I1104 09:59:54.797755 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.1007. Iters per second: 49.7494
I1104 10:00:50.696525 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0657. Iters per second: 49.8363
I1104 10:01:46.514736 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0696. Iters per second: 49.8265
I1104 10:02:42.378270 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0641. Iters per second: 49.8402
I1104 10:02:42.378316 533814 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 20.0714, standard deviation: 0.0170605
I1104 10:02:42.378325 533814 PyTorchPredictorBenchLib.cpp:366] memory turnover after benchmarking: 4591882400

Nov4SaveTwoWordsInProcessedNode, local
========================================
I1104 10:38:15.543320 733514 PyTorchPredictorBenchLib.cpp:346] memory turnover after getPredictor: 177721792
I1104 10:38:21.224673 733514 PrepareModelInputs.cpp:190] Loaded 696 records.
I1104 10:38:21.382973 733514 PyTorchPredictorBenchLib.cpp:353] memory turnover before benchmarking: 4590343216
I1104 10:38:21.382992 733514 PyTorchPredictorBenchLib.cpp:154] PyTorch predictor: number of prediction threads 1
I1104 10:39:17.005359 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.9498. Iters per second: 50.1257
I1104 10:40:12.545269 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.9279. Iters per second: 50.1808
I1104 10:41:08.138119 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.999. Iters per second: 50.0026
I1104 10:42:03.686841 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.9115. Iters per second: 50.2222
I1104 10:42:55.137498 733539 Proxy2Connection.cpp:343] Received NotRegisteredException from Configerator Proxy2.
I1104 10:42:55.138715 733539 ReadOnlyConnectionIf.h:91] Mark connection as healthy.
I1104 10:42:55.384534 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.6297. Iters per second: 50.9433
I1104 10:42:55.384579 733514 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.8836, standard deviation: 0.14571
I1104 10:42:55.384588 733514 PyTorchPredictorBenchLib.cpp:366] memory turnover after benchmarking: 4591711760
```

Reviewed By: d1jang

Differential Revision: D32177531

fbshipit-source-id: 267e38a151d2dbab34fd648135d173cfbee1c22e
2021-11-16 10:12:53 -08:00
45b2f41c3e [package] fix torchscript classes in package (#68028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68028

Today, we demangle a typename before passing it to the TorchScript
compiler. This breaks compilation of torch classes in cases where we are
attempting to script the same class name from inside a package and outside it,
since we will return the same qualified name for both.

Differential Revision: D32261907

Test Plan: Imported from OSS

Reviewed By: saketh-are

Pulled By: suo

fbshipit-source-id: 921bc03ad385d94b9279fbc6f3b7dcd0ddbe5bc7
2021-11-16 10:01:40 -08:00
ba16b1eca7 [numpy] Alias arctan2 to atan2 (#67010)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65906

Adds an alias `arctan2` to improve numpy compatibility
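
A minimal sketch of the alias (assuming a build that includes this change):
```
import torch

y = torch.tensor([1.0, -1.0])
x = torch.tensor([-1.0, 1.0])
assert torch.equal(torch.arctan2(y, x), torch.atan2(y, x))  # pure alias of atan2
```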

cc mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67010

Reviewed By: anjali411

Differential Revision: D32378998

Pulled By: mruberry

fbshipit-source-id: 424c5c10c12b49c20ee83ccd109325c480b5b6cf
2021-11-16 09:41:09 -08:00
6226a3cf74 [Vulkan] Implement permute operator (#68274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68274

Implemented `permute` operator on the Vulkan backend:
* Supports only <= 4D tensors.
* Builds up shader operations from the output texture point of view to avoid the nondeterministic order of GPU shader operations between texels. See [incoherent memory access](https://www.khronos.org/opengl/wiki/Memory_Model#Incoherent_memory_access)
* Generalized input tensors to 4D ones to simplify input/output texture handling. For example, {2, 3} is treated as {1,1,2,3} internally.
* 1D to 4D inputs with all possible permutations are used for test cases.
* Reference on CPU implementation of `permute` operator: [TensorShape.cpp](cbf596bf8e/aten/src/ATen/native/TensorShape.cpp (L936))
* When shuffling dims, the new depth of the output texture needs to be determined by `ceil((batch * channel) / 4)`. This logic needs to be handled in a separate change.
    * The depth of a texture cannot exceed a device-dependent limit. It is typically 2048 on most Android devices but always less than or equal to 16,384 (see [Value distribution for maxImageDimension3D on Android](https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxImageDimension3D&platform=android)); e.g., it is 2048 on macOS and the Google Pixel 5.
    * Due to this limitation, the `permute` op needs to throw an exception if the depth of the output texture is greater than or equal to `VkImageFormatProperties.maxExtent.depth`.
    * Otherwise, the following error will occur: `-[MTLTextureDescriptorInternal validateWithDevice:]:1325: failed assertion "Texture Descriptor Validation MTLTextureDescriptor has depth (10664) greater than the maximum allowed size of 2048."`
* Vulkan `permute` operator tensor conversion:
{F679505029}
{F679505223}
* Vulkan `permute` operator shader equation:
{F679504799}
* Error/edge cases:
```
X = torch.randint(0, 23, (2, 3, 2, 2))
O = torch.permute(X, (2, 2, 1, 0))
# RuntimeError: repeated dim in permute

O = torch.permute(X, (2, 1, 0))
# RuntimeError: number of dims don't match in permute

O = torch.permute(X, (4, 3, 2, 1, 0))
# RuntimeError: number of dims don't match in permute

O = torch.permute(X, (3, 2, -1, 0))
# RuntimeError: repeated dim in permute

data2 = [0,1,2]
X2 = torch.tensor(data2)
O2 = torch.permute(X2, (0))
# permute(): argument 'dims' (position 2) must be tuple of ints, not int
# TypeError: permute(): argument 'dims' (position 2) must be tuple of ints, not int

O = torch.permute(X, (0, 1, 2, 3))
# does nothing since the dims don't change
```
* Shader debug traces with a 4D tensor size {2,3,2,2} with permute by {3,2,1,0}:
```
output tensor:
(1,1,.,.) =
  0.4395  0.5652
  0.1309  0.9768
  0.0490  0.1127
(2,1,.,.) =
  0.7058  0.2238
  0.6542  0.4064
  0.4813  0.0500
(1,2,.,.) =
  0.1716  0.4951
  0.2225  0.3255
  0.0758  0.7150
(2,2,.,.) =
  0.3762  0.0228
  0.6367  0.4411
  0.7682  0.7599
[ CPUFloatType{2,2,3,2} ]

shader debug traces:
src_index:0, b c h w: 0 0 0 0, posIn: (0 0 0) i:0 -> b c h w: 0 0 0 0, dst_index: 0, posOut: (0 0 0) j:0 -> inval[0.439453] outval[0.439453] -> inval[0.439453 0.130859 0.049011 0.564941] outval[0.439453 0.000000 0.000000 0.000000]
src_index:3, b c h w: 1 0 0 0, posIn: (0 0 0) i:3 -> b c h w: 0 0 0 1, dst_index: 0, posOut: (1 0 0) j:0 -> inval[0.564941] outval[0.564941] -> inval[0.439453 0.130859 0.049011 0.564941] outval[0.564941 0.000000 0.000000 0.000000]
src_index:1, b c h w: 0 1 0 0, posIn: (0 0 0) i:1 -> b c h w: 0 0 1 0, dst_index: 0, posOut: (0 1 0) j:0 -> inval[0.130859] outval[0.130859] -> inval[0.439453 0.130859 0.049011 0.564941] outval[0.130859 0.000000 0.000000 0.000000]
src_index:4, b c h w: 1 1 0 0, posIn: (0 0 1) i:0 -> b c h w: 0 0 1 1, dst_index: 0, posOut: (1 1 0) j:0 -> inval[0.976562] outval[0.976562] -> inval[0.976562 0.112671 -65504.000000 -65504.000000] outval[0.976562 0.000000 0.000000 0.000000]
src_index:2, b c h w: 0 2 0 0, posIn: (0 0 0) i:2 -> b c h w: 0 0 2 0, dst_index: 0, posOut: (0 2 0) j:0 -> inval[0.049011] outval[0.049011] -> inval[0.439453 0.130859 0.049011 0.564941] outval[0.049011 0.000000 0.000000 0.000000]
src_index:5, b c h w: 1 2 0 0, posIn: (0 0 1) i:1 -> b c h w: 0 0 2 1, dst_index: 0, posOut: (1 2 0) j:0 -> inval[0.112671] outval[0.112671] -> inval[0.976562 0.112671 -65504.000000 -65504.000000] outval[0.112671 0.000000 0.000000 0.000000]
src_index:0, b c h w: 0 0 1 0, posIn: (0 1 0) i:0 -> b c h w: 0 1 0 0, dst_index: 1, posOut: (0 0 0) j:1 -> inval[0.171509] outval[0.171509] -> inval[0.171509 0.222412 0.075745 0.494873] outval[0.439453 0.171509 0.000000 0.000000]
src_index:3, b c h w: 1 0 1 0, posIn: (0 1 0) i:3 -> b c h w: 0 1 0 1, dst_index: 1, posOut: (1 0 0) j:1 -> inval[0.494873] outval[0.494873] -> inval[0.171509 0.222412 0.075745 0.494873] outval[0.564941 0.494873 0.000000 0.000000]
src_index:1, b c h w: 0 1 1 0, posIn: (0 1 0) i:1 -> b c h w: 0 1 1 0, dst_index: 1, posOut: (0 1 0) j:1 -> inval[0.222412] outval[0.222412] -> inval[0.171509 0.222412 0.075745 0.494873] outval[0.130859 0.222412 0.000000 0.000000]
src_index:4, b c h w: 1 1 1 0, posIn: (0 1 1) i:0 -> b c h w: 0 1 1 1, dst_index: 1, posOut: (1 1 0) j:1 -> inval[0.325439] outval[0.325439] -> inval[0.325439 0.714844 -65504.000000 -65504.000000] outval[0.976562 0.325439 0.000000 0.000000]
src_index:2, b c h w: 0 2 1 0, posIn: (0 1 0) i:2 -> b c h w: 0 1 2 0, dst_index: 1, posOut: (0 2 0) j:1 -> inval[0.075745] outval[0.075745] -> inval[0.171509 0.222412 0.075745 0.494873] outval[0.049011 0.075745 0.000000 0.000000]
src_index:5, b c h w: 1 2 1 0, posIn: (0 1 1) i:1 -> b c h w: 0 1 2 1, dst_index: 1, posOut: (1 2 0) j:1 -> inval[0.714844] outval[0.714844] -> inval[0.325439 0.714844 -65504.000000 -65504.000000] outval[0.112671 0.714844 0.000000 0.000000]
src_index:0, b c h w: 0 0 0 1, posIn: (1 0 0) i:0 -> b c h w: 1 0 0 0, dst_index: 2, posOut: (0 0 0) j:2 -> inval[0.705566] outval[0.705566] -> inval[0.705566 0.653809 0.481201 0.223755] outval[0.439453 0.171509 0.705566 0.000000]
src_index:3, b c h w: 1 0 0 1, posIn: (1 0 0) i:3 -> b c h w: 1 0 0 1, dst_index: 2, posOut: (1 0 0) j:2 -> inval[0.223755] outval[0.223755] -> inval[0.705566 0.653809 0.481201 0.223755] outval[0.564941 0.494873 0.223755 0.000000]
src_index:1, b c h w: 0 1 0 1, posIn: (1 0 0) i:1 -> b c h w: 1 0 1 0, dst_index: 2, posOut: (0 1 0) j:2 -> inval[0.653809] outval[0.653809] -> inval[0.705566 0.653809 0.481201 0.223755] outval[0.130859 0.222412 0.653809 0.000000]
src_index:4, b c h w: 1 1 0 1, posIn: (1 0 1) i:0 -> b c h w: 1 0 1 1, dst_index: 2, posOut: (1 1 0) j:2 -> inval[0.406250] outval[0.406250] -> inval[0.406250 0.049957 -65504.000000 -65504.000000] outval[0.976562 0.325439 0.406250 0.000000]
src_index:2, b c h w: 0 2 0 1, posIn: (1 0 0) i:2 -> b c h w: 1 0 2 0, dst_index: 2, posOut: (0 2 0) j:2 -> inval[0.481201] outval[0.481201] -> inval[0.705566 0.653809 0.481201 0.223755] outval[0.049011 0.075745 0.481201 0.000000]
src_index:5, b c h w: 1 2 0 1, posIn: (1 0 1) i:1 -> b c h w: 1 0 2 1, dst_index: 2, posOut: (1 2 0) j:2 -> inval[0.049957] outval[0.049957] -> inval[0.406250 0.049957 -65504.000000 -65504.000000] outval[0.112671 0.714844 0.049957 0.000000]
src_index:0, b c h w: 0 0 1 1, posIn: (1 1 0) i:0 -> b c h w: 1 1 0 0, dst_index: 3, posOut: (0 0 0) j:3 -> inval[0.376221] outval[0.376221] -> inval[0.376221 0.636719 0.768066 0.022751] outval[0.439453 0.171509 0.705566 0.376221] outval_after[0.439453 0.171509 0.705566 0.376221]
src_index:3, b c h w: 1 0 1 1, posIn: (1 1 0) i:3 -> b c h w: 1 1 0 1, dst_index: 3, posOut: (1 0 0) j:3 -> inval[0.022751] outval[0.022751] -> inval[0.376221 0.636719 0.768066 0.022751] outval[0.564941 0.494873 0.223755 0.022751] outval_after[0.564941 0.494873 0.223755 0.022751]
src_index:1, b c h w: 0 1 1 1, posIn: (1 1 0) i:1 -> b c h w: 1 1 1 0, dst_index: 3, posOut: (0 1 0) j:3 -> inval[0.636719] outval[0.636719] -> inval[0.376221 0.636719 0.768066 0.022751] outval[0.130859 0.222412 0.653809 0.636719] outval_after[0.130859 0.222412 0.653809 0.636719]
src_index:4, b c h w: 1 1 1 1, posIn: (1 1 1) i:0 -> b c h w: 1 1 1 1, dst_index: 3, posOut: (1 1 0) j:3 -> inval[0.440918] outval[0.440918] -> inval[0.440918 0.759766 -65504.000000 -65504.000000] outval[0.976562 0.325439 0.406250 0.440918] outval_after[0.976562 0.325439 0.406250 0.440918]
src_index:2, b c h w: 0 2 1 1, posIn: (1 1 0) i:2 -> b c h w: 1 1 2 0, dst_index: 3, posOut: (0 2 0) j:3 -> inval[0.768066] outval[0.768066] -> inval[0.376221 0.636719 0.768066 0.022751] outval[0.049011 0.075745 0.481201 0.768066] outval_after[0.049011 0.075745 0.481201 0.768066]
src_index:5, b c h w: 1 2 1 1, posIn: (1 1 1) i:1 -> b c h w: 1 1 2 1, dst_index: 3, posOut: (1 2 0) j:3 -> inval[0.759766] outval[0.759766] -> inval[0.440918 0.759766 -65504.000000 -65504.000000] outval[0.112671 0.714844 0.049957 0.759766] outval_after[0.112671 0.714844 0.049957 0.759766]
```

Test Plan:
Build & test on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Build & test on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
Test result on Android (Google Pixel 5):
```
[ RUN      ] VulkanAPITest.permute_2d_success
[       OK ] VulkanAPITest.permute_2d_success (26 ms)
[ RUN      ] VulkanAPITest.permute_3d_success
[       OK ] VulkanAPITest.permute_3d_success (6 ms)
[ RUN      ] VulkanAPITest.permute_4d_success
[       OK ] VulkanAPITest.permute_4d_success (10 ms)
[ RUN      ] VulkanAPITest.permute_4dmclaren_success
[       OK ] VulkanAPITest.permute_4dmclaren_success (1 ms)
[ RUN      ] VulkanAPITest.permute_4dbig_success
[       OK ] VulkanAPITest.permute_4dbig_success (234 ms)
[ RUN      ] VulkanAPITest.permute_negativedims_success
[       OK ] VulkanAPITest.permute_negativedims_success (0 ms)
[ RUN      ] VulkanAPITest.permute_1d_nochange
[       OK ] VulkanAPITest.permute_1d_nochange (0 ms)
[ RUN      ] VulkanAPITest.permute_sameDims_nochange
[       OK ] VulkanAPITest.permute_sameDims_nochange (1 ms)
[ RUN      ] VulkanAPITest.permute_invalidinputs_exceptions
[       OK ] VulkanAPITest.permute_invalidinputs_exceptions (1 ms)
```
Test result on MacOS:
```
[ RUN      ] VulkanAPITest.permute_2d_success
[       OK ] VulkanAPITest.permute_2d_success (154 ms)
[ RUN      ] VulkanAPITest.permute_3d_success
[       OK ] VulkanAPITest.permute_3d_success (13 ms)
[ RUN      ] VulkanAPITest.permute_4d_success
[       OK ] VulkanAPITest.permute_4d_success (33 ms)
[ RUN      ] VulkanAPITest.permute_4dmclaren_success
[       OK ] VulkanAPITest.permute_4dmclaren_success (2 ms)
[ RUN      ] VulkanAPITest.permute_4dbig_success
[       OK ] VulkanAPITest.permute_4dbig_success (251 ms)
[ RUN      ] VulkanAPITest.permute_negativedims_success
[       OK ] VulkanAPITest.permute_negativedims_success (2 ms)
[ RUN      ] VulkanAPITest.permute_1d_nochange
[       OK ] VulkanAPITest.permute_1d_nochange (1 ms)
[ RUN      ] VulkanAPITest.permute_sameDims_nochange
[       OK ] VulkanAPITest.permute_sameDims_nochange (0 ms)
[ RUN      ] VulkanAPITest.permute_invalidinputs_exceptions
[       OK ] VulkanAPITest.permute_invalidinputs_exceptions (2 ms)
```

Reviewed By: SS-JIA

Differential Revision: D32292554

fbshipit-source-id: dbeaee6ff98633022cf34d6da90662d81eac6b0e
2021-11-16 09:27:51 -08:00
bc3d380ed1 Throw error when saving storages that view same data with different type (#66949)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58970
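
A hypothetical repro of the situation this now guards against (the exact error type and message are assumptions):
```
import torch

t = torch.randn(4)
u = t.view(torch.int32)            # same storage viewed with a different dtype
# With this change, serializing both together raises instead of silently
# writing ambiguous storage data:
# torch.save([t, u], "both.pt")    # -> error after this PR
```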

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66949

Reviewed By: albanD

Differential Revision: D31926323

Pulled By: anjali411

fbshipit-source-id: f6e7acc0c1968b70a94f9b0b69a32780e8e21a62
2021-11-16 08:44:44 -08:00
bf60c6e71b [JIT] remove prim::SetAttr from list of ops with side effects (#68311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68311

prim::SetAttr is listed as an op with side effects, but in AliasDb, `analyzeSetAttr` already accounts for its behavior. By removing it from the list of ops with side effects, dead code elimination will work in a few other scenarios.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32409510

fbshipit-source-id: 52ed9e19f92afb95c669ad3c2440f72f9515ba4c
2021-11-16 08:39:24 -08:00
add79722dd Correct householder_product docs. (#68335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68335

When discussing https://github.com/pytorch/pytorch/pull/63880, we
realised that the docs of `householder_product` were not correct. This
PR fixes this.

The new docs are slightly more difficult, but hopefully correct. Note
that this is a LAPACK function in disguise, so the specification is
expected to be more involved than usual.
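
As a quick sanity check of what the function computes (a sketch; tolerances are arbitrary):
```
import torch

A = torch.randn(4, 3)
refl, tau = torch.geqrf(A)                       # Householder reflectors and taus
Q = torch.linalg.householder_product(refl, tau)  # rebuild Q from the reflectors
torch.testing.assert_close(Q.t() @ Q, torch.eye(3), atol=1e-5, rtol=1e-5)
```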

cc brianjo mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32429755

Pulled By: mruberry

fbshipit-source-id: 3ac866d30984adcd9f3b83d7fa9ae7b7ae5d4b53
2021-11-16 07:54:24 -08:00
01a8862582 OpInfo tests for nn.functional.max_pool{n}d. (#68075)
Summary:
As per title.

It is planned to use these tests for fixing issues with the max_unpools' backward methods reported in https://github.com/pytorch/pytorch/issues/67658 and https://github.com/pytorch/pytorch/issues/67657.
The max_unpool backward methods are currently untested and are implemented with custom kernels. We can replace these kernels with advanced indexing operations (i.e. `gather`), which are efficient and well tested.
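
A sketch of the kind of call these OpInfos exercise (shapes are illustrative):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4, requires_grad=True)
out, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)
rec = F.max_unpool2d(out, idx, kernel_size=2)   # the op whose backward is at issue
rec.sum().backward()                            # exercises the backward path
print(x.grad.shape)                             # torch.Size([1, 1, 4, 4])
```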

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68075

Reviewed By: malfet

Differential Revision: D32308317

Pulled By: mruberry

fbshipit-source-id: 9f91c6e6a9d78c19230e93fc0a3164f4eb7b8ec5
2021-11-16 07:28:32 -08:00
33e9a0b5f6 [Reland] Python tracer. (#68325)
Summary:
There were two issues with the original PR:
1) My assumption that bound C functions could be trusted to stay alive was not valid. I'm still not entirely sure what was dying, but I've just added a cache so that the first time I see a function I collect the repr just like I was already doing with Python functions.

2) `std::regex` is known to be badly broken and prone to segfaults. Because I'm just doing a very simple prefix prune, it's fine to do it manually; see `trimPrefix`. Long term we should move all of PyTorch to `re2` as the internal lint suggests, but CMake is hard and I couldn't get it to work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68325

Reviewed By: chaekit

Differential Revision: D32432596

Pulled By: robieta

fbshipit-source-id: 06fb4bcdc6933a3e76f6021ca69dc77a467e4b2e
2021-11-15 23:32:49 -08:00
438ca7603f Fix sign comparison issue in Histogram.cpp (#68294)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68294

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D32403821

fbshipit-source-id: cdbf1d83ab02b1e996559e4cfbbe699b7165483a
2021-11-15 23:14:04 -08:00
ec742c65d5 Fix a sign comparison issue in BatchLinearAlgebraLib.cpp (#68293)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68293

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D32403788

fbshipit-source-id: 1afc5e62e7157f144ec36b029ee3bcc6c23d65a1
2021-11-15 23:12:56 -08:00
d541aa8cbe [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D32454757

fbshipit-source-id: ffb46701843245ac040905423eb950902b51951d
2021-11-15 21:54:23 -08:00
27cc11226d make broadcast fastpath the default for currently rolled-out ops (#68365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68365

As titled: broadcast fastpath has been running fine for the enabled ops for a while now, so make it the default for these ops.

Test Plan: diff is a no-op, so sandcastle

Differential Revision: D32107847

fbshipit-source-id: b239b127b219985bf7df6a0eea2d879b8e9c79a4
2021-11-15 21:41:57 -08:00
7ee84ad321 Refactoring quantized op tests to combine test classes (#68282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68282

Combined 3 Dynamic quantized op test classes into 1

Test Plan:
python test/test_quantization.py TestDynamicQuantizedOps

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D32402163

fbshipit-source-id: 696b7ef5d823632941dc7afc95161501445d0e18
2021-11-15 20:47:02 -08:00
065018d812 [pytorch][xros] Ensure all pytorch mobile operators build ok in XROS mode (#68266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68266
* Use `if...endif` to adjust PyTorch internals towards XROS

Test Plan: CI

Reviewed By: kkosik20

Differential Revision: D32190771

fbshipit-source-id: cce073dea53c2b5681d913321101cd83c6472019
2021-11-15 19:52:45 -08:00
86c1368611 [fx][const fold] Add test/example for skipping quant/dequant pattern (#68378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68378

Add test/example for skipping quant/dequant pattern

Reviewed By: jfix71

Differential Revision: D32410544

fbshipit-source-id: e63419a01a097e4c570c3861d79d573cabc0b294
2021-11-15 18:49:23 -08:00
722af775c3 [ONNX] ConstantMap setters to update existing value instead of emplace (#67630) (#67812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67812

`UpdateShape` uses `.emplace(tensorName, shapeValue)`. This will not update `shapeValue` for `tensorName` if such a name already exists in the map. Hence our code cannot correct a shape inference error, even if we infer the shape correctly later.
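
A Python analogue of the pitfall; `dict.setdefault` plays the role of `.emplace` here:
```
shapes = {"t": (1, 3)}           # stale inferred shape already in the map
shapes.setdefault("t", (2, 3))   # emplace-like: existing key wins, no update
shapes["t"] = (2, 3)             # assignment-like update: the fix
print(shapes["t"])               # (2, 3)
```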

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181300

Pulled By: malfet

fbshipit-source-id: 05c58ad3fdac683ad957996acde8f0ed6341781d

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-11-15 17:20:07 -08:00
d32efe8bc2 [ONNX] Remove the argument use_external_data_format of export() method entirely. (#67080) (#67811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67811

* remove the argument use_external_data_format of export() method entirely

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181302

Pulled By: malfet

fbshipit-source-id: 4bc1448b7487bb9dfdad4e36008ff5b227fd64a3

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-11-15 17:20:04 -08:00
9d25554d45 [ONNX] Allow registration of custom symbolics for aten namespace (#66481) (#67810)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67810

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181303

Pulled By: malfet

fbshipit-source-id: af2a715dc554b958fa3f5a7a8ae96cb3f7d112bb
2021-11-15 17:18:39 -08:00
09615cd0b0 Adding Dynamic Conv and ConvT ops/modules (#68176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68176

It should be noted that for the modules, reduce_range is set to
true by default, in a similar fashion to linear_dynamic.

Test Plan:
python test/test_quantization.py TestDynamicQuantizedModule
python test/test_quantization.py TestDynamicQuantizedConv
python test/test_quantization.py TestQuantizedConv

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D32374003

fbshipit-source-id: 011562bd0f4d817387d53bb113df2600aa60a7a3
2021-11-15 16:42:25 -08:00
529ebae0ac Bugfix for TorchScript RNN RELU and TANH (#61274)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28418
Related https://github.com/pytorch/pytorch/issues/32976 but has already been fixed before.

TorchScript handling of GRU and LSTM has been working, but not of RNN (Tanh and ReLU). The reason is that ```Union[Tensor, PackedSequence]``` is not supported by TorchScript. Using ```torch._jit_internal._overload_method``` in ```RNNBase::forward``` does not work, as TorchScript does not correctly apply the overloads if the method is inherited by ```RNN```. My solution is to move ```RNNBase::forward``` to ```RNN``` and annotate it using ```torch._jit_internal._overload_method```. LSTM and GRU use their own ```forward``` methods anyway, so there seems to be no problem related to this fix.
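
A minimal sketch of the fixed behavior (shapes are illustrative):
```
import torch

rnn = torch.nn.RNN(input_size=4, hidden_size=5, nonlinearity="relu")
scripted = torch.jit.script(rnn)   # previously failed for RNN (tanh/relu)
x = torch.randn(2, 3, 4)           # (seq_len, batch, input_size)
out, h = scripted(x)
print(out.shape, h.shape)          # torch.Size([2, 3, 5]) torch.Size([1, 3, 5])
```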

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61274

Reviewed By: anjali411

Differential Revision: D32374452

Pulled By: malfet

fbshipit-source-id: 77bab2469c01c5dfa5eaab229429724a4172445d

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-11-15 16:20:58 -08:00
2fd468e5f8 [jit] Set the graph input types before interpreting the graph during tracing (#68242)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68242

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D32382958

Pulled By: navahgar

fbshipit-source-id: 4e82a604a9ea2046af2755de23944147e618a65f
2021-11-15 15:44:32 -08:00
9ed49449b3 [SR] Add net level record functions (#68091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68091

Add record functions for recording perf stats on the entire network.

Note that this is backed by the same pre-sampling mechanism as the op record functions, so net-level stats get logged relatively infrequently. (If this is not acceptable, we can drop pre-sampling at the cost of a little perf: every inference would then require an RNG call.)

Reviewed By: hlu1

Differential Revision: D32296756

fbshipit-source-id: 09ff16c942f3bfc8f4435d6cca2be4a6b8dc6091
2021-11-15 15:39:08 -08:00
0823d18fcd make TSComputation ctor explicit (#68286)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68286

Test Plan: check it compiles

Reviewed By: alanwaketan

Differential Revision: D32402016

fbshipit-source-id: b623afa8831cd906336d7fcafbcbad32f79254b0
2021-11-15 14:58:33 -08:00
7b958fbec4 ci: Build periodic jobs with DEBUG=1 (#67192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67192

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD, janeyx99

Differential Revision: D31902447

Pulled By: seemethere

fbshipit-source-id: 1d1cca8b5ac84b1c23ab73e2d973bfb7bffa8982
2021-11-15 14:51:06 -08:00
ea0a558487 GHA CI: make the default config use only one GPU (#68382)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66511

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68382

Reviewed By: mrshenli

Differential Revision: D32441585

Pulled By: janeyx99

fbshipit-source-id: d92407c9bb9e4f740435840b4022e75749d7f0ba
2021-11-15 14:35:49 -08:00
6adbe044e3 Added nearest-exact interpolation mode (#64501)
Summary:
Added "nearest-exact" interpolation mode to fix the issues: https://github.com/pytorch/pytorch/issues/34808 and https://github.com/pytorch/pytorch/issues/62237.

Description:

As we cannot fix the "nearest" mode without a large impact on already-trained models, [it was suggested](https://github.com/pytorch/pytorch/pull/64501#pullrequestreview-749771815) to introduce a new mode instead of fixing the existing "nearest" mode.

- New mode "nearest-exact" performs index computation for nearest interpolation to match scikit-image, pillow, TF2 and while "nearest" mode still match opencv INTER_NEAREST, which appears to be buggy, see https://ppwwyyxx.com/blog/2021/Where-are-Pixels/#Libraries.

"nearest":
```
input_index_f32 = output_index * scale
input_index = floor(input_index_f32)
```

"nearest-exact"
```
input_index_f32 = (output_index + 0.5) * scale - 0.5
input_index = round(input_index_f32)
```
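
A minimal 1-D sketch of the difference, assuming a build that includes this PR; downsampling 4 -> 3 elements makes the two index formulas diverge:
```
import torch
import torch.nn.functional as F

x = torch.arange(4, dtype=torch.float32).reshape(1, 1, 4)
print(F.interpolate(x, size=3, mode="nearest"))        # picks indices 0, 1, 2
print(F.interpolate(x, size=3, mode="nearest-exact"))  # picks indices 0, 2, 3
```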

Comparisons with other libs: https://gist.github.com/vfdev-5/a5bd5b1477b1c82a87a0f9e25c727664

PyTorch version | 1.9.0 "nearest" | this PR "nearest" | this PR "nearest-exact"
---|---|---|---
Resize option: | | |
OpenCV INTER_NEAREST result mismatches | 0 | 0 | 10
OpenCV INTER_NEAREST_EXACT result mismatches | 9 | 9 | 9
Scikit-Image result mismatches | 10 | 10 | 0
Pillow result mismatches | 10 | 10 | 7
TensorFlow result mismatches | 10 | 10 | 0
Rescale option: | | |
size mismatches (https://github.com/pytorch/pytorch/issues/62396) | 10 | 10 | 10
OpenCV INTER_NEAREST result mismatches | 3 | 3| 5
OpenCV INTER_NEAREST_EXACT result mismatches | 3 | 3| 4
Scikit-Image result mismatches | 4 | 4 | 0
Scipy result mismatches | 4 | 4 | 0
TensorFlow: no such option | - | - | -

Versions:
```
skimage: 0.19.0.dev0
opencv: 4.5.4-dev
scipy: 1.7.2
Pillow: 8.4.0
TensorFlow: 2.7.0
```

Implementations in other libs:

- Pillow:
  - ee079ae67e/src/libImaging/Geometry.c (L889-L899)
  - ee079ae67e/src/libImaging/Geometry.c (L11)
  - `a[2] == 0`

- Scikit-Image :
  - dev v0.19.0 uses scipy ndi.zoom:
    - 38fae50c3f/skimage/transform/_warps.py (L180-L188)
    - 47bb6febaa/scipy/ndimage/src/ni_interpolation.c (L775-L779)
    - 47bb6febaa/scipy/ndimage/src/ni_interpolation.c (L479)

Additionally:
- Updated upsampling tests

cc ezyang gchanan albanD mruberry jbschlosser walterddr fmassa heitorschueroff ppwwyyxx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64501

Reviewed By: anjali411

Differential Revision: D32361901

Pulled By: jbschlosser

fbshipit-source-id: df906f4d25a2b2180e1942ffbab2cc14600aeed2
2021-11-15 14:28:19 -08:00
e3bcf64ff8 [qnnpack] Remove redundant fp16 dependency (#68011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68011

`qnnpack/operator.h` introduces a dependency on an external library fp16 via `qnnpack/requantization.h`.
Including `qnnpack/operator.h` in `pytorch_qnnpack.h` makes objects that really don't require fp16 depend on it indirectly, because they include `pytorch_qnnpack.h`.
This was causing some test and bench targets to fail to build for local and android/arm64 (the only two tried) using cmake.

This diff moves `qnnpack/operator.h` from `pytorch_qnnpack.h` to `qnnpack_func.h`, and explicitly add `qnnpack/operator.h` in `src/conv-prepack.cc`.

Test Plan: Ran all the tests for local on my devserver, and arm64 on Pixel3a.

Reviewed By: salilsdesai

Differential Revision: D32250984

fbshipit-source-id: 21468d8ef79c90e9876dc00da95383180a1031b5
2021-11-15 12:38:44 -08:00
0cf46fb0de [fx2trt] fix a bug in conversion from negative dim to positive dim (#68360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68360

Added a helper function to do this. We only use `mod` to convert a negative dim to a positive one, and do nothing when it is already positive.

Previously in `getitem`, if we were slicing to the very end, we would get the dimension wrong.
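
A minimal sketch of such a helper (the name and signature are assumptions, not necessarily the ones in this diff):
```
def get_positive_dim(dim, dim_size):
    # only apply mod to negative dims; leave positive dims untouched
    return dim % dim_size if dim < 0 else dim

assert get_positive_dim(-1, 4) == 3   # slicing "to the very end" now maps right
assert get_positive_dim(2, 4) == 2    # already-positive dims pass through
```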

Test Plan: Add a unit test

Reviewed By: yinghai, wushirong

Differential Revision: D32432893

fbshipit-source-id: 3c5d6a578d92a15207a5e52802750f9ea7f272a9
2021-11-15 12:30:50 -08:00
549e014963 [docs] fix torch.histc's min/max arg types (#64191)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31475. `torch.histc` accepts Scalar min/max. The docs erroneously specified their types as int.
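
For instance, a sketch of the now-correctly-documented behavior:
```
import torch

x = torch.tensor([0.2, 0.8, 1.5, 2.9])
# min/max accept Scalars, so float bounds are valid
print(torch.histc(x, bins=3, min=0.0, max=3.0))   # tensor([2., 1., 1.])
```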

cc brianjo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64191

Reviewed By: mrshenli

Differential Revision: D32437279

Pulled By: saketh-are

fbshipit-source-id: e6017e9236d815abd818dcd44e27819611666823
2021-11-15 12:29:25 -08:00
ccd9675569 [lint] Disable modernize-use-nodiscard (#68354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68354

Lint rule: https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-nodiscard.html

This check adds a ton of noise to our diffs. `[[nodiscard]]` is typically only useful when ignoring the return value of a function is a critical error, e.g. for `operator new`.

Test Plan: Verified that the lint does not get triggered

Reviewed By: hlu1

Differential Revision: D32429731

fbshipit-source-id: ca3d90686ec8d419d3f96167140dc406df6f4a53
2021-11-15 12:11:08 -08:00
c697eeba72 [JIT] Combine concat nodes where possible (#67000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67000

See the [related issue](https://github.com/pytorch/pytorch/issues/66654) for context.

This new JIT optimization transforms patterns like this:
```
%inputs.1 : Tensor[] = prim::ListConstruct(%a, %b, %c)
%concat.1 : Tensor = aten::cat(%inputs, %dim)
%inputs.2 : Tensor[] = prim::ListConstruct(%x, %concat.1, %y)
%concat.2 : Tensor = aten::cat(%inputs.2, %dim)
```
into this:
```
%inputs.2 : Tensor[] = prim::ListConstruct(%x, %a, %b, %c, %y)
%concat.2 : Tensor = aten::cat(%inputs.2, %dim)
```
(it can do this for chains of `aten::cat` longer than 2 as well)

A few conditions have to hold:
1.  The `dim`s have to match.
2. `inputs.1` and `inputs.2` cannot be mutated
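
When those conditions hold, the rewrite is behavior-preserving; a minimal eager-mode sketch of the equivalence:
```
import torch

a, b, c, x, y = (torch.randn(2, 3) for _ in range(5))
nested = torch.cat([x, torch.cat([a, b, c], dim=0), y], dim=0)
flat = torch.cat([x, a, b, c, y], dim=0)   # what the pass rewrites to
assert torch.equal(nested, flat)           # same result, one cat fewer
```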

Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOpt`

Reviewed By: d1jang

Differential Revision: D31819491

fbshipit-source-id: 9f1a501d52099eb1a630b5dd906df4c38c3817ba
2021-11-15 12:02:45 -08:00
30cda0b28c [bugfix] functionalization pass for view ops without a 'self' first argument (#68339)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68339

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32429570

Pulled By: bdhirsh

fbshipit-source-id: e6df243c508c2ba2ca1df7a53fa68f32db454f32
2021-11-15 11:58:21 -08:00
5b05983497 [bugfix] fix two edge cases in functionalization (#68269)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68269

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32396357

Pulled By: bdhirsh

fbshipit-source-id: 1d374b693f3f526d027104cbdc08b8bbe9d38307
2021-11-15 11:58:18 -08:00
12026124cc Avoid the view for mkldnn case in 1D convolution (#68166)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68034

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68166

Reviewed By: mrshenli

Differential Revision: D32432444

Pulled By: jbschlosser

fbshipit-source-id: fc4e626d497d9e4597628a18eb89b94518bb3b33
2021-11-15 11:56:45 -08:00
56024e91c9 GHA: Enable flaky test reporting by setting PYTORCH_RETRY_TEST_CASES=1 (#68300)
Summary:
Enables https://github.com/pytorch/pytorch/issues/68150 in CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68300

Reviewed By: seemethere

Differential Revision: D32435332

Pulled By: janeyx99

fbshipit-source-id: 155018afaf73d5a24d13d358879361468ec7b18e
2021-11-15 11:23:55 -08:00
24b60b2cbf [lint] lintrunner fixes/improvements (#68292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68292

- noqa was typo-d to be the same as type: ignore
- generalize clang-tidy initialization and use it for clang_format as well
- Add a script that lets you update the binaries in s3 relatively easily

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32403934

Pulled By: suo

fbshipit-source-id: 4e21b22605216f013d87d636a205707ca8e0af36
2021-11-15 11:08:26 -08:00
43874d79e7 Fix failing test due to a bug in NumPy when using OpenBLAS implementations (#67679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67679

Fixes https://github.com/pytorch/pytorch/issues/67675

cc mruberry

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32368698

Pulled By: mruberry

fbshipit-source-id: 3ea6ebc43c061af2f376cdf5da06884859bbbf53
2021-11-15 08:25:12 -08:00
d1c529bd0b replace platform specific CI environment variables with generic ones (#68133)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59478

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68133

Reviewed By: saketh-are

Differential Revision: D32401080

Pulled By: atalman

fbshipit-source-id: 057a34a56f8a2d324f4d1ea07da3a09772177897
2021-11-15 07:02:44 -08:00
1c0d6ff835 [fx][const fold] Allow to set up a function to modify const_nodes for split_const_subgraphs (#67784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67784

FX models generate quant/dequant layers for INT8 explicit-mode support. However, if the inputs of a quant/dequant layer are constant, the layer will be put into the constant subgraph and optimized out, and TensorRT then fails to parse the leftover graph. It is better to set up an optional function (skip_folding_node_fn) to skip folding such nodes in split_const_subgraphs.

Reviewed By: jfix71

Differential Revision: D32076970

fbshipit-source-id: 7dcbb4f02386f8c831d09a2f0e40bcdba904471c
2021-11-15 06:51:19 -08:00
4c87aa77d1 [DataPipe] Traverse DataPipe graph excluding primitive and callable (#67783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67783

Add `getstate_hook` to exclude primitive objects and callables from serialization when `exclude_primitive` is enabled for `traverse`.
For graph traversal, we don't have to handle lambdas and similar objects.
This is used by `OnDiskCacheHolder` to trace the DataPipe graph.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32146697

Pulled By: ejguan

fbshipit-source-id: 03b2ce981bb21066e807f57c167b77b2d0e0ce61
2021-11-15 06:46:31 -08:00
1adeeabdc0 Fix trt tuple(Dims) throwing issue (#68318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68318

Adding an `__iter__` binding so that `tuple(Dims)` can construct the right iterator and know where to stop, instead of relying on trial and error with exception catching. We should upstream this to https://github.com/NVIDIA/TensorRT. cc: wushirong

I did try a very similar `__iter__` fix previously, but I'm not sure why it wasn't effective...
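
A minimal sketch of the fixed behavior, assuming a TensorRT Python build that includes this patch:
```
import tensorrt as trt

d = trt.Dims([1, 2, 3])
print(tuple(d))   # (1, 2, 3) -- __iter__ now knows where to stop
```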

Reviewed By: kflu, wushirong

Differential Revision: D32412430

fbshipit-source-id: 6390a1275dc34ef498acf933bb96f636c15baf41
2021-11-13 19:48:46 -08:00
be281fc597 Check for None in torch.jit.Graph.create (#68253)
Summary:
...because we don't like segfaults from Python (see test).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68253

Reviewed By: suo

Differential Revision: D32396747

Pulled By: gmagogsfm

fbshipit-source-id: a0925e8479702766e88176280985a63bc79e4f6a
2021-11-13 11:30:33 -08:00
6fb8ebcd92 [tensorexp] Add strides to Buf (#68018)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68018

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32262381

Pulled By: IvanKobzarev

fbshipit-source-id: dba79add0bf703bc2378d64e726d4c47ec30e3be
2021-11-13 08:33:01 -08:00
f7366ca51b implemented quantize_per_tensor_dynamic and added a corresponding test script (#68004)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68004

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D32301792

Pulled By: dzdang

fbshipit-source-id: f680557ba4736d095efc33e8c92111265f25aee0
2021-11-13 06:34:36 -08:00
cb14a258a2 [c10d] Fix object-based collectives for debug mode (#68223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68223

DETAIL debug mode didn't work with object-based collectives for NCCL backend, because we'd only check if backend is NCCL and then move tensors to CUDA.

Instead, check if it is a wrapped PG, and then check the wrapped pg to see if it's NCCL.
ghstack-source-id: 143242023

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32366840

fbshipit-source-id: be0a2af6849f8f24446593f4a4fbea4a67586ee5
2021-11-13 04:18:31 -08:00
ec94bb787a [TensorExpr] Add a way to define target triple/cpu/attrs for llvm codegen and turn on the AOT workflow. (#66527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66527

Differential Revision: D31593869

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: e7534c11fbcf0dab5f49d01d6053caf77b833ef0
2021-11-13 00:52:20 -08:00
52e93fca2c [TensorExpr] Fix some TE python bindings. (#68232)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68232

Differential Revision: D32380676

Test Plan: Imported from OSS

Reviewed By: saketh-are

Pulled By: ZolotukhinM

fbshipit-source-id: 9287a2c765a53b45ac04d625cc010f5384a8bddf
2021-11-13 00:52:18 -08:00
e511a7a5b4 [TensorExpr] Remove non-determinism in iterating over unordered_set of intermediate buffers. (#68277)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68277

Differential Revision: D32400553

Test Plan: Imported from OSS

Reviewed By: saketh-are, priyaramani

Pulled By: ZolotukhinM

fbshipit-source-id: a8fe820bbddaa19f95db432efaa6d3e36095a05e
2021-11-13 00:50:57 -08:00
80339e85c5 Fix disabling bot with subprocessing (#68290)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68270

Tested locally + tests get disabled properly

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68290

Reviewed By: mrshenli

Differential Revision: D32403956

Pulled By: janeyx99

fbshipit-source-id: 86629daa86f83f6777f2279524ef973af51046b9
2021-11-12 19:56:17 -08:00
282221c5d6 Fuse unsqueeze, cat, sum for inline_cvr (#68289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68289

Fuse the unsqueeze+cat+sum op pattern into an add op

Reviewed By: jfix71

Differential Revision: D31769197

fbshipit-source-id: 184b3c8217f2ad9fab9ac8d3c91cd33cf7e7de30
2021-11-12 18:20:11 -08:00
48c8de45b0 [ONNX] Remove the argument example_outputs of export() method entirely. (#67082) (#67809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67809

* remove the argument example_outputs of export() method entirely

[ONNX] Follow-up: Remove the argument example_outputs of export() method entirely. (#67629)

* Resolve CI failure

* remove test after removing example_outputs

[ONNX] Follow-up: Follow-up: Remove the argument example_outputs of export() method entirely (#67719)

Removing unused import, resolving flake error.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181305

Pulled By: malfet

fbshipit-source-id: ba00547b7cb455ace86606b1bda643c02bdcfa1b

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-11-12 17:06:26 -08:00
a8b93cb3ec More aggressively market functorch.vmap when torch.vmap gets called (#67347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67347

This PR:
- changes the warning when torch.vmap gets called to suggest using
functorch.vmap
- changes the warning when a batching rule isn't implemented to suggest
using functorch.vmap
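
A minimal sketch of the suggested replacement (assumes the separate `functorch` package is installed):
```
import torch
from functorch import vmap

x = torch.randn(8, 3)
out = vmap(torch.dot)(x, x)   # per-row dot product, vectorized over dim 0
print(out.shape)              # torch.Size([8])
```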

Test Plan: - test/test_vmap.py

Reviewed By: H-Huang

Differential Revision: D31966603

Pulled By: zou3519

fbshipit-source-id: b01dc1c2e298ce899b4a3a5fb333222a8d5bfb56
2021-11-12 16:10:16 -08:00
da5ffe752a Add reporting for flaky tests in CI (#68150)
Summary:
This PR does NOT change how signal is displayed in CI but rather just reports stats of flaky tests to RDS. **None of the below will be enabled after landing this PR--it will be done in a separate PR with environment variables.**

We report flaky test stats when a test fails on the first attempt and then, after rerunning it up to MAX_NUM_RETRIES times, we get at least one success.
Tests that fail all of the reruns are assumed to be real test failures.
Tests that succeed the first time are not rerun, even if they were previously flagged as flaky.
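
A toy sketch of this classification rule (the names and retry count are assumptions, not the real tooling):
```
from typing import List

MAX_NUM_RETRIES = 3   # illustrative value

def classify(failed_first: bool, rerun_passed: List[bool]) -> str:
    if not failed_first:
        return "green"    # passed on the first try: never rerun
    if any(rerun_passed):
        return "flaky"    # failed, then at least one rerun succeeded
    return "red"          # failed every rerun: treated as a real failure
```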

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68150

Test Plan:
First, I modified:
test_async_python to always fail (will be our "failing test")
test_async_future_type_python to fail 40% of the time
test_async_script_capture to fail 60% of the time

Then, running `python test/test_jit.py -v -k test_async` while setting IN_CI to 1:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/test_jit.py -v -k test_async
...

Running tests...
----------------------------------------------------------------------
  test_async_future_type_python (jit.test_async.TestAsync) ... ok (0.004s)
  test_async_grad_guard_no_grad (jit.test_async.TestAsync) ... ok (0.020s)
  test_async_grad_guard_with_grad (jit.test_async.TestAsync) ... ok (0.008s)
  test_async_kwargs (jit.test_async.TestAsync) ... ok (0.045s)
  test_async_parsing (jit.test_async.TestAsync) ... ok (0.010s)
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 3
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 2
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 1
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 0
  test_async_script (jit.test_async.TestAsync) ... ok (0.008s)
  test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
    test_async_script_capture failed - num_retries_left: 3
  test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
    test_async_script_capture failed - num_retries_left: 2
  test_async_script_capture (jit.test_async.TestAsync) ... ok (0.011s)
    test_async_script_capture succeeded - num_retries_left: 1
  test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
    test_async_script_capture failed - num_retries_left: 0
  test_async_script_error (jit.test_async.TestAsync) ... ok (0.040s)
  test_async_script_multi_forks (jit.test_async.TestAsync) ... ok (0.025s)
  test_async_script_multi_waits (jit.test_async.TestAsync) ... ok (0.009s)
...

======================================================================
FAIL [0.003s]: test_async_python (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/janeyx/pytorch/test/jit/test_async.py", line 30, in test_async_python
    self.assertTrue(False)
AssertionError: False is not true

======================================================================
FAIL [0.010s]: test_async_script_capture (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/janeyx/pytorch/test/jit/test_async.py", line 123, in test_async_script_capture
    self.assertTrue(False)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 28 tests in 0.399s

FAILED (failures=2, expected failures=5, unexpected successes=1)
```
Yielding this as the test report (I changed the extension from xml to txt so it uploads here):
[TEST-jit.test_async.TestAsync-20211110222055.txt](https://github.com/pytorch/pytorch/files/7517532/TEST-jit.test_async.TestAsync-20211110222055.txt)

And then running print_test_stats correctly excludes the always-failing test `test_async_python` and calculates red and green appropriately:
```
(pytorch) janeyx@janeyx-mbp pytorch % python tools/stats/print_test_stats.py test-reports/python-unittest/test.test_jit
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'create_table': {'table_name': 'flaky_tests', 'fields': {'name': 'string', 'suite': 'string', 'file': 'string', 'num_green': 'int', 'num_red': 'int', 'pr': 'string', 'ref': 'string', 'branch': 'string', 'workflow_id': 'string', 'build_environment': 'string'}}}]
[scribe] Writing for None
[scribe] Wrote stats for flaky_tests
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'write': {'table_name': 'flaky_tests', 'values': {'name': 'test_async_script_capture', 'suite': 'jit.test_async.TestAsync', 'file': 'test/test_jit', 'num_green': 1, 'num_red': 3, 'pr': None, 'ref': None, 'branch': None, 'workflow_id': None, 'build_environment': 'linux-xenial-gcc5.4-py3'}}}]
(pytorch) janeyx@janeyx-mbp pytorch %
```

-------------------
If you're curious, I also included the code for when we would like to override the report_only feature and also hide flaky signal in CI. The results for the same test command correctly still fail the test suite, but mark the flaky test_async_future_type_python as passed:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/test_jit.py -v -k test_async
...

Running tests...
----------------------------------------------------------------------
  test_async_future_type_python (jit.test_async.TestAsync) ... FAIL (0.004s)
    test_async_future_type_python failed - num_retries_left: 3
  test_async_future_type_python (jit.test_async.TestAsync) ... ok (0.001s)
  test_async_grad_guard_no_grad (jit.test_async.TestAsync) ... ok (0.017s)
  test_async_grad_guard_with_grad (jit.test_async.TestAsync) ... ok (0.008s)
  test_async_kwargs (jit.test_async.TestAsync) ... ok (0.091s)
  test_async_parsing (jit.test_async.TestAsync) ... ok (0.010s)
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 3
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 2
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.004s)
    test_async_python failed - num_retries_left: 1
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 0
  test_async_script (jit.test_async.TestAsync) ... ok (0.008s)
  test_async_script_capture (jit.test_async.TestAsync) ... ok (0.011s)
  test_async_script_error (jit.test_async.TestAsync) ... ok (0.039s)
...

======================================================================
FAIL [0.003s]: test_async_python (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/janeyx/pytorch/test/jit/test_async.py", line 30, in test_async_python
    self.assertTrue(False)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 26 tests in 0.390s

FAILED (failures=1, expected failures=4)
```
With test reports:
[TEST-jit.test_async.TestAsync-20211110224810.txt](https://github.com/pytorch/pytorch/files/7517663/TEST-jit.test_async.TestAsync-20211110224810.txt)
And running print_test_stats:
```
(pytorch) janeyx@janeyx-mbp pytorch % python tools/stats/print_test_stats.py test-reports/python-unittest/test.test_jit
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'create_table': {'table_name': 'flaky_tests', 'fields': {'name': 'string', 'suite': 'string', 'file': 'string', 'num_green': 'int', 'num_red': 'int', 'pr': 'string', 'ref': 'string', 'branch': 'string', 'workflow_id': 'string', 'build_environment': 'string'}}}]
[scribe] Writing for None
[scribe] Wrote stats for flaky_tests
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'write': {'table_name': 'flaky_tests', 'values': {'name': 'test_async_future_type_python', 'suite': 'jit.test_async.TestAsync', 'file': 'test/test_jit', 'num_green': 1, 'num_red': 1, 'pr': None, 'ref': None, 'branch': None, 'workflow_id': None, 'build_environment': 'linux-xenial-gcc5.4-py3'}}}]
```

Reviewed By: saketh-are

Differential Revision: D32393907

Pulled By: janeyx99

fbshipit-source-id: 37df890481ab84c62809c022dc6338b50972899c
2021-11-12 15:03:14 -08:00
8bf150f21b Revert D32178667: [pytorch][PR] Python tracer for profiler
Test Plan: revert-hammer

Differential Revision:
D32178667 (33353fb828)

Original commit changeset: 118547104a7d

fbshipit-source-id: 47510607589fc39c730ba913f47c01a7d107b7b0
2021-11-12 14:53:52 -08:00
a82e51a7ae Move some cub templates out of the header file (#67650)
Summary:
Cub routines are both expensive to compile and used in multiple
different operators throughout the cuda folder. So, it makes sense to
compile them in one centralized place where possible (i.e. when
custom operators aren't involved).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67650

Reviewed By: mruberry

Differential Revision: D32259660

Pulled By: ngimel

fbshipit-source-id: 5f7dbdb134297e1ffdc1c7fc5aefee70a2fa5422
2021-11-12 13:51:11 -08:00
6ddaf3bd37 [LT] Upstream TsNode, TsNodeLowering, TsLoweringContext (#68154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68154

Test Plan: added a basic test; cover more by using lazy_tensor_staging tests

Reviewed By: Krovatkin, alanwaketan

Differential Revision: D32224303

fbshipit-source-id: ac3e1161229b8ae60fdb15ffa72e17072b595914
2021-11-12 12:57:20 -08:00
f6e45102d2 [quant][embedding qat] Support non-partial functions in qconfig comparison (#68067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68067

Embedding QAT uses a NoopObserver class for activation,
and a FakeQuant for weight, make sure that qconfig comparison
functions properly for a mix of partial function and class in
qconfig.

Test Plan:
`pytest test/quantization/eager/test_quantize_eager_qat.py  -v -k "test_embedding_qat_qconfig_equal"`

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D32318434

fbshipit-source-id: c036eef9cbabe7c247745930501328e9c75a8cb0
2021-11-12 12:48:00 -08:00
66b52d5b49 [TensorExpr] Convert linear_clamp_run to using schema in NNC lowerings. (#66523)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66523

Differential Revision: D31590857

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Pulled By: ZolotukhinM

fbshipit-source-id: da8a7d68c8a4cf74c3f622b8a3af54d00ffb14a6
2021-11-12 12:26:06 -08:00
06e8cb9e04 Manually Disabling two TestDistBackendWithSpawn tests on ROCm, test_ddp_profiling_torch_profiler and test_ddp_sync_bn_training_vs_eval (#68255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68255

Manually disabling these two tests because they can't be disabled via Probot.

See the issues #68222 and #68173 for details.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Test Plan: Imported from OSS

Reviewed By: malfet, saketh-are

Differential Revision: D32390899

Pulled By: NivekT

fbshipit-source-id: bd4996d73014337a9175b20ae67a3880ee994699
2021-11-12 12:04:21 -08:00
33353fb828 Python tracer for profiler (#67407)
Summary:
This PR instruments the CPython interpreter and integrates the resulting trace into the PyTorch profiler.

The python tracing logic works by enabling `PyEval_SetProfile`, and then logging the minimal information to track every time python calls or returns from a function. A great deal of care has gone into keeping this process very lightweight; the `RawEvent` struct is only two words and doesn't do anything fancy. When a python function is called, we have to do extra work. If the call is to `nn.Module.__call__`, we simply incref to extend the life of the module. Otherwise we check if we have seen the function before, and if not go through the (somewhat expensive) task of saving the strings which we then cache.

To actually get a useful timeline, we have to replay the events to determine the state of the python stack at any given point. A second round of stack replay is needed to figure out what the last python function was for each torch op so we can reconstruct the correct python stack. All of this is done during post processing, so while we want to be reasonably performant it is no longer imperative to shave every last bit.

I still need to do a bit of refinement (particularly where the tracer interfaces with the profiler), but this should give a good sense of the general structure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67407

Test Plan:
```
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.linear(x)
        return self.relu(x)

def call_module():
    m = MyModule()
    for _ in range(4):
        m(torch.ones((2, 2)))

def top_level_fn():
    with torch.profiler.profile(with_stack=True) as p:
        call_module()

    p.export_chrome_trace("test_trace.json")

top_level_fn()
```
<img width="1043" alt="Screen Shot 2021-10-27 at 6 43 18 PM" src="https://user-images.githubusercontent.com/13089297/139171803-f95e70f3-24aa-45e6-9d4b-6d437a3f108d.png">

PS: I've tried to comment liberally, particularly around some of the more magical parts. However I do plan on doing another linting and commenting pass. Hopefully it's not too bad right now.

Reviewed By: gdankel, chaekit

Differential Revision: D32178667

Pulled By: robieta

fbshipit-source-id: 118547104a7d887e830f17b94d3a29ee4f8c482f
2021-11-12 11:58:12 -08:00
96d116fec2 [JIT] Add additional debug output when op cannot be found in AliasDb (#68099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68099

When an op in the graph cannot be matched to any known ops, alias_analysis.cpp throws an error.

Before:
```
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":612, please report a bug to PyTorch. We don't have an op for aten::add but it isn't a special case. Argument types: Tensor, float, Tensor,
```

After:
```
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":612, please report a bug to PyTorch. We don't have an op for aten::add but it isn't a special case.  Argument types: Tensor, float, Tensor,

Candidates:
        aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> (Tensor)
        aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
        aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> (Tensor(a!))
        aten::add.t(t[] a, t[] b) -> (t[])
        aten::add.str(str a, str b) -> (str)
        aten::add.int(int a, int b) -> (int)
        aten::add.complex(complex a, complex b) -> (complex)
        aten::add.float(float a, float b) -> (float)
        aten::add.int_complex(int a, complex b) -> (complex)
        aten::add.complex_int(complex a, int b) -> (complex)
        aten::add.float_complex(float a, complex b) -> (complex)
        aten::add.complex_float(complex a, float b) -> (complex)
        aten::add.int_float(int a, float b) -> (float)
        aten::add.float_int(float a, int b) -> (float)
        aten::add(Scalar a, Scalar b) -> (Scalar)
```

Test Plan:
Run
```
import torch

if __name__ == '__main__':
    ir = """
graph(%x : Tensor,
      %y : Tensor):
  %2 : float = prim::Constant[value=1.2]()
  %result : Tensor= aten::add(%x, %2, %y)
  return (%result)
"""
    x = torch.tensor([[1., 2.], [3., 4.]])
    y = torch.tensor([[2., 1.], [2., 1.]])
    graph = torch._C.parse_ir(ir)
    print(graph)
    graph.alias_db().analyze()
    # print(script(x, y))
```

to get the results above

Imported from OSS

Reviewed By: anjali411

Differential Revision: D32339639

fbshipit-source-id: a79a3c2f157154b5fb1e3f33a23e43b7884e8e38
2021-11-12 08:39:41 -08:00
98bab78e11 Revert D32039318: [pytorch][PR] Bump dlpack.h to latest version
Test Plan: revert-hammer

Differential Revision:
D32039318 (d049772538)

Original commit changeset: 7dfc653e1e77

fbshipit-source-id: 0d4b1af7381a2638ca9f3c3af26c2ff0b7bd7469
2021-11-12 08:20:21 -08:00
5c3a9f3fdc adding opinfo for torch.nn.bilinear and torch.nn.glu (#67478)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67478

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32027807

Pulled By: mikaylagawarecki

fbshipit-source-id: 501057cc9aced19fca26c4294fe81dcbb4b83a26
2021-11-12 08:13:15 -08:00
dc24503a89 Fix Hash(c10::Scalar), account for garbage data in union (#68201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68201

Hash(c10::Scalar) made a bad assumption that it was valid to just hash over all the bytes of data of the c10::Scalar struct.

Because c10::Scalar stores a union of different (float/int/complex) types with different sizes, not all bytes are valid in all cases. Hash() should only read the bytes corresponding to the currently active type.
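A minimal Python sketch of the bug class (illustrative only; the real fix is in the C++ Hash() over c10::Scalar): a tagged 8-byte slot stands in for the union, and a bool stored there leaves 7 stale bytes that a whole-slot hash would wrongly read.

```python
import struct

def hash_over_all_bytes(payload8: bytes) -> int:
    return hash(payload8)  # wrong: mixes in the uninitialized bytes

def hash_active_value(tag: str, payload8: bytes) -> int:
    if tag == "bool":
        return hash(payload8[0] != 0)                  # only 1 byte is valid
    if tag == "int":
        return hash(struct.unpack("<q", payload8)[0])  # int64: all 8 bytes
    if tag == "double":
        return hash(struct.unpack("<d", payload8)[0])  # double: all 8 bytes
    raise ValueError(tag)

slot = bytes([1]) + b"\x99" * 7  # bool True plus stale garbage bytes
assert hash_active_value("bool", slot) == hash(True)
```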

Test Plan: Added new unit tests.  Verified HashTest.Scalar failed with the original Hash() impl and then fixed.

Reviewed By: alanwaketan

Differential Revision: D32367564

fbshipit-source-id: ac30dd4f6dd0513954986d3d23c0c11ba802c37b
2021-11-12 07:20:08 -08:00
0bd0a67c4f [lint][fbcode/caffe2] CLANGFORMAT
Test Plan:
Proof of coverage:

```
$ hg files fbcode/caffe2 |
  arc linttool debugfilterpaths --take CLANGFORMAT --path-match-only > ~/before.txt

$ hg up this_diff

$ hg files fbcode/caffe2 |
  arc linttool debugfilterpaths --take CLANGFORMAT --path-match-only > ~/after.txt

$ comm -3 ~/before.txt ~/after.txt | pastry
P467377980: https://www.internalfb.com/intern/paste/P467377980/
```

These files lost coverage:

- `fbcode/caffe2/torch/abi-check.cpp`
- `fbcode/caffe2/torch/custom_class.h`
- `fbcode/caffe2/torch/custom_class_detail.h`
- `fbcode/caffe2/torch/deploy.h`
- `fbcode/caffe2/torch/extension.h`
- `fbcode/caffe2/torch/library.h`
- `fbcode/caffe2/torch/script.h`

Everything else in P467377980 gained coverage.

Reviewed By: suo

Differential Revision: D32364856

fbshipit-source-id: 9b3ba3350ecdb50038412a24af5e0da0fe4d69b8
2021-11-12 05:12:39 -08:00
e795315c63 Changes and fixes to prepare for dynamic conv (#68175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68175

This slightly alters the way from_float works so that it will work
with placeholder observers. It also fixes a bug with ConvTranspose3d and
ConvTranspose1d where parameters like kernel_size, stride, etc.
weren't set properly. New tests were added to check for this type of
issue as well.

Test Plan:
python test/test_quantization.py TestQuantizedOps
python test/test_quantization.py TestStaticQuantizedModule

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D32374004

fbshipit-source-id: caaa548d12d433d9c1fa0abc8597a7d31bb4e8af
2021-11-11 23:55:04 -08:00
1181628d85 BE: Use TORCH_CHECK instead of explicit c10::Error (#68187)
Summary:
`if (cond) { throw c10::Error("", msg); }` is identical to `TORCH_CHECK(!cond, msg);`, but the latter provides better attribution

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68187

Reviewed By: xuzhao9

Differential Revision: D32360956

Pulled By: malfet

fbshipit-source-id: e554b99926d7ad0c79a1cd54d35f47339fa2429d
2021-11-11 22:01:41 -08:00
799ebce3aa Add algo recorder/replayer to lower.py (#68194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68194

Add algorithm recorder/replayer to lower.py

Reviewed By: yinghai

Differential Revision: D31909575

fbshipit-source-id: 552f2ba4fbd6ea646316f6412d55416a76e1f69a
2021-11-11 21:22:22 -08:00
613c1aca6d Adds support for automated error and warning testing (#67354)
Summary:
Adds a new class `ErrorOrWarningInput` that is a `SampleInput` with some additional metadata for validating that `SampleInput` throws the desired warning or error. The architecture to support these new tests is modeled after the existing reference tests and sample input functions.

Existing invalid input tests for neg and kthvalue are ported to the new scheme to validate it.

There may be a simpler/clearer naming scheme we can use here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67354

Reviewed By: jbschlosser

Differential Revision: D31989888

Pulled By: mruberry

fbshipit-source-id: 4fa816e1e8d0eef21b81c2f80813d42b2c26714e
2021-11-11 19:28:47 -08:00
89d556f648 add VS extension in doc (#63944)
Summary:
add the VS extension in https://pytorch.org/cppdocs/installing.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63944

Reviewed By: malfet

Differential Revision: D30546156

Pulled By: seemethere

fbshipit-source-id: a65448d8702f9fd400c9dd2ef2d9f961f30c4983
2021-11-11 18:02:08 -08:00
9cb65df79f [Static Runtime] Fallback to disabling manage_output_tensors instead of crashing when wrong API is used (#67939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67939

With `manage_output_tensor` enabled, a client of `StaticRuntime` must call it via `PyTorchPredictor::predict_managed_result`. If the client instead uses `PyTorchPredictor::operator()`, it will crash (intended behavior, so as not to leak the memory of managed output tensors). Such a mistake could cause a catastrophic failure in production (e.g., via gatekeeper or config changes).

Considering the complexity of how `PyTorchPredictor` is used in different settings, the chance that this bug hits production is non-zero.

This change introduces `StaticRuntime::disableManageOutputTensor` to disable the `manage_output_tensor` feature when a client mistakenly uses `PyTorchPredictor::operator()`, instead of crashing. When `StaticRuntime` is invoked via `PyTorchPredictor::operator()`, it first calls `StaticRuntime::disableManageOutputTensor` to disable the feature, so that it can safely return non-managed output tensors to the client.

A slight perf degradation is expected from forcefully disabling `manage_output_tensors`, but the robustness gain outweighs the risk of a catastrophic failure from crashing at a high rate.

Test Plan: Added a unittest `StaticRuntime, DisableManageOutputTensors` to cover the newly added code.

Reviewed By: swolchok

Differential Revision: D32219731

fbshipit-source-id: caf5c910b34726c570e17435ede7d888443e90cf
2021-11-11 17:31:07 -08:00
3dc0754c53 [pytorch][mobile] deprecate the LLVM-based static analyzer (#68180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68180

Since we've open sourced the tracing-based selective build, we can deprecate the
op-dependency-graph-based selective build and the static analyzer tool that
produces the dependency graph.
ghstack-source-id: 143108377

Test Plan: CIs

Reviewed By: seemethere

Differential Revision: D32358467

fbshipit-source-id: c61523706b85a49361416da2230ec1b035b8b99c
2021-11-11 16:37:08 -08:00
301369a774 [PyTorch][Fix] Pass the arguments of embedding as named arguments (#67574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67574

While adding the optional params for the sharded embedding op, we found that we cannot see these params from a `__torch_function__` override: they are not passed as keyword arguments, so they never reach the override's kwargs. This change passes them as named arguments (see the sketch below).
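A minimal sketch of the behavior (hypothetical `Watcher` subclass, not the sharded embedding code): an argument is visible in the override's `kwargs` only if the caller passed it by keyword.

```python
import torch

class Watcher(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Only arguments passed by keyword show up in `kwargs` here.
        print(getattr(func, "__name__", func), "kwargs:", kwargs)
        return super().__torch_function__(func, types, args, kwargs or {})

w = torch.ones(2, 3).as_subclass(Watcher)
torch.sum(w, 0)      # dim travels positionally: kwargs is empty
torch.sum(w, dim=0)  # dim is visible to the override via kwargs
```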
ghstack-source-id: 143029375

Test Plan: CI

Reviewed By: albanD

Differential Revision: D32039152

fbshipit-source-id: c7e598e49eddbabff6e11e3f8cb0818f57c839f6
2021-11-11 15:22:10 -08:00
9571eb599c [lint] fix up clangtidy lintrunner integration (#68192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68192

- Run on exactly the same stuff as the existing linter checks.
- Exclude deploy interpreter headers from being reported.

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D32364023

Pulled By: suo

fbshipit-source-id: c27eca4a802534875d609d004fa9f6fca59ae6a5
2021-11-11 14:53:28 -08:00
6afb414c21 Nan in linalg eig (#67544)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61251. As per the comment here (https://github.com/pytorch/pytorch/issues/61251#issuecomment-954676082), a consensus has been reached to raise an error if there is a NaN value in the input when calling `eig()`. This PR implements that feature.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67544

Reviewed By: malfet

Differential Revision: D32310919

Pulled By: mruberry

fbshipit-source-id: fc74a1ae2d929157c2d4c9051e3e9a4bf03dd5be
2021-11-11 14:33:49 -08:00
d049772538 Bump dlpack.h to latest version (#65047)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64995

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65047

Reviewed By: ngimel

Differential Revision: D32039318

Pulled By: mruberry

fbshipit-source-id: 7dfc653e1e77799d1f26a95fa9bbae3c7ffc887c
2021-11-11 14:02:16 -08:00
0420545639 Enable all dtype combinations in torch.Tensor.view(dtype) (#66493)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29013

Note: This PR does not enable autograd. This can be done in a future PR.
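A brief usage sketch of the expanded behavior, assuming a build that includes this change:

```python
import torch

x = torch.tensor([1.0, 2.0], dtype=torch.float32)  # 8 bytes of storage
same_size = x.view(torch.int32)  # reinterpret bits; shape stays [2]
new_size = x.view(torch.uint8)   # differently sized dtype now works too
print(same_size.shape, new_size.shape)  # torch.Size([2]) torch.Size([8])
```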

cc mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66493

Reviewed By: gchanan

Differential Revision: D32314680

Pulled By: mruberry

fbshipit-source-id: 69d325573b2331f32b83c05c91ffbe80571e7ae2
2021-11-11 13:55:21 -08:00
f9ea41f257 Fixes spelling error writeable to writable, improves warning, and documentation (#67664)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46741
pytorchbot

contributors: nickleus27, yanivsagy, and khanhthien123

SmrutiSikha this is mostly your work.  We just did very minor clean up.

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67664

Reviewed By: gchanan

Differential Revision: D32311838

Pulled By: mruberry

fbshipit-source-id: 0e5d4d888caeccb0fd7c80e6ff11b1b1fa8e00d6
2021-11-11 13:05:00 -08:00
1e8f836c44 Remove OpInfo non-contig inputs (#67677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67677

This follows
https://github.com/pytorch/pytorch/issues/63341#issuecomment-899690614

Fixes https://github.com/pytorch/pytorch/issues/67012

Note. I wrote the OpInfo for `index_fill`, so removing those inputs in
there is right. kshitij12345 mentioned that the same thing is true for
the inputs for tile / repeat.
https://github.com/pytorch/pytorch/issues/67012#issuecomment-948537446

There are more uses of `transpose` within the OpInfos, but most of them
are for testing `mm` and `baddmm`. I did not touch those, as those
operations are so important that it won't hurt to test those more
thoroughly.

cc mruberry

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32311729

Pulled By: mruberry

fbshipit-source-id: ac0804ca6f893118046b3e1bd97b5a2e6b900b59
2021-11-11 13:03:16 -08:00
4fe3965b3a Fix dtype arg typing for Tensor.type doc string (#67019)
Summary:
Fix typing error in PyCharm when using torch.Tensor.type(dtype=torch.int64)

<img width="386" alt="Screenshot 2021-10-21 at 15 30 50" src="https://user-images.githubusercontent.com/59562934/138288062-cc2ba45e-ece0-4fca-9369-55d020404c28.png">

Thanks for your great work! :)

cc brianjo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67019

Reviewed By: malfet

Differential Revision: D32311313

Pulled By: mruberry

fbshipit-source-id: 90fc453bc4129a301d567d4b39137b93c5dac01e
2021-11-11 12:58:46 -08:00
b07a11929d Array API: Add torch.linalg.cross (#63285)
Summary:
### Create `linalg.cross`

Fixes https://github.com/pytorch/pytorch/issues/62810

As discussed in the corresponding issue, this PR adds `cross` to the `linalg` namespace (**Note**: There is no method variant) which is slightly different in behaviour compared to `torch.cross`.

**Note**: this is NOT an alias as suggested in mruberry's [https://github.com/pytorch/pytorch/issues/62810 comment](https://github.com/pytorch/pytorch/issues/62810#issuecomment-897504372) below
> linalg.cross being consistent with the Python Array API (over NumPy) makes sense because NumPy has no linalg.cross. I also think we can implement linalg.cross without immediately deprecating torch.cross, although we should definitely refer users to linalg.cross. Deprecating torch.cross will require additional review. While it's not used often it is used, and it's unclear if users are relying on its unique behavior or not.

The current default implementation of `torch.cross` is extremely weird and confusing. This has also been reported multiple times previously. (See https://github.com/pytorch/pytorch/issues/17229, https://github.com/pytorch/pytorch/issues/39310, https://github.com/pytorch/pytorch/issues/41850, https://github.com/pytorch/pytorch/issues/50273)

- [x] Add `torch.linalg.cross` with default `dim=-1`
- [x] Add OpInfo and other tests for `torch.linalg.cross`
- [x] Add broadcasting support to `torch.cross` and `torch.linalg.cross`
- [x] Remove out skip from `torch.cross` OpInfo
- [x] Add docs for `torch.linalg.cross`. Improve docs for `torch.cross`, mentioning `linalg.cross` and the difference between the two. Also adds a warning to `torch.cross` that it may change in the future (we might want to deprecate it later). A short example of the default-`dim` difference follows this list.
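A small hedged example of the default-`dim` difference (exact numerics depend on the random inputs):

```python
import torch

a, b = torch.randn(3, 3), torch.randn(3, 3)
r_linalg = torch.linalg.cross(a, b)  # default dim=-1: the last dimension
r_torch = torch.cross(a, b)          # default dim=None: first dim of size 3
print(torch.equal(r_linalg, r_torch))  # False in general: different dims used
```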

 ---

### Additional Fixes to `torch.cross`
- [x] Fix Doc for Tensor.cross
- [x] Fix torch.cross in `torch/overrides.py`

While working on `linalg.cross` I noticed these small issues with `torch.cross` itself.

[Tensor.cross docs](https://pytorch.org/docs/stable/generated/torch.Tensor.cross.html) still mentions `dim=-1` default which is actually wrong. It should be `dim=None` after the behaviour was updated in PR https://github.com/pytorch/pytorch/issues/17582 but the documentation for the `method` or `function` variant wasn’t updated. Later PR https://github.com/pytorch/pytorch/issues/41850 updated the documentation for the `function` variant i.e `torch.cross` and also added the following warning about the weird behaviour.
> If `dim` is not given, it defaults to the first dimension found with the size 3. Note that this might be unexpected.

But still, the `Tensor.cross` docs were missed and remained outdated. I’m finally fixing that here. Also fixing `torch/overrides.py` for `torch.cross` as well now, with `dim=None`.

To verify according to the docs the default behaviour of `dim=-1` should raise, you can try the following.

```python
a = torch.randn(3, 4)
b = torch.randn(3, 4)
b.cross(a)  # this works because the implementation finds 3 in the first dimension and the default behaviour as shown in documentation is actually not true.
>>> tensor([[ 0.7171, -1.1059,  0.4162,  1.3026],
        [ 0.4320, -2.1591, -1.1423,  1.2314],
        [-0.6034, -1.6592, -0.8016,  1.6467]])

b.cross(a, dim=-1)  # this raises as expected since the last dimension doesn't have a 3
>>> RuntimeError: dimension -1 does not have size 3
```

Please take a closer look (particularly the autograd part, this is the first time I'm dealing with `derivatives.yaml`). If there is something missing, wrong or needs more explanation, please let me know. Looking forward to the feedback.

cc mruberry Lezcano IvanYashchuk rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63285

Reviewed By: gchanan

Differential Revision: D32313346

Pulled By: mruberry

fbshipit-source-id: e68c2687c57367274e8ddb7ef28ee92dcd4c9f2c
2021-11-11 12:49:41 -08:00
40bedf6206 Fix test_triangular_solve testcase enumeration (#67635)
Summary:
use product instead of zip to cover all cases

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67635

Reviewed By: malfet

Differential Revision: D32310956

Pulled By: mruberry

fbshipit-source-id: 806c3313e2db26d77199d3145b2d5283b6ca3617
2021-11-11 12:49:38 -08:00
db014b8529 Add set_deterministic_debug_mode and get_deterministic_debug_mode (#67778)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67386
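A brief usage sketch, assuming the string/integer mode values exposed by this API ("default", "warn", "error", i.e. 0/1/2):

```python
import torch

prev = torch.get_deterministic_debug_mode()
torch.set_deterministic_debug_mode("warn")  # warn instead of erroring
# ... run code whose nondeterministic ops should only warn ...
torch.set_deterministic_debug_mode(prev)    # restore the previous mode
```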

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67778

Reviewed By: ngimel

Differential Revision: D32310661

Pulled By: mruberry

fbshipit-source-id: 300129e96ca51c22fa711182ce6a9f4d4d2ce57f
2021-11-11 12:48:29 -08:00
cd4e31ff21 [LTC] Add some comments to BackendDevice() (#68156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68156

[skip ci]

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32346302

Pulled By: alanwaketan

fbshipit-source-id: 06de6afbe2f937511abce485b24cec0a85bfbe97
2021-11-11 12:43:56 -08:00
7b376bf844 Remove ProcessGroup from TensorPipeAgent initialization (#68128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68128

Reland of D31762735 (0cbfd466d2).

This diff was originally reverted due to failure in test_send_export_type_through_rpc_with_custom_pickler.

I updated rpc_pickler_test.py to prevent a race condition where processes were not registering their pickler before handling their rpc_sync calls.

Test Plan:
rpc_pickler_test file:

buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test //caffe2/torch/fb/training_toolkit/backend/metrics/collectors/fbdata_aggregator/tests:batch_collector_test -- --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx

rpc_pickler stress test:

buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test -- --exact 'caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test - test_send_export_type_through_rpc_with_custom_pickler (caffe2.torch.fb.training_toolkit.backend.metrics.tests.rpc_pickler_test.CythonTypeRpcSpawnTest)' --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx --jobs 18 --stress-runs 10 --record-results

Reviewed By: mrshenli

Differential Revision: D32316077

fbshipit-source-id: e58de2335fbaa3ab46d46fe222c659197633a5e4
2021-11-11 12:28:55 -08:00
b473ca999b [lint] add cmakelint to lintrunner (#68191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68191

+ fix filename of exec_linter

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32364022

Pulled By: suo

fbshipit-source-id: 740892d9580edc348c3e818664fd37f145669fda
2021-11-11 12:19:01 -08:00
6cade3362b [fx-acc] add optimize_noop graph opt (#68131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68131

Ports EliminateNoop to FX

Adds optimization for a few more ops and cases than the glow version
* `acc_ops.dequantize`
* `acc_ops.flatten`
* `acc_ops.(max|min)_full_reduce`
* `acc_ops.permute`
* `acc_ops.reshape`
* `acc_ops.squeeze`
* `acc_ops.to_dtype`

Already covered by either constant fold or custom mapper
* acc_ops.slice_tensor
* acc_ops.getitem

Bug fix
* If `-1` is used in reshape's `shape` argument, we convert this inferred value to its actual positive value, but we needed to use integer division; otherwise we get a float in the shape tuple (see the sketch below). Existing unit tests didn't cover this because `unittest.TestCase.assertEqual(1, 1.0)` doesn't check types, so the comparison passes.
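A minimal sketch of the inference rule with the fix (illustrative helper, not the actual acc_ops code):

```python
def infer_shape(numel: int, shape: tuple) -> tuple:
    known = 1
    for s in shape:
        if s != -1:
            known *= s
    # Integer division: using `/` here would put a float (e.g. 4.0) in the shape.
    return tuple(s if s != -1 else numel // known for s in shape)

assert infer_shape(12, (3, -1)) == (3, 4)
```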

Test Plan:
# Graph Opt
`buck test mode/opt glow/fb/fx/graph_opts:test_fx_graph_opts -- TestEliminateNoOp`
```
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 95c17eb9-cd4d-463a-96c8-358ca3679d56
Trace available for this run at /tmp/tpx-20211105-144929.801413/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5629499609900775
    ✓ ListingSuccess: glow/fb/fx/graph_opts:test_fx_graph_opts - main (4.873)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_01_noop_dequantize (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.032)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_02_flatten (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.048)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_12_tile (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.081)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_15_to_dtype (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.022)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_20_cat (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.126)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_18_max_pool2d (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.183)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_08_reshape (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.034)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_16_avg_pool2d (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.183)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_10_squeeze (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.048)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_06_min_full_reduce (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.038)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_09_noop_reshape (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.055)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_00_identity (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.025)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_04_max_full_reduce (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.037)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_21_noop_cat (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.037)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_03_noop_flatten (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.040)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_19_noop_max_pool2d (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.135)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_11_noop_squeeze (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.036)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_14_to_dtype (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.024)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_17_noop_avg_pool2d (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.114)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_13_noop_tile (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.031)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_05_noop_max_full_reduce (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.026)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_07_noop_min_full_reduce (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.030)
Summary
  Pass: 22
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5629499609900775
```

# Shape Inference
`buck test mode/opt //glow/fb/fx/acc_tracer:test_acc_shape_inference`
```
Summary
  Pass: 99
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599703156114
```

Reviewed By: jfix71

Differential Revision: D32081046

fbshipit-source-id: 22403f2bb72a2605f1adcbb733e8150795c7984b
2021-11-11 12:08:24 -08:00
fe90313d02 Avoid index_put_ overhead in histogram kernel's inner loop (#67815)
Summary:
**TLDR**: Makes torch.histc run 400x faster on large inputs. Should fix [a broken test on internal CI](https://www.internalfb.com/intern/test/281475013640093/).

HistogramKernel presently calls torch.Tensor.index_put_ once for each element of its input tensor. Obtaining a data pointer and manipulating it directly avoids the considerable dispatch overhead from calling index_put_. Behavior is unchanged because the tensor being operated on is known to be contiguous and in CPU memory.
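A hedged Python illustration of the overhead being removed; the landed fix writes through a raw data pointer inside the C++ kernel, but the per-call dispatch cost is visible from Python too:

```python
import torch

idx = torch.randint(0, 100, (10_000,))
ones = torch.ones_like(idx)

# Per-element: pays full dispatch overhead on every iteration (slow).
hist_slow = torch.zeros(100, dtype=torch.long)
for i in idx:
    hist_slow.index_put_((i.unsqueeze(0),), ones[:1], accumulate=True)

# One batched call: a single dispatch does the same accumulation.
hist_fast = torch.zeros(100, dtype=torch.long)
hist_fast.index_put_((idx,), ones, accumulate=True)
assert torch.equal(hist_slow, hist_fast)
```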

Fixes performance regression introduced in https://github.com/pytorch/pytorch/pull/65318.

Benchmark: time taken to compute histc on a tensor with 10,000,000 elements

1. Before https://github.com/pytorch/pytorch/pull/65318: **0.003s**
2. After https://github.com/pytorch/pytorch/pull/65318: **2.154s**
3. After this change: **0.005s**

Benchmark code:
```
import torch as t
from timeit import default_timer as timer

x = t.randperm(10000000, dtype=t.float32)

start = timer()
t.histc(x)
end = timer()
print(end - start)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67815

Reviewed By: anjali411

Differential Revision: D32357663

Pulled By: saketh-are

fbshipit-source-id: f8fa59173ea4772c8ad1332548ef4d9ea8f01178
2021-11-11 11:16:45 -08:00
61a94495d9 [DataPipe] adding ZipperMapDataPipe (#68032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68032

Part of #57031

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32263058

Pulled By: NivekT

fbshipit-source-id: 13a30ee9d9779284a9fd9bb7222fc41253c6fe3b
2021-11-11 10:36:05 -08:00
bd5f33f91e demo backend decoupled from operators (#66100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66100

A backend should not directly depend on ATen operators. The demo backend is changed accordingly for testing purposes.

Test Plan: Imported from OSS

Reviewed By: pavithranrao

Differential Revision: D31384614

Pulled By: iseeyuan

fbshipit-source-id: c97f0c4aa12feb1d124f1d7a852e9955a7a2ce42
2021-11-11 10:26:17 -08:00
97a386805e [Pytorch Edge] Add selective macros to metal ops (#68134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68134

Add the macros in preparation of making these selective. Should be a no-op in this diff.

ghstack-source-id: 143023844

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D32326833

fbshipit-source-id: 7abc93102bff0aa0bc5e3383bdf3e95fb84ce5ba
2021-11-11 10:15:31 -08:00
c2642b6465 Sparse CSR CPU: add torch.add with all inputs sparse (#64391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64391

This PR adds `torch.add(a, b, alpha=None, out=out)` variant with `a, b, out` all being sparse CSR tensors on CPU.

Fixes #59060
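A brief usage sketch, assuming a build where this variant has landed:

```python
import torch

crow = torch.tensor([0, 1, 2])
col = torch.tensor([1, 0])
vals = torch.tensor([1.0, 2.0])
a = torch.sparse_csr_tensor(crow, col, vals, (2, 2))
b = torch.sparse_csr_tensor(crow, col, vals, (2, 2))
c = torch.add(a, b)    # all of a, b, c are sparse CSR on CPU
print(c.to_dense())    # tensor([[0., 2.], [4., 0.]])
```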

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32316562

Pulled By: cpuhrsch

fbshipit-source-id: 384462369007854b5e2e6cb9ae7b320302627c71
2021-11-11 10:02:12 -08:00
84d3df8027 Fast cuda layer norm (#67977)
Summary:
This adds an apex-inspired fast layer norm forward kernel to PyTorch (it is a significant rewrite, though).
It's much faster than the current implementation: for a typical transformer size (32*196, 1024), time goes down from ~180 us to ~49 us on Volta. Compared to apex, it also produces bitwise-accurate results between float inputs representable in fp16 and fp16 inputs. It produces slightly different results than the current implementation, though, because Welford summation is implemented differently.
It is slower than LightSeq (~37 us), but LightSeq uses an inaccurate variance approximation and doesn't guarantee float/fp16 bitwise accuracy.
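For reference, a minimal sketch of Welford's single-pass mean/variance update, the summation scheme mentioned above (plain Python; the actual kernel is CUDA):

```python
def welford(xs):
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the *updated* mean
    return mean, m2 / n           # population variance, as layer norm uses

m, v = welford([1.0, 2.0, 3.0, 4.0])
assert abs(m - 2.5) < 1e-12 and abs(v - 1.25) < 1e-12
```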

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67977

Reviewed By: mruberry

Differential Revision: D32285331

Pulled By: ngimel

fbshipit-source-id: a8b876a9cf3133daacfe0ce3a37e3ad566f4b6a8
2021-11-11 09:32:40 -08:00
a1ace029e2 Add host-side memory requirement for test_softmax_64bit_indexing (#67922)
Summary:
https://github.com/pytorch/pytorch/issues/67910
The original `largeTensorTest` decorator didn't account for the additional host-side memory requirements.
Thanks crcrpar for raising the issue, CC ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67922

Reviewed By: malfet

Differential Revision: D32308602

Pulled By: mruberry

fbshipit-source-id: 97b7d2c39fe63c1a8269402f72186026a89f6b4c
2021-11-11 09:24:15 -08:00
9e7b314318 OpInfo for nn.functional.conv1d (#67747)
Summary:
This PR adds OpInfo for `nn.functional.conv1d`. There is a minor typo fix in the documentation as well.

Issue tracker: https://github.com/pytorch/pytorch/issues/54261

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67747

Reviewed By: malfet

Differential Revision: D32309258

Pulled By: mruberry

fbshipit-source-id: add21911b8ae44413e033e19398f398210737c6c
2021-11-11 09:23:04 -08:00
35f1617001 Implement Entropy methods for Binomial and Multinomial distributions (#67609)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60866.

Because https://github.com/pytorch/pytorch/pull/61719 has shown no activity for a long time, I made this PR.
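A brief usage sketch, assuming a build that includes this change:

```python
import torch

d = torch.distributions.Binomial(total_count=10, probs=torch.tensor(0.3))
print(d.entropy())  # analytic entropy, newly implemented by this PR
```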

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67609

Reviewed By: malfet

Differential Revision: D32310866

Pulled By: mruberry

fbshipit-source-id: b3a8dde452f448e5981f5405f5f925f860b0d84f
2021-11-11 09:16:28 -08:00
864c6b3794 [nnc] aotCompiler outputSpec support quantized outputs (#67711)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67711

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32115833

Pulled By: IvanKobzarev

fbshipit-source-id: e96eb72a290ffb88011b86b3c65c0eff864b63dc
2021-11-11 09:01:46 -08:00
362c6069b9 [nnc] Lazy lowerings registration; custom classes network params (#67623)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67623

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32065076

Pulled By: IvanKobzarev

fbshipit-source-id: 4945ac6483938d428c539ed1ce4fcd6988b34250
2021-11-11 09:00:23 -08:00
f89572f417 Add feature: zeros_like() from a dense tensor to a sparse tensor (#68108)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67904.
 - Create a sparse tensor when the sparse layout is given even if the input tensor is not sparse.

cc nikitaved pearu cpuhrsch IvanYashchuk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68108

Reviewed By: anjali411

Differential Revision: D32316269

Pulled By: cpuhrsch

fbshipit-source-id: 923dbd4dc7c74f51f7cdbafb2375a30271a6a886
2021-11-11 08:54:15 -08:00
5efe5e243a Ease constraint for fuse path in trt lower (#68148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68148

A question was raised about whether we should fuse the path a->b->c when node a has consumers other than node b. This diff relaxes the constraint on fuse paths so that, in the case:
```
   a
|     |
b     d
|
c
```
we still allow fusing the path (a->b->c); after fusion, node b is eliminated by dead_node_eliminator while node a remains in the graph.

Reviewed By: yinghai, 842974287

Differential Revision: D32296266

fbshipit-source-id: 44ded07a97b5b708bdf37193a022fae21410b4bd
2021-11-11 08:48:34 -08:00
d4ae789655 OpInfos for new_blah functions and some _like functions (#67357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67357

This PR adds OpInfos for:
- new_ones, new_zeros, new_full, new_empty
- rand_like, randint_like

I forgot to add the _like functions in a previous PR, so here they are.

Test Plan: - wait for tests

Reviewed By: mruberry

Differential Revision: D31969533

Pulled By: zou3519

fbshipit-source-id: 236d70d66e82f1d6f8e5254b55ca2a37b54c9494
2021-11-11 07:21:23 -08:00
4466ba8f30 Working POC of define-by-run quantization (#64676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676

We implement a working eager mode quantization flow which uses
tracing and `__torch_function__` and `torch.nn.Module.__call__` overrides to automate the model modifications needed for quantization.  Partial program capture (instead of full program capture) is used, allowing this scheme to target a wide variety of user programs.  Control flow over quantizeable ops is not supported, but general control flow is supported.

In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and call hooks before and after each quantizeable module or function (a rough sketch of the interception idea follows this list)
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details
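A rough sketch of the `Module.__call__` interception idea (illustrative only; the real machinery in auto_trace.py also overrides `__torch_function__` and tracks quantization state):

```python
import torch

_orig_call = torch.nn.Module.__call__

def hooked_call(self, *args, **kwargs):
    # A real implementation would insert observers / quant-dequant here.
    print("enter", type(self).__name__)
    out = _orig_call(self, *args, **kwargs)
    print("exit ", type(self).__name__)
    return out

torch.nn.Module.__call__ = hooked_call
try:
    torch.nn.Linear(2, 2)(torch.randn(1, 2))
finally:
    torch.nn.Module.__call__ = _orig_call  # always restore the original
```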

Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```

Differential Revision: D31992281

Reviewed By: HDCharles

Pulled By: vkuzo

fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967
2021-11-11 06:25:24 -08:00
f02efc749a [Dist CI][BE] Run each test in its own process for test_distributed_spawn (#67901)
Summary:
Context: https://github.com/pytorch/pytorch/issues/67061

Use `run_test.py`'s provided flag `"--subprocess"`, passed in like `extra_unittest_args=["--subprocess"]` when running test_distributed_spawn. This will ensure that each test is run separately in its own process. The goal is to more closely simulate how a developer would run a single test when reproducing a CI failure and make reproducibility easier in general.

Also, when a test fails, we print out the exact command that was issued so the developer knows how to reproduce it.

For example, when a test fails, it will print something like the following to the logs:

```
Test exited with non-zero exitcode 1. Command to reproduce: BACKEND=gloo WORLD_SIZE=3 /fsx/users/rvarm1/conda/envs/pytorch/bin/python distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_Backend_enum_class
```

Running test_distributed_spawn still uses the same command as before:

`
python test/run_test.py --verbose -i distributed/test_distributed_spawn
`

as seen in [distributed contributing](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) guide.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67901

Reviewed By: cbalioglu, mruberry

Differential Revision: D32225172

Pulled By: rohan-varma

fbshipit-source-id: 7e8d4c7a41858044bd2a4e0d1f0bf8f1ac671d67
2021-11-11 06:11:00 -08:00
aea4e61ec3 skip test_jit_legacy (#68129)
Summary:
disables failing tests in [https://github.com/pytorch/pytorch/issues/66429](https://github.com/pytorch/pytorch/issues/67646)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68129

Reviewed By: suo, janeyx99

Differential Revision: D32326118

Pulled By: Krovatkin

fbshipit-source-id: ca00d2214503f418be45dc756057b990fb6e6370
2021-11-10 23:08:32 -08:00
a6a2616558 Automated submodule update: kineto (#67445)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto).

New submodule commit: f60ad2cb0f

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67445

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: robieta

Differential Revision: D31993939

fbshipit-source-id: 3d4aa2f900434d4bbe5134db8453deb227ef5685
2021-11-10 22:33:03 -08:00
a229c3e51a Add complete type name in error message when fail to export model (#67750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67750

Add more information about why exporting a model fails.

Before: error message:
```
E1102 22:57:42.984015 3220949 ExceptionTracer.cpp:221] exception stack complete
terminate called after throwing an instance of 'c10::Error'
  what():  __torch__ types other than torchbind (__torch__.torch.classes)are not supported in lite interpreter. Workaround: instead of using arbitrary class type (class Foo()), define a pytorch class (class Foo(torch.nn.Module)).
Exception raised from getFunctionTuple at caffe2/torch/csrc/jit/serialization/export_module.cpp:246 (most recent call first):
```

After:
```
E1102 22:57:42.984015 3220949 ExceptionTracer.cpp:221] exception stack complete
terminate called after throwing an instance of 'c10::Error'
  what():  __torch__ types other than torchbind (__torch__.torch.classes)are not supported in lite interpreter. Workaround: instead of using arbitrary class type (class Foo()), define a pytorch class (class Foo(torch.nn.Module)). The problematic type is: __torch__.dper3.core.schema_utils.IdListFeature
Exception raised from getFunctionTuple at caffe2/torch/csrc/jit/serialization/export_module.cpp:246 (most recent call first):
```
ghstack-source-id: 143009294

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D32129397

fbshipit-source-id: 0594a98a59f727dc284acd1c9bebcd7589ee7cbb
2021-11-10 21:04:05 -08:00
1f07efd0f2 [SR] Fix aten::split schema (#68135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68135

Update the schema to reflect the changes in  D31935573 (6b44e75f6b).

Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Confirmed native implementation is used.

Reviewed By: hlu1

Differential Revision: D32326865

fbshipit-source-id: 7f607f57ceb6690a2782d94d9ee736ba64e7d242
2021-11-10 20:03:30 -08:00
47bc47f2b9 [SR] Add runtime check to correct bad schema alias info (#67825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67825

The comment explains how it works.

Test Plan:
A small regression to local and local_ro if we only enable it for fallback ops.
```
## local_ro
# before
I1103 21:25:05.250440 2636751 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22213. Iters per second: 818.247
I1103 21:25:08.629221 2636751 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22351. Iters per second: 817.319
I1103 21:25:12.005179 2636751 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22285. Iters per second: 817.759
I1103 21:25:12.005236 2636751 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.22283, standard deviation: 0.000693619

# after
# # only enable for fall back ops: 0.7%
I1103 21:26:40.190436 2644597 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22928. Iters per second: 813.481
I1103 21:26:43.590443 2644597 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23265. Iters per second: 811.262
I1103 21:26:46.992928 2644597 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23379. Iters per second: 810.51
I1103 21:26:46.992980 2644597 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.23191, standard deviation: 0.0023424

# enable for all (no clone): 4.7%
I1103 21:27:55.291216 2649780 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.28204. Iters per second: 780.005
I1103 21:27:58.822347 2649780 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.27854. Iters per second: 782.14
I1103 21:28:02.354184 2649780 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.27958. Iters per second: 781.506
I1103 21:28:02.354240 2649780 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.28006, standard deviation: 0.00179765

# local
# before
I1103 21:52:00.784718 2765168 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.676. Iters per second: 50.8233
I1103 21:52:28.985873 2765168 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.699. Iters per second: 50.7641
I1103 21:52:57.200223 2765168 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.6953. Iters per second: 50.7735
I1103 21:52:57.200273 2765168 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.6901, standard deviation: 0.0123206
# after
# # only enable for fall back ops: 0.1%
I1103 21:45:25.514535 2734440 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.7103. Iters per second: 50.7349
I1103 21:45:53.773594 2734440 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.7005. Iters per second: 50.7601
I1103 21:46:21.955680 2734440 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.7398. Iters per second: 50.659
I1103 21:46:21.955729 2734440 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.7169, standard deviation: 0.0204658

# enable for all (no clone): 0.9%
I1103 21:43:22.162272 2723868 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.8893. Iters per second: 50.2783
I1103 21:43:50.651847 2723868 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.8566. Iters per second: 50.3611
I1103 21:44:19.068519 2723868 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.8793. Iters per second: 50.3037
I1103 21:44:19.068570 2723868 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.875, standard deviation: 0.0167498
```

Reviewed By: d1jang

Differential Revision: D32124812

fbshipit-source-id: 0f60c26f8fb338d347e4ca7a70b23e5a386fc9aa
2021-11-10 19:35:11 -08:00
ca7d0062ad [PyTorch Edge] Better error message when training attribute is not found (#68103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68103

The error message `'training' attribute not found.` in itself isn't particularly actionable. Anyone running into this tends to be clueless regarding why they are getting this message.

For example, see [this post](https://fb.workplace.com/groups/pytorch.edge.users/posts/965868874283406/) asking for help when seeing this specific error message.

The most common reason for this error is that users call `.eval()` on the model instance before saving it. This change tries to draw attention to that oversight and allows them to proactively investigate and correct it if necessary.

This saves valuable time for our users and support effort from the team. Overall, I believe this is a Developer Experience win.

ghstack-source-id: 143021300

Test Plan: Build/CI

Reviewed By: JacobSzwejbka

Differential Revision: D32304477

fbshipit-source-id: 474abe717a862347f16ad981834ddab6819cb4d3
2021-11-10 19:31:10 -08:00
0e366b8e5f Make torch.fx.experimental.fx2trt.passes a package (#68139)
Summary:
Only packages and tools (which are explicitly specified) are included in the wheel/conda files

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68139

Test Plan:
Run `python3 -c "from setuptools import find_packages; print([x for x in find_packages(exclude=('tools','tools.*')) if 'torch.fx' in x])"` before and after the change
Fixes https://github.com/pytorch/pytorch/issues/68059

Reviewed By: nrsatish, seemethere

Differential Revision: D32330483

Pulled By: malfet

fbshipit-source-id: a55443730999a83c615b3f943c327353c011bf7b
2021-11-10 15:57:29 -08:00
f171c78c04 add unpack_sequence and unpad_sequence functions (#66550)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66550

Reviewed By: malfet

Differential Revision: D32299193

Pulled By: jbschlosser

fbshipit-source-id: 96c92d73d3d40b7424778b2365e0c8bb1ae56cfb
2021-11-10 15:15:08 -08:00
a510f4139b Fix lambda function broke torch.save
Summary: Torch.save uses pickle, which cannot handle lambda functions or local functions directly without modifying serialization.py. This diff fixes the issue by extracting the lambda into a normal function.
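
A minimal sketch of the underlying limitation (illustrative only; the actual fix lives in the affected fx2trt module):

```python
import io
import pickle
import torch

def scale(x):  # module-level function: pickled by reference, so torch.save handles it
    return x * 2

torch.save(scale, io.BytesIO())  # fine

try:
    torch.save(lambda x: x * 2, io.BytesIO())  # lambdas have no importable name
except (pickle.PicklingError, AttributeError) as e:
    print("cannot save lambda:", e)
```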

Test Plan: buck test mode/dev-nosan //caffe2/test/fx2trt/core:test_trt_module

Reviewed By: 842974287

Differential Revision: D32320536

fbshipit-source-id: 497d2e64f94526f92e6d1a9909b6ad629dbca850
2021-11-10 14:21:06 -08:00
22e73f616c Update unpack_dual to return named tuple (#68062)
Summary:
Also updates the doc
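
A short sketch of the resulting API (named fields `primal` and `tangent`, as exposed by `torch.autograd.forward_ad`):

```python
import torch
import torch.autograd.forward_ad as fwAD

with fwAD.dual_level():
    dual = fwAD.make_dual(torch.randn(3), torch.ones(3))
    out = fwAD.unpack_dual(dual)
    primal, tangent = out          # tuple unpacking still works
    assert out.primal is primal    # named access is what this change adds
    assert out.tangent is tangent
```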

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68062

Reviewed By: gchanan

Differential Revision: D32315089

Pulled By: soulitzer

fbshipit-source-id: 567c812da093daeb6549b0dc7ecbffd58eb8ccc2
2021-11-10 14:14:55 -08:00
d6e6064efc [LT] Upstream backend interfaces (#67927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67927

BackendData - represents 'tensor data' in opaque backend storage
LoweringContext - interface for performing backend-specific IR lowering
BackendImplInterface - interface for lazy tensors backends to implement

Reorgs backend-related files into lazy/backend subdir

includes a few small fixes, which were made on lazy_tensor_staging but need to be back-ported to master.

Test Plan: used by lazy_tensor_staging branch

Reviewed By: desertfire

Differential Revision: D32142032

fbshipit-source-id: 828c717bcd0d511876e64ad209b50f7bfb10cec5
2021-11-10 12:55:31 -08:00
c075f0f633 Update rpc testing to include USE_TENSORPIPE directive (#68080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68080

Fixes #68002

After FaultyProcessGroupAgent was replaced with FaultyTensorpipeAgent, there is now a dependency on Tensorpipe for rpc testing. However, if a user does not have USE_TENSORPIPE enabled, they will hit an issue such as `undeclared identifier 'FaultyTensorPipeRpcBackendOptions'`. This is for testing the faulty agent method, so it should not block compilation. Update to wrap the Tensorpipe-specific code in a directive.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32292861

Pulled By: H-Huang

fbshipit-source-id: 4ffb879860ced897674728200a1831f18fea0a4a
2021-11-10 12:12:18 -08:00
a3bb95c1b5 don't include label in ci: sev issue (#68093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68093

We don't want regular users without write access to be able to file an
actual issue with the `ci: sev` label since that issue will
automatically show up on hud.pytorch.org

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D32299553

Pulled By: seemethere

fbshipit-source-id: d46a96f16ae29120fff94288d3e0c06b103edf7f
2021-11-10 12:03:18 -08:00
ecd5b1a8d4 [SR] Native implementation for aten::split (#67476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67476

Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like TupleUnpack). We should improve op coverage where possible.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: d1jang

Differential Revision: D31994040

fbshipit-source-id: 9de57d8d7925ee46544478eae8229952ca5f248a
2021-11-10 10:23:03 -08:00
746a31b290 Logger integration format (#67962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67962

Logger integration format for chunks at [dims] -> input_val.shape[dim]

NOTE: Unused typing imports removed

Test Plan:
buck run -c python.package_style=inplace mode/dev-nosan caffe2/torch/fb/fx2trt:test_chunk

out:
RuntimeWarning: Asked for 2000 chunks along dimention 2 on tensor with size (3, 10, 20), chunks will default to 20

Reviewed By: 842974287

Differential Revision: D32233039

fbshipit-source-id: 1fde12c9f743bb80cdb309e0b7be287173d45147
2021-11-10 10:12:06 -08:00
8dfbc620d4 don't hardcode mask type in mha (#68077)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68077

Reviewed By: zou3519

Differential Revision: D32292410

Pulled By: ngimel

fbshipit-source-id: 67213cf5474dc3f83e90e28cf5a823abb683a6f9
2021-11-10 09:41:21 -08:00
ae5864498d torch.allclose opinfo (#68023)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68023

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32295811

Pulled By: george-qi

fbshipit-source-id: 3253104a5a9655d8ba7bbba6620038ed6d6669f1
2021-11-10 09:16:39 -08:00
9a2db6f091 Factor backend routing logic out of convolution forward (#67790)
Summary:
This PR introduces a new function `_select_conv_backend` that returns a `ConvBackend` enum representing the selected backend for a given set of convolution inputs and params.

The function and enum are exposed to python for testing purposes through `torch/csrc/Module.cpp` (please let me know if there's a better place to do this).

A new set of tests validates that the correct backend is selected for several sets of inputs + params. Some backends aren't tested yet:
* nnpack (for mobile)
* xnnpack (for mobile)
* winograd 3x3 (for mobile)

Some flowcharts for reference:
![conv_routing_graph md](https://user-images.githubusercontent.com/75754324/140828957-1135b400-38c0-4c9f-87ef-4f33ceebeeae.png)
![conv_nogroup_routing_graph md](https://user-images.githubusercontent.com/75754324/140828977-ed223a4e-aa86-49f1-9925-c0f6b9ab36af.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67790

Reviewed By: zou3519

Differential Revision: D32280878

Pulled By: jbschlosser

fbshipit-source-id: 0ce55174f470f65c9b5345b9980cf12251f3abbb
2021-11-10 07:53:55 -08:00
147de8243b Fixed deprecation warnings with .data<T>() in SpectralOps.cpp (#67993)
Summary:
Description:
- Fixed deprecation warnings `.data<T>()` -> `.data_ptr<T>()` in SpectralOps.cpp shown while building pytorch from source

```c++
../aten/src/ATen/native/mkl/SpectralOps.cpp:213:10: warning: ‘T* at::Tensor::data() const [with T = c10::complex<double>]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.
data_ptr<T>() instead. [-Wdeprecated-declarations]
  213 |   return reinterpret_cast<std::complex<T>*>(t.data<c10::complex<T>>());
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67993

Reviewed By: H-Huang

Differential Revision: D32246945

Pulled By: mruberry

fbshipit-source-id: 5cd6b0ac6ddff0afc56e99641971e1e3b6434af6
2021-11-10 07:33:15 -08:00
6011c35a79 [LTC] Upstream class BackendDevice (#68027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68027

This commit upstreams class BackendDevice to the master, which is a backend-specific
representation of the actual hardware, for instance CPU, GPU, or TPU.

This concept is important for backends like XLA, which need to tell the actual
hardware type apart from the c10::DeviceType::Lazy virtual device during
both IR construction and lowering.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.*

Reviewed By: wconstab

Differential Revision: D32261838

Pulled By: alanwaketan

fbshipit-source-id: 579c3fc5f9da7847c887a383c6047e8ecb9cc5bc
2021-11-10 07:05:43 -08:00
a6c0edff1a fix gradcheck to generate valid input for forward AD complex (#68001)
Summary:
This fixed a few of the linalg checks that we disabled before!

This also seems to break sgn, abs, and angle (sending to CI here to see if there are more). These functions used to only ever get pure imaginary or pure real tangents.
It is very likely that something is wrong with their formulas.
But they are implemented as element-wise ops, so it is not clear where the error can come from. I tried to look at it, but nothing obvious seems wrong there (especially because the formulas are correct in backward mode).
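
For reference, a minimal sketch of the kind of check this affects (using gradcheck's `check_forward_ad` flag; the actual ops exercised in CI differ):

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.cdouble, requires_grad=True)
# gradcheck now generates fully complex tangents for forward AD,
# instead of only pure real or pure imaginary ones
assert gradcheck(lambda t: t.sum(), (x,), check_forward_ad=True)
```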

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68001

Reviewed By: soulitzer

Differential Revision: D32280475

Pulled By: albanD

fbshipit-source-id: e68b1ce0e2e97f8917c3d393141d649a7669aa9d
2021-11-10 03:07:48 -08:00
94b6fa6f8b Adds an optimizer instance variable to ChainedScheduler (#68010)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67601.

As simple a fix as I could make it. I even managed to delete some testing code!

I checked calling `super()` and, as I had feared, it doesn't work out of the box, so perhaps that ought to be revisited later.

As it stands, https://github.com/pytorch/pytorch/issues/20124 still applies to the chained scheduler, but I think this change is still an improvement.
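
A minimal sketch of what the fix enables (illustrative values; any schedulers sharing one optimizer will do):

```python
import torch
from torch.optim.lr_scheduler import ChainedScheduler, ConstantLR, ExponentialLR

opt = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
chained = ChainedScheduler([
    ConstantLR(opt, factor=0.5, total_iters=4),
    ExponentialLR(opt, gamma=0.9),
])
assert chained.optimizer is opt  # the instance variable this PR adds
```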

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68010

Reviewed By: zou3519

Differential Revision: D32278139

Pulled By: albanD

fbshipit-source-id: 4c6f9f1b2822affdf63a6d22ddfdbcb1c6afd579
2021-11-10 01:31:47 -08:00
cb2a41e508 [PyTorch Edge] Don't use LeftRight in mobile (#66064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66064

The only place this is used seems to be in the dispatcher for `operatorLookupTable_`. Disarming `LeftRight` disarms it for this one use case.

This should make .so loading faster, and also reduce memory consumption since `LeftRight<T>` does 2 writes for every write. I'd like to get a thorough review from reviewers for this diff since I want to make sure that initialization of stuff that writes into the dispatcher isn't going to happen on multiple threads for on-device use.

Created a new class named `LeftRightNoOpWrapper<T>` for use in mobile builds.

### Why is LeftRight<T> slow?

It maintains 2 copies of each data structure `T` to be able to keep reads quick. Every write goes to both data structures, which means that writes cost 2x and the memory overhead is also 2x.

### Why is this safe for mobile builds?

1. .so loading never happens concurrently with model execution
2. Custom ops are loaded during .so load - initializers are all run serially
3. I don't see any threads being spawned from the global schema and kernel initializers

After discussing with dreiss, it seems like there could be rare cases in OSS apps or internal Android/iOS apps where a `.so` or `dylib` is loaded after the PT runtime is loaded, and this load happens concurrently with an in-progress inference run, which is looking up the operator table in the dispatcher.

To avoid crashes there, it seems reasonable to use the RW lock, since I don't expect any contention 99.9% of the time.

When registering operators, everything is serial so only one thread will ever hold the lock. The next time it needs the lock, it will have already released it.
During inference runs, only one thread will ask for the shared lock unless multiple concurrent inferences are in progress. Even in that case, they will all be able to simultaneously get the Read lock.

Test Plan: Build and generate a local build of the iOS app to test.

Reviewed By: swolchok

Differential Revision: D31352346

fbshipit-source-id: c3f12454de3dbd7b421a6057d561e9373ef5bf98
2021-11-09 21:49:45 -08:00
b0817e19e0 [PyTorch] Avoid reading file from stream for 0 byte Tensor storage (#67787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67787

First noticed in https://fb.workplace.com/groups/pytorch.edge.team/posts/952737705280969/ - basically one of the speech models has ~400 0 byte tensor files, so we're basically paying the cost of looking it up in the archive and reading nothing from it.

Turns out that there's a fairly simple fix to avoid reading a 0 byte tensor. Once we notice that it's 0 bytes, just use the default `DataPtr` instead to initializing it with 0 bytes read in from the input file stream.

ghstack-source-id: 142025211

Test Plan: CI and manually ran a couple production mobile models with bundled inputs. CI Will run all prod. mobile mobiles with bundled inputs.

Reviewed By: swolchok

Differential Revision: D32054983

fbshipit-source-id: 919b0cdbc44bccb8f6cfe0da10ff5474af37fd99
2021-11-09 21:45:05 -08:00
bf31d4b2b5 [PyTorch] Replace copy_ with data_ptr<float>() since input Tensor's dtype is guaranteed to be float (#67788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67788

Based on comments from supriyar in D31657430 (20aa417e38).
ghstack-source-id: 142924000

Test Plan: CI

Reviewed By: supriyar

Differential Revision: D32055028

fbshipit-source-id: 756d526585f8ded755ea42b52dbbf5c1687acde2
2021-11-09 21:40:23 -08:00
6b44e75f6b aliasing fixes (#66977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66977

Fix for https://github.com/pytorch/pytorch/issues/47218

More context is in original PR here: https://github.com/pytorch/pytorch/pull/20556

Test Plan: Imported from OSS

Reviewed By: malfet, albanD

Differential Revision: D31935573

Pulled By: eellison

fbshipit-source-id: 3658d5711116396c35f1d5016773b0096ed347a5
2021-11-09 18:33:37 -08:00
3f1a3f7b18 Fix ads dense arch regression (#68071)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68071

Reviewed By: yinghai

Differential Revision: D32261611

fbshipit-source-id: 3224464bbf30fecbdb69e6ae88e42485ef67f800
2021-11-09 18:22:01 -08:00
91af74c934 remove Generate* macro files (#67940)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67940

Reviewed By: mruberry

Differential Revision: D32250987

Pulled By: ngimel

fbshipit-source-id: 3feb0bc876bc26d0a42784e5c6001670ed71e971
2021-11-09 17:31:56 -08:00
790763b0fe Add an option to disable reduced precision reductions for FP16 GEMM (#67946)
Summary:
https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multiheaded-attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle to disable reduced precision reductions
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction`
rather than making the change the default behavior.

CC ngimel ptrblck
stas00 Note that the behavior after the previous PR can be replicated with
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`
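
A short sketch of the toggle in use (requires a CUDA device):

```python
import torch

# replicate the always-full-precision behavior of the previous PR:
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False

a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
out = a @ a  # fp16 GEMM now accumulates without reduced-precision reductions
```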

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946

Reviewed By: zou3519

Differential Revision: D32289896

Pulled By: ngimel

fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe
2021-11-09 17:27:20 -08:00
078c655985 [nnc][mobile] temporarily disable quantization external functions (#68029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68029

Temporarily disable quantization external functions with a new macro DISABLE_NNC_QUANTIZATION.

The ATen CPU library consists of two parts:
A. Common operator functions, e.g. "at::empty()", the list of sources can be found at "aten_cpu_source_list" in "tools/build_variables.bzl";
B. Implementations of these operators, e.g. "at::native::empty()", the list of sources is defined at "aten_native_source_list" in "tools/build_variables.bzl";

Note that A does not directly depend on B. A calls B via the dispatch table. The dependency is injected into the dispatch table by B during its static initialization.

For internal mobile builds, B is built on a per-app basis. A is the public library for other libraries to depend on. Because these external functions call quantization functions that are not part of A, the NNC kernel library cannot resolve the missing symbols.

Use this PR to unblock the internal experiment until we figure out a better solution (e.g. move quantization API to A).
ghstack-source-id: 142868370

Test Plan: Make sure it can build with the stacked diff.

Reviewed By: IvanKobzarev

Differential Revision: D32239783

fbshipit-source-id: 3797b14104b0f54fb527bc3fc5be7f09cc93d9e4
2021-11-09 17:10:16 -08:00
b1a42298a4 Simplify example for nn.Flatten (#67472)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67415
Using the docstring example provided by jbschlosser in response to the issue submitted by qzylalala.
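
A sketch along the lines of the simplified example (the exact text lives in the updated docstring):

```python
import torch
import torch.nn as nn

input = torch.randn(32, 1, 5, 5)
m = nn.Sequential(
    nn.Conv2d(1, 32, 5, 1, 1),
    nn.Flatten(),  # flattens all dims except the batch dim by default
)
output = m(input)
print(output.size())  # torch.Size([32, 288])
```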

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67472

Reviewed By: soulitzer

Differential Revision: D32210995

Pulled By: jbschlosser

fbshipit-source-id: f22bcd729699993942b6e676b479618ac613022c
2021-11-09 17:03:06 -08:00
d8f0087e08 .github: Fix sccache for macOS workflows on push (#68094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68094

Turns out sccache was not getting activated properly on master pushes so
this should help resolve that

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D32299636

Pulled By: seemethere

fbshipit-source-id: 5f1be98dffdb202d3c11b6ceb2b49af235e1f91b
2021-11-09 16:40:56 -08:00
1b2a366932 [SR] Enforce checks for resizing of the internal buffer in MemoryPlanner in unit tests (#67941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67941

I just found out that, due to the rounding up of Tensor storage sizes to multiples of 64 bytes, resizing is not actually triggered for a lot of our unit tests (23 OSS, 16 internal). Now they should all be fixed. Also moved a bunch of tests to `test_static_module.cc` so that `test_static_runtime.cc` now only contains operator tests.

From now on, by default, if `args2` is passed to `test_static_runtime`, at the end of the second iteration it checks that the managed buffer's size is bigger than the previous size and enforces that. You can bypass the check for ops with constant output sizes, such as `aten::sum` without `dim` passed in.

Test Plan:
Facebook
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test //caffe2/benchmarks/static_runtime/fb:test_fb_operators
```

Reviewed By: swolchok

Differential Revision: D32196204

fbshipit-source-id: 8425d9efe6b9a1c1e3807e576b1143efd7561c71
2021-11-09 16:07:40 -08:00
8d025bbc2d .github: Migrate macOS workflows to GHA (#67717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67717

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32287733

Pulled By: seemethere

fbshipit-source-id: 8df6b20aada818ad39895ef87dc280098e09707b
2021-11-09 15:46:05 -08:00
55e3b23abe [Pytorch Edge] Generic Build Features for Selective Build (#67817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67817

Implementation of build features as a useable feature. Includes tracing support and selectivity support. Follow up of Dhruv's prototype in D30076214.

The general idea is to allow selectivity of arbitrary sections of the codebase through the 2 apis,
BUILD_FEATURE_REQUIRED(NAME), and
BUILD_FEATURE_AVAILABLE(NAME)

References
PyTorch Edge Team Workplace group post link: https://fb.workplace.com/groups/pytorch.edge.team/posts/905584476662959/
Quip talking about some early ideas related to build features: https://fb.quip.com/iur3ApU9q29v
Google Doc about most recent discussion and details: https://docs.google.com/document/d/1533zuN_9pwpQBa4RhtstUjT5B7guowblqJz35QYWPE0/edit

Will remove the copy kernel example after. Its just here as an example.
ghstack-source-id: 142850218

Test Plan: CI, dummy traced a model, and played around with its unit test if i removed the traced value from the yaml

Reviewed By: dhruvbird

Differential Revision: D32151856

fbshipit-source-id: 33764c1f6902a025e53807b784792a83c8385984
2021-11-09 15:37:21 -08:00
43ef6816f2 OpInfo for nn.functional.cross_entropy (#63547)
Summary:
Reference: https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261

TODOs:

* [ ] Investigate autograd failures.
* [ ] Clean up `test_nn.py` for `cross_entropy`.

cc: mruberry zou3519

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63547

Reviewed By: mruberry

Differential Revision: D32062955

Pulled By: zou3519

fbshipit-source-id: 2a62a4c28af51fb71159df2e262d05039d549b7e
2021-11-09 15:07:12 -08:00
eaf0457eef [distributed][docs] Delete distributed optimimzer section from RPC and add reference to namespace docs page (#68068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68068

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D32286554

Pulled By: jamesr66a

fbshipit-source-id: a43fe1f0cfa74721f467b128f2e878bd02f32546
2021-11-09 15:01:54 -08:00
7c90bd77ec Test functionalization pass in python (#66101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66101

Updated description:

This PR tests the functionalization pass in python in two ways. For each of the test programs that I have in `test_functionalization.py`, it:
- runs the program with and without functionalization, and asserts the outputs and (potentially mutated) inputs are equal in both cases
- runs the program with `LoggingTensor`, and uses expecttests on the resulting graph. I manually confirm that the graphs look reasonable and only contain functional ops.

Mechanically, the changes include:
- factoring out `LoggingTensor` into a testing util so it can be re-used in multiple tests
- adding some private python api's in the `torch` namespace as hooks that I can use during testing

In the original version of this PR, I also added some fixes to the `_make_subclass()` function in python: allowing you to pass in strides and storage_offset. I kept them in mainly because the changes were already there.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31942095

Pulled By: bdhirsh

fbshipit-source-id: 90ff4c88d461089704922e779571eee09c21d707
2021-11-09 14:34:05 -08:00
fe46d6c68f functionalization: map copy_() -> to().expand_as() (#67878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67878

The functionalization pass doesn't work with `copy_()`, which is a problem for functorch. Originally we were going to make a functional `copy()` operator to fix this problem, but zou3519 pointed out that we can get (most of) the existing functionality by mapping `self.copy_(src)` to `src.to(self).expand_as(self)`. This makes the codegen a bit uglier, but has the benefit of avoiding a totally unnecessary tensor allocation in functorch.
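
A rough illustration of the equivalence being exploited, in plain eager mode (the actual rewrite happens in the generated functionalization code):

```python
import torch

self_t = torch.empty(2, 3, dtype=torch.float64)
src = torch.tensor(1.5)  # different dtype and shape, as copy_ allows

ref = self_t.clone()
ref.copy_(src)  # the in-place op functionalization needs to remove

out = src.to(self_t).expand_as(self_t)  # the functional replacement
assert torch.equal(ref, out)
```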

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32280588

Pulled By: bdhirsh

fbshipit-source-id: 2c6ee65f0929e0846566987183ba2498c88496c2
2021-11-09 14:34:02 -08:00
be4150139a bugfix for conditional functionalization (#67715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67715

I had originally made the `vector<ViewMeta>` and `Tensor`s stored on the `Update` struct references, but Will pointed out a bug in the conditional-functionalization PR due to a use-after-free error. This happens because the queued-up updates might not be synced until later, and can outlive the original tensor that was used to create them.

It was kind of strange that this doesn't show up in the existing `test/test_functionalization.py` tests that I have in this stack, which technically also should have this bug (they call sync_() after the mutated tensors have gone out of scope). I looked at it with gdb, and I'm wondering if it's just because the stored values in the free'd `ViewMeta`/`Tensor` just happen to not get clobbered by the time the sync is called in the test.

Either way, copying the Tensor + vector<ViewMeta> is probably not ideal for performance, but I couldn't think of an easy work-around for now.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32136007

Pulled By: bdhirsh

fbshipit-source-id: 707c6392a31b967e8965b9b77f297fd10a0a095a
2021-11-09 14:32:17 -08:00
4100a5cc48 Revert D32286934: [pytorch][PR] replace platform specific CI environment variables with generic ones
Test Plan: revert-hammer

Differential Revision:
D32286934 (7d931fb082)

Original commit changeset: 1008938088da

fbshipit-source-id: dd2dd07742670a34deec10995b95b98c9fd62724
2021-11-09 14:06:18 -08:00
273f7ae9b3 fx: Update fx.rst (#68043)
Summary:
When I ran this part of the code from the document with PyTorch version 1.10.0, I found some differences between the actual output and the document, as follows:

```python
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def forward(self, x, y):
        return x + y

# Create an instance of `M`
m = M()

traced = fx.symbolic_trace(m)
print(traced)
print(traced.graph)
traced.graph.print_tabular()
```

I get the result:

```shell
def forward(self, x, y):
    add = x + y;  x = y = None
    return add

graph():
    %x : [#users=1] = placeholder[target=x]
    %y : [#users=1] = placeholder[target=y]
    %add : [#users=1] = call_function[target=operator.add](args = (%x, %y), kwargs = {})
    return add
opcode         name    target                   args    kwargs
-------------  ------  -----------------------  ------  --------
placeholder    x       x                        ()      {}
placeholder    y       y                        ()      {}
call_function  add     <built-in function add>  (x, y)  {}
output         output  output                   (add,)  {}
```

This PR updates the document accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68043

Reviewed By: driazati

Differential Revision: D32287178

Pulled By: jamesr66a

fbshipit-source-id: 48ebd0e6c09940be9950cd57ba0c03274a849be5
2021-11-09 14:00:45 -08:00
c7eaec86f0 [NCCL] Patch bfloat16 support (#67843)
Summary:
Patch bfloat16 support in NCCL. PR https://github.com/pytorch/pytorch/issues/63260 adds bfloat16 support but is
still not complete enough to enable bfloat16 for allreduce in end-to-end training.

This patch does the following:
* fix minimum NCCL version from 2.9.7 to 2.10, NCCL adds bf16 support in
  v2.10.3-1 (commit 7e51592)
* update bfloat16 datatype flag in `csrc/cuda/nccl.cpp` so that NCCL
  operations like all reduce can use it
* enable unit tests for bfloat16 datatype if possible
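
With this patch, a bf16 all-reduce looks like any other collective; a minimal sketch (assumes a NCCL >= 2.10 build and at least two GPUs, launched with e.g. `torchrun --nproc_per_node=2 script.py`):

```python
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

t = torch.full((4,), float(rank + 1), dtype=torch.bfloat16, device="cuda")
dist.all_reduce(t)  # previously rejected; now dispatches to ncclBfloat16
print(rank, t)
dist.destroy_process_group()
```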

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67843

Reviewed By: H-Huang

Differential Revision: D32248132

Pulled By: mrshenli

fbshipit-source-id: 081e96e725af3b933dd65ec157c5ad11c6873525
2021-11-09 13:46:13 -08:00
45ac6f2b65 [quant] Fix comparison against reference for test_qat_functional_linear (#68061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68061

Test had a typo that didn't compare test value against reference value, fixed typo.

Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py  -v -k "test_qat_functional_linear"`

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D32280803

fbshipit-source-id: d57a25a0dcdd88df887a39b5117abafaf15125b2
2021-11-09 13:33:13 -08:00
a9c2f11d2a Update Freezing Logic and add new passes (#68024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68024

Pull Request resolved: #67949

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32260614

Pulled By: eellison

fbshipit-source-id: 41d7a9b45e33297a17560a22eba8973e2fc48b43
2021-11-09 13:21:52 -08:00
d2438a8901 [qnnpack] Lock before weightpacking in qlinear (#68012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68012

A previous attempt to make qlinear thread-safe placed the lock after the weight pointer was already accessed via packB. A race condition occurs when thread1 acquires the lock and packs the weights, but thread2 still uses the old nullptr after acquiring the lock. This causes a null pointer dereference later.
ghstack-source-id: 142714894

Test Plan: Tested on repro diff

Reviewed By: kimishpatel

Differential Revision: D32252563

fbshipit-source-id: 429fcd3f76193f1c4c8081608b6f725b19562230
2021-11-09 13:03:02 -08:00
e86058559a Op info for activation functions 2 (softsign, tanh, tanhshrink, threshold, celu, sigmoid, mish, hardsigmoid) (#67492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67492

Reviewed By: zou3519

Differential Revision: D32282580

Pulled By: samdow

fbshipit-source-id: 115afe790328577357a90117bede3b6502590441
2021-11-09 12:57:38 -08:00
726e2ed715 [lint] add more lints to lintrunner (#68069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68069

- executable bit
- cub include
- raw CUDA API usage

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D32286559

Pulled By: suo

fbshipit-source-id: 21d58e259c951424f9c6cbf1dac6d79fe7236aa4
2021-11-09 12:48:56 -08:00
cbf596bf8e Sparse CSR CPU: add addmv_out (#61536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61536

This PR adds CPU dispatch for `addmv_out` with Sparse CSR matrix.
The implementation uses the MKL Sparse library. If it's not available, then a runtime error is thrown.
Since structured_delegate is used, we only need to implement the out variant; the in-place and normal variants are autogenerated.

MKL descriptor of sparse matrices is implemented in `at::mkl::sparse::MklSparseCsrDescriptor`.
MKL Sparse doesn't allow switching the index type at runtime; it's
predetermined at build time. Only the 32-bit version of MKL was tested
locally, but I expect the 64-bit version to work correctly as well.

When the index type of the PyTorch CSR tensor doesn't match MKL's,
the indices tensor is converted to an MKL-compatible type (`int` vs `int64_t`).

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32141787

Pulled By: malfet

fbshipit-source-id: b818a0b186aa227982221c3862a594266a58a2a6
2021-11-09 12:34:21 -08:00
7d931fb082 replace platform specific CI environment variables with generic ones (#68022)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59478

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68022

Reviewed By: seemethere

Differential Revision: D32286934

Pulled By: atalman

fbshipit-source-id: 1008938088da56807e85fb5d776abf79f28ef77b
2021-11-09 12:06:44 -08:00
a027551358 [LT] Merge cache.h (#67929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67929

1. Write a node-hash based unit test for Cache
2. Replace CHECK with TORCH_CHECK in IrUtil

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32246134

Pulled By: desertfire

fbshipit-source-id: c464bc300126d47e9ad4af3b3e8484a389757dc0
2021-11-09 12:02:02 -08:00
a473417076 [LT] Merge permutation_util into master (#67766)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67766

Test Plan: `build/bin/test_lazy`

Reviewed By: wconstab

Differential Revision: D32147676

Pulled By: desertfire

fbshipit-source-id: 528b48c9cf789abc171235091c7146b2ab7a9c76
2021-11-09 12:00:39 -08:00
442d7d72de fixed type checking errors in options.py (#68056)
Summary:
Fixes [issue#64](https://github.com/MLH-Fellowship/pyre-check/issues/64)
This PR fixes the type checking errors in torch/distributed/rpc/options.py.
The variables at 84:8 and 85:8 were declared to have type `List` but were sometimes assigned a value of `None`, which caused an incompatible variable type error. Therefore, I changed the type from `List` to `Optional[List]`, which fixes the error.
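
A minimal sketch of the pattern (illustrative names, not the actual fields in options.py):

```python
from typing import List, Optional

def before(devices: List[int] = None):   # flagged: None is not a List[int]
    ...

def after(devices: Optional[List[int]] = None):  # accepted by the type checker
    ...
```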

Signed-off-by: Onyemowo  Agbo
onionymous
0xedward

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68056

Reviewed By: zou3519

Differential Revision: D32282289

Pulled By: mrshenli

fbshipit-source-id: ee410165e623834b4f5f3da8d44bd5a29306daae
2021-11-09 11:42:34 -08:00
acb035f513 Revert D31609714: Fix Dispatching not considering List[Optional[Tensor]] for dispatch
Test Plan: revert-hammer

Differential Revision:
D31609714 (c581f56c74)

Original commit changeset: bb91cafd32fb

fbshipit-source-id: a04055e7af4bf8491b44bbc3e3bddc7831ab205e
2021-11-09 10:41:53 -08:00
6e53d6df83 [SR] Introduce StaticMethod (#67981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67981

To save on memory, various internal classes need to release all references to their `torch::jit::Module` after constructing their `StaticModule`. Unfortunately, many of these classes currently instantiate a `torch::jit::Method` attribute, which holds a reference to the `ivalue` backing its owning module.

To avoid this, I've introduced a new subclass of `IMethod` to represent scripted functions backed by static runtime.

Test Plan: CI

Reviewed By: swolchok

Differential Revision: D32232039

fbshipit-source-id: 434b3a1a4b893b2c4e6cacbee60fa48bd33b5722
2021-11-09 10:37:29 -08:00
5e19fb61fd [SR] Release reference to JIT module if possible (#67911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67911

If we can remove `self` from the graph inputs, there is no need for `StaticModule` to hold onto its `Module` reference anymore.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32190755

fbshipit-source-id: 9c4649a63b6e68c7d2e47395a23572985d2babb1
2021-11-09 10:35:31 -08:00
9ae3f3945b Add remote_module logging to the __new__ method. (#68035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68035

RemoteModule is sometimes created using object.__new__ (e.g., in
init_from_module_rref); in this case, the logging in the __init__ method would
not pick this up.

As a result, adding a `__new__` method to RemoteModule to log all usages
appropriately.
ghstack-source-id: 142762019

Test Plan: waitforbuildbot

Reviewed By: vipannalla

Differential Revision: D32263978

fbshipit-source-id: a95ab0bb5d0836da8fe6333c41593af164b008d9
2021-11-09 09:32:34 -08:00
96b4f2296e CppSignature: Compare types by their mangled names (#67987)
Summary:
`.name()` has to call `__cxa_demangle` and allocate a new string, both of which can be avoided by just comparing the mangled names directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67987

Reviewed By: mruberry

Differential Revision: D32264560

Pulled By: H-Huang

fbshipit-source-id: 9dd4388ba4e2648c92e4062dafe6d8dc3ea6484e
2021-11-09 08:52:42 -08:00
114ef8c5ea Add SiLU backward Aten symbol (#67665)
Summary:
This is related to https://github.com/pytorch/xla/issues/3192. bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67665

Reviewed By: desertfire

Differential Revision: D32245736

Pulled By: bdhirsh

fbshipit-source-id: c5a2b24214fa37a181246cbbfcbee131473cf807
2021-11-09 08:14:02 -08:00
c581f56c74 Fix Dispatching not considering List[Optional[Tensor]] for dispatch (#66506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66506

Followup to https://github.com/pytorch/pytorch/pull/60787

It turns out that the original PR was wrong for unboxed kernels. We
recently ran into this in
https://github.com/facebookresearch/functorch/issues/124

For unboxed kernels, the correct type for a Tensor?[] argument is
actually `List<optional<Tensor>>`, not `ArrayRef<optional<Tensor>>`

Test Plan:
- assert that https://github.com/facebookresearch/functorch/issues/124
actually works

Reviewed By: bdhirsh

Differential Revision: D31609714

Pulled By: zou3519

fbshipit-source-id: bb91cafd32fb3c1b7d1e4f966b46b5d973b50df2
2021-11-09 08:00:09 -08:00
803e88d418 [DataPipe] Fixing pickling issues with fork and demux (#67930)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67930

Fixes #67848

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32222184

Pulled By: NivekT

fbshipit-source-id: 48871c45a855d92cd599e21f3b53827dd32c91ef
2021-11-09 07:54:02 -08:00
577a4d34a7 making import_module private and deprecating public method (#67990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67990

Duplicate of the following PR which was merged by mistake without ghimport
https://github.com/pytorch/pytorch/pull/67914

cc albanD NicolasHug

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32247560

Pulled By: jdsgomes

fbshipit-source-id: 8ba5ba7d17fc3d0d2c377da467ea805822e21ec1
2021-11-09 07:27:57 -08:00
0a9cd6d461 Removes unnecessary no_pretrained_model from test_quantize_fx.py (#67836)
Summary:
TorchVision accidentally included model builders for quantized models without weights; this was an old bug. These builders were largely unusable and caused issues for users. They were commonly filtered out to work around this.

We've recently fixed that (https://github.com/pytorch/vision/pull/4854) by either removing those unnecessary builders or by providing quantized weights. This PR removes the no-longer necessary filtering of the methods.

**It should be merged after TorchVision is synced on FBCode.**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67836

Reviewed By: jerryzh168

Differential Revision: D32230658

Pulled By: datumbox

fbshipit-source-id: 01cd425b1bda3b4591a25840593b3b5dde3a0f12
2021-11-09 05:49:27 -08:00
f9422e1c6b Fix deadlock for multi-output forward AD (#67995)
Summary:
Will hide some of the issues from https://github.com/pytorch/pytorch/issues/67367
This will at least allow us to run gradcheck for now until the above issue is fixed.

For more context, the deadlock happens when we (wrongfully) set a forward grad that also has a forward grad at the same level.
In particular, when exiting the level from 191b48b12f/torch/csrc/autograd/forward_grad.cpp (L23)
We take the `all_forward_levels_mutex_` lock and proceed to delete the level at 191b48b12f/torch/csrc/autograd/forward_grad.cpp (L29) (nothing else usually references this object, so it gets deleted as soon as it gets removed from the vector). Note that, at this point, we still hold the lock!

In the level destructor in 191b48b12f/torch/csrc/autograd/forward_grad.cpp (L55) we delete the forward grad, which triggers the deletion of the grad Tensor and everything it holds (assuming nothing else references it).
But in the (bad) case where this Tensor also has a forward grad for this level, the autograd meta clears the fw grads: 191b48b12f/torch/csrc/autograd/forward_grad.h (L124)
While clearing, we access the level (to de-register this forward grad) via 191b48b12f/torch/csrc/autograd/forward_grad.h (L139)
But this tries to access the level again in 191b48b12f/torch/csrc/autograd/forward_grad.cpp (L39) and deadlocks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67995

Reviewed By: soulitzer

Differential Revision: D32250996

Pulled By: albanD

fbshipit-source-id: f6118117effd3114fa90dc8fe22865339445f70c
2021-11-09 01:32:43 -08:00
f8297d40fc Adds a maximize flag to SGD. (#67847)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46480 -- for SGD.

## Notes:
- I have modified the existing tests to take a new `constructor_accepts_maximize` flag. When this is set to true, the `_test_basic_cases_template` function will test both maximizing and minimizing the sample function.
- This was the clearest way I could think of testing the changes -- I would appreciate feedback on this strategy.

## Work to be done:
- [ ] I need to update the docs.
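
A minimal sketch of the new flag in use (see the tests for the full matrix of cases):

```python
import torch

w = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1, maximize=True)

for _ in range(100):
    opt.zero_grad()
    objective = -(w - 2.0).pow(2).sum()  # concave, maximized at w = 2
    objective.backward()
    opt.step()  # ascends the gradient instead of descending

print(w)  # approaches 2.0
```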

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67847

Reviewed By: H-Huang

Differential Revision: D32252631

Pulled By: albanD

fbshipit-source-id: 27915a3cc2d18b7e4d17bfc2d666fe7d2cfdf9a4
2021-11-09 00:43:07 -08:00
c5e5264be2 Disable TF32 in pinv_jvp and pinv_backward (#67948)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67947

cc ptrblck xwang233 zasdfgbnm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67948

Reviewed By: H-Huang

Differential Revision: D32251934

Pulled By: ngimel

fbshipit-source-id: a2b1a118337b38db61350c9e49f1ba19030d70ec
2021-11-08 22:33:29 -08:00
417dc7f86c Revert D32007691: [pytorch][PR] Op info for activation functions 2 (softsign, tanh, tanhshrink, threshold, celu, sigmoid, mish, hardsigmoid)
Test Plan: revert-hammer

Differential Revision:
D32007691 (ea60e7d559)

Original commit changeset: 6cb14dc56e29

fbshipit-source-id: 9ef599ef07302fb521b1f413b989786adfa3576c
2021-11-08 21:16:53 -08:00
36d9a74bc6 Enforce that test cases extend from correct TestCase (#67819)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/66903

Main code is in torch/testing/_internal/common_utils.py; everything else is fixing the lint.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67819

Reviewed By: H-Huang

Differential Revision: D32259978

Pulled By: janeyx99

fbshipit-source-id: 39c5ffbaa510e1e533d6bdacf5c6158a3dd9885d
2021-11-08 18:28:36 -08:00
25cd81876d Fix typo grid_sampler_3d_cuda (#67752)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67752

Reviewed By: NivekT, mruberry

Differential Revision: D32256561

Pulled By: H-Huang

fbshipit-source-id: b4d56cadf15bc00181e899ea4be4b1bcfe63f692
2021-11-08 18:16:01 -08:00
4b1d044498 [WIP][resubmit] Don't #define NUM_THREADS (#68008)
Summary:
This reverts commit 9e8016d8c48e9c99addad93112f99d3375a0fbc7.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68008

Reviewed By: H-Huang

Differential Revision: D32254779

Pulled By: ngimel

fbshipit-source-id: 38ec415199f62a1e58000abe3e34ac91898a94ae
2021-11-08 18:03:45 -08:00
a2ab06514b Fixes CUDA vs CPU consistency for index_put_ when accumulating (part 2) (#67189)
Summary:
Description:
- Follow up PR to https://github.com/pytorch/pytorch/issues/66790 to fix the tests on functorch, https://github.com/pytorch/functorch/issues/195

In functorch, a null tensor is added to the list of indices for the batch dimension in C++, but I cannot find an equivalent of that in Python without using `torch.jit.script`. If any better solutions can be suggested, I'd be happy to replace the current way of testing.

cc ngimel zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67189

Reviewed By: suo

Differential Revision: D31966686

Pulled By: ngimel

fbshipit-source-id: a14b9e5d77d9f43cd728d474e2976d84a87a6ff4
2021-11-08 17:56:43 -08:00
3f048c637f [distributed] Render torch.distributed.optim members (#67885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67885

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32191952

Pulled By: jamesr66a

fbshipit-source-id: a9ed52da8e89b3491eab2e691f5571338f83e8e3
2021-11-08 16:20:55 -08:00
fd198a2fea [fx2trt] fix import in oss tests (#68016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68016

We would want to use oss test utils.

Also refactor both test utils so that the internal one is an enhancement over the oss test utils.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D32250266

fbshipit-source-id: 968b8f215ca2d294f7d0bd13cf9563be567954dd
2021-11-08 16:11:00 -08:00
0d8a8a2e41 [fx2trt]organize converter utils (#68015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68015

Put all converter utils into a single file `converter_utils.py`.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D32250243

fbshipit-source-id: 93fb34bc9ca23f4c3cef3125e04871083dbd413d
2021-11-08 16:09:42 -08:00
5b036d5f2b [Doc] [ONNX]Fix a broken url for ONNXRuntime custom op (#67944)
Summary:
**Description**
Update the broken url by a valid link https://onnxruntime.ai/docs/reference/operators/add-custom-op.html.

**Motivation**
Closes https://github.com/pytorch/pytorch/issues/67849. The url is broken.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67944

Reviewed By: NivekT

Differential Revision: D32252880

Pulled By: H-Huang

fbshipit-source-id: 400b0efa3d6f63e60b016c482fbbed1293c29806
2021-11-08 15:51:02 -08:00
82398e38ab Upgrade and fix boto3 version to 1.19.12 (#68025)
Summary:
The new boto3 version could be causing the macos test reporting to fail. Pinning to version 1.19.12

example fail: https://app.circleci.com/pipelines/github/pytorch/pytorch/406385/workflows/f15ca6ba-e8af-45a3-b1b0-c0298ea3fe9d/jobs/16687920

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68025

Reviewed By: malfet, seemethere

Differential Revision: D32261971

Pulled By: janeyx99

fbshipit-source-id: 1a2cd636a2f0b206921749c3f0c9e4707c9a1222
2021-11-08 15:43:35 -08:00
9094947b0a use better secrets for upload labels workflow (#68013)
Summary:
Should prevent https://github.com/pytorch/pytorch/runs/4134946329?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68013

Reviewed By: seemethere

Differential Revision: D32254046

Pulled By: janeyx99

fbshipit-source-id: 55a7a1b8f8434f6608fe9d423982406c1e187c59
2021-11-08 15:14:28 -08:00
db9b4f1a37 [ROCm] Bump magma source to pickup memory leak fix (#67225)
Summary:
Magma's magma_queue was double allocating storage when creating
ptrArray for gemm operations.  A fix has been upstreamed and the build
needs to pick this up going forward.

Fixes #{issue number}

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67225

Reviewed By: janeyx99

Differential Revision: D32252609

Pulled By: seemethere

fbshipit-source-id: e27ba1a54dc060fd1bfb4afad9079bf9b4705c8a
2021-11-08 15:08:09 -08:00
0b09d62cf3 [hackathon][DataPipe] adding .pyi file generation for torch.utils.data.datapipes (#67374)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ https://github.com/pytorch/pytorch/issues/67374

This is a work in progress.

Related TorchData issue: https://github.com/pytorch/data/issues/80

cc VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67374

Reviewed By: H-Huang

Differential Revision: D32153211

Pulled By: NivekT

fbshipit-source-id: b4c61f191f20fd98ca44bb9e4f972c6d812994a0
2021-11-08 14:43:24 -08:00
2e523ed229 [JIT] additional support for CallMethod with autocasting (#67925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67925

Previously, the following would always fail, because autocasting would not be enabled in the called method:

```
@torch.jit.script
def fn(x, y):
    with autocast():
        # CallMethod() to some method

fn(x, y)
```

This allows the above, if autocasting is globally enabled, e.g.

```
@torch.jit.script
def fn(x, y):
    with autocast():
        # CallMethod() to some method

with autocast():
    fn(x, y)  # now works, since autocast is globally enabled
```
ghstack-source-id: 142667351

Test Plan: added test in test_jit_autocast.py

Reviewed By: navahgar

Differential Revision: D32214439

fbshipit-source-id: bb7db054e25e18f5e3d2fdb449c35b5942ab303e
2021-11-08 14:37:09 -08:00
f57c63032e [ONNX] Fix reciprocal when input is not floating point (#67471) (#67808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67808

torch.reciprocal implicitly casts the inputs to float, and ONNX
Reciprocal requires floating point inputs.

Also separate the reciprocal test from other tests, and test different
input types.
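
A quick sketch of the eager-mode behavior the exporter now has to match:

```python
import torch

x = torch.tensor([2, 4])        # integer input
y = torch.reciprocal(x)         # implicitly computed in floating point
print(y, y.dtype)               # tensor([0.5000, 0.2500]) torch.float32
```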

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181307

Pulled By: malfet

fbshipit-source-id: 3e1109b3c85a49c51dc713656a900b4ee78c8340
2021-11-08 14:37:07 -08:00
eb22d06e5e [ONNX] Use human readable enum for dtype scalars (#66822) (#67807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67807

Also make quoting of string literals consistent.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181309

Pulled By: malfet

fbshipit-source-id: e1053701e3589f0310d8b5ef920359c03c6713f0
2021-11-08 14:37:05 -08:00
958d517643 [ONNX] Fix new_full and full_like for Python 3.9 (#67124) (#67806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67806

Previously new_full would fail with errors like:
`TypeError: only integer tensors of a single element can be converted to an index`

And full_like would trigger warnings like:
`DeprecationWarning: an integer is required (got type float).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.`

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181301

Pulled By: malfet

fbshipit-source-id: 2cf262cfef36c18e7b2423efe1e1d4fa3438f0ba

Co-authored-by: Bowen Bao <bowbao@microsoft.com>
2021-11-08 14:37:03 -08:00
37688148ae [ONNX] Support opset 15 (#67121) (#67805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67805

Also fix Reduce ops on binary_cross_entropy_with_logits

The graph says the output is a scalar, but with `keepdims=1`
(the default) the output would be a tensor of rank 1. We set
`keepdims=0` to make it clear that we want a scalar output.

This previously went unnoticed because ONNX Runtime does not strictly
enforce shape inference mismatches if the model is not using the latest
opset version.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181304

Pulled By: malfet

fbshipit-source-id: 1462d8a313daae782013097ebf6341a4d1632e2c

Co-authored-by: Bowen Bao <bowbao@microsoft.com>
2021-11-08 14:37:00 -08:00
ead59b5ff3 [ONNX] Suppress ort warnings in onnx related test (#67054) (#67804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67804

Improve readability of test logs by suppressing ORT warning logging for ONNX-related tests.

Reducing ONNX CI test log binary size:
linux-xenial-py3.6-clang7-onnx-test1: 12443 KB -> 6958 KB
linux-xenial-py3.6-clang7-onnx-test2: 16884 KB -> 5778 KB

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181308

Pulled By: malfet

fbshipit-source-id: 11cf165dc212d061606590e96c08c6e021135f74

Co-authored-by: BowenBao<bowbao@microsoft.com>
2021-11-08 14:35:20 -08:00
ea60e7d559 Op info for activation functions 2 (softsign, tanh, tanhshrink, threshold, celu, sigmoid, mish, hardsigmoid) (#67492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67492

Reviewed By: mruberry

Differential Revision: D32007691

Pulled By: samdow

fbshipit-source-id: 6cb14dc56e296154e2f48249049c4d2fe4f4d10d
2021-11-08 14:30:50 -08:00
a1d733ae8c Avoid convert trt.Dims to tuple in hot path (#67960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67960

For some reason, we throw py::index_error when converting a trt.Dims to a tuple. Having this in the hot path of trt inference is not good, especially when we register a bunch of pybind11 exception translators that repeatedly rethrow the exception. Since the shape is static information, we save it once to avoid the repeated conversion.

Reviewed By: jianyuh, wushirong, 842974287

Differential Revision: D32232065

fbshipit-source-id: 11e49da9758ead0ff3aa647bbd3fce7735bf4a07
2021-11-08 13:36:15 -08:00
4a8f27445d [Quant] Add dynamic QAT Linear module (#67325)
Summary:
**Summary:** This commit adds the `torch.nn.qat.dynamic.modules.Linear`
module, the dynamic counterpart to `torch.nn.qat.modules.Linear`.
Functionally these are very similar, except the dynamic version
expects a memoryless observer and is converted into a dynamically
quantized module before inference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67325

Test Plan:
`python3 test/test_quantization.py TestQuantizationAwareTraining.test_dynamic_qat_linear`

**Reviewers:** Charles David Hernandez, Jerry Zhang

**Subscribers:** Charles David Hernandez, Supriya Rao, Yining Lu

**Tasks:** 99696812

**Tags:** pytorch

Reviewed By: malfet, jerryzh168

Differential Revision: D32178739

Pulled By: andrewor14

fbshipit-source-id: 5051bdd7e06071a011e4e7d9cc7769db8d38fd73
2021-11-08 10:24:25 -08:00
db456d16ee torch.lobpcg.backward: do not save non-Variable types with ctx.save_for_backward. (#67994)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67827
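
The general rule being applied, sketched on a toy custom Function (unrelated to lobpcg itself):

```python
import torch

class Pow(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, exponent):
        ctx.save_for_backward(x)   # Tensors (Variables) go through save_for_backward
        ctx.exponent = exponent    # non-Variable state is stashed on ctx directly
        return x ** exponent

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * ctx.exponent * x ** (ctx.exponent - 1), None

x = torch.tensor(2.0, requires_grad=True)
Pow.apply(x, 3).backward()
print(x.grad)  # tensor(12.)
```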

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67994

Reviewed By: H-Huang

Differential Revision: D32244818

Pulled By: albanD

fbshipit-source-id: 702a3a1d1f4c160bef7ec1f764a2ab5d01ca7901
2021-11-08 10:02:09 -08:00
8e2528132b [lint] small updates to .lintrunner.toml (#67942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67942

- Change "name" to "code" for consistency with linttool and LintMessage
format.
- Change "args" and "init_args" to "command" and "init_command" for
consistency with internal representation.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32250606

Pulled By: suo

fbshipit-source-id: 557fef731bab9adca7ab1e7cc41b996956076b05
2021-11-08 09:45:26 -08:00
d201102d36 [lint] Add the rest of the grep linters (#67932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67932

Also various improvements to grep_linter.py, including the ability to
specify a replacement pattern.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32250603

Pulled By: suo

fbshipit-source-id: e07eb182e9473a268e2b805a68a859b91228bfbb
2021-11-08 09:45:20 -08:00
53f118c800 [lint] improve mypy lintrunner config (#67936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67936

- Add the strict config
- Make the patterns exactly match the current CI
- Add init_args

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32250605

Pulled By: suo

fbshipit-source-id: a71d434bf6024db4462260a460a1bc2d9ac66a32
2021-11-08 09:45:14 -08:00
419c58ea9c [lint] add newlines linter to lintrunner (#67894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67894

As title. Confirmed that the code base passes by running:

```
lintrunner --paths-cmd='git grep -Il ""' --take NEWLINE
```

and seeing that it passes

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32250604

Pulled By: suo

fbshipit-source-id: de9bcba635d21f8832bb25147b19b7b2e8802247
2021-11-08 09:45:07 -08:00
4b021280ad [lint] add nativefunctions to lintrunner (#67890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67890

Adding another linter. I also added a generic initializer that installs
the right pip packages (you can invoke it by running `lintrunner init`).

Differential Revision: D32197366

Test Plan: Imported from OSS

Reviewed By: driazati

Pulled By: suo

fbshipit-source-id: 82844e78f1ee3047220d8444874eab41d7cc0e9e
2021-11-08 09:44:59 -08:00
5bb5bfccf7 [lint] add lintrunner support for circleci_linter (#67872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67872

As title. This demonstrates some of the nice features of lintrunner:
- Uniform error reporting means you get a nice diff of the changes for
free
- Can run with -a to just accept the changes (no need to tell people to run
a special regenerate command, since the linter adapter already knows how).

Differential Revision: D32187386

Test Plan: Imported from OSS

Reviewed By: driazati

Pulled By: suo

fbshipit-source-id: 71de6b042730be80ff6794652039e9bc655a72b1
2021-11-08 09:43:25 -08:00
b3770766c4 Fixes deprecation warnings in test_optim.py (#67954)
Summary:
Catches deprecation warnings when we call `scheduler.step(epoch)`
in tests.

Removes duplicate parameters to optimizers unless we are specifically
testing for that

Fixes https://github.com/pytorch/pytorch/issues/67696

There is one warning remaining when I run this locally -- however that is due to the implementation of the `SequentialLR` Scheduler. I will open a new issue relating to that.
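
A minimal sketch of the catching pattern (assuming the deprecated `epoch` argument surfaces as an ordinary warning):

```python
import warnings
import torch

opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1)

opt.step()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sched.step(1)  # passing an epoch is deprecated
assert any("deprecated" in str(w.message).lower() for w in caught)
```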

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67954

Reviewed By: H-Huang

Differential Revision: D32244056

Pulled By: albanD

fbshipit-source-id: 2ab3086a58e10c8d29809ccbaab80606a1ec61d8
2021-11-08 09:36:08 -08:00
b546cdf401 [SR] Out variant for prim::NumToTensor (#67856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67856

Returns a tensor constructed from scalar input

Test Plan:
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```

Ran
```
buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --gtest_filter=*NumToTensorScalar* --v=1
```
and the output contains `Switch to out variant for node: %2 : Tensor = prim::NumToTensor(%0)`.

Reviewed By: mikeiovine

Differential Revision: D32014194

fbshipit-source-id: e7df65ea1bf05d59c1fc99b721aee420e484f542
2021-11-08 09:02:58 -08:00
0dc99dcf59 Update __init__.py (#67900)
Summary:
Fixes a syntax error in pytorch/torch/cuda/__init__.py.
Fixes https://github.com/pytorch/pytorch/issues/67896

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67900

Reviewed By: mruberry

Differential Revision: D32211978

Pulled By: soulitzer

fbshipit-source-id: a313a5e23b4d79e5b7bb909eaf82c9ee6cab10c9
2021-11-08 08:56:38 -08:00
5bc89275dd [SR] Eliminate no-ops (#67437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67437

Certain ops do nothing on the forward pass and can be discarded after training: `aten::detach` and `fb::scale_gradient` are examples of this.

Test Plan: `buck test caffe2/test:jit -- test_freezing`

Reviewed By: hlu1

Differential Revision: D31980843

fbshipit-source-id: 0045b6babcfae786a2ce801b2f5997a078205bc0
2021-11-08 08:42:33 -08:00
191b48b12f [torch.fx] Fix replace pattern mechanism (#66442)
Summary:
The following code would not replace the pattern correctly:

```python
        def f(x):
            x = torch.sigmoid(x)
            x = torch.sigmoid(x)
            return torch.sigmoid(x)

        def pattern(x):
            return torch.sigmoid(x)

        def replacement(x):
            return torch.exp(x)

        def comparison(x):
            x = torch.exp(x)
            x = torch.exp(x)
            return torch.exp(x)

        traced = symbolic_trace(f)
        comparison_fn = symbolic_trace(comparison)

        subgraph_rewriter.replace_pattern(traced, pattern, replacement) # Only one sigmoid gets converted.
```

This PR fixes the replacement mechanism and adds a test for this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66442

Reviewed By: ZolotukhinM

Differential Revision: D32238424

Pulled By: ansley

fbshipit-source-id: 386e777174c639baafc166d5ffbc0658a96b1ee9
2021-11-07 13:23:02 -08:00
9fb3ba9d7b Revert D31762735 (#67924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67924

This diff reverts the changes made in D31762735 (0cbfd466d2)

Test Plan: Wait for CI

Reviewed By: derekmod-fb

Differential Revision: D32214744

fbshipit-source-id: e0a65b6a31a88216ae1243549fcbc901ef812374
2021-11-06 17:34:13 -07:00
9cacf2b718 Add custom zipper script to zip python modules for torch.deploy (#67006)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67006

Test Plan: nervouslaugh_

Reviewed By: shunting314

Differential Revision: D31822429

fbshipit-source-id: c2efeab1446fbeb70b98d4ee766fbc670cf091b0
2021-11-06 11:49:02 -07:00
ae501a9727 [PyTorch Edge] Update bytecode version compatibility check (#67417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67417

The bytecode version is valid when it is at most kMaxSupported and at least kMinSupported.
ghstack-source-id: 142609392

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'
```

Reviewed By: JacobSzwejbka, iseeyuan

Differential Revision: D31984839

fbshipit-source-id: 2011e77455c931c0a8a58267494d44bcf167b877
2021-11-05 19:34:01 -07:00
80178d6152 [DDP] Fix some issues with code example in DDP docstring (#67883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67883

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: zhaojuanmao

Differential Revision: D32190946

Pulled By: jamesr66a

fbshipit-source-id: a376324b95cbe833ffa606ecdfc6156432880f70
2021-11-05 17:32:45 -07:00
22afe82ce3 [rpc] Switch RPC agent check to TORCH_CHECK and add more descriptive error (#67882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67882

I ran into a hard-to-interpret error message when trying to run the following script, which was missing an `init_rpc` call:

```
# $ torchrun --standalone --nnodes=1 --nproc_per_node=1 script.py
import os
rank = int(os.environ['LOCAL_RANK'])
world_size = int(os.environ['WORLD_SIZE'])

import torch.distributed
# !!!!!! Uncomment the following and the script succeeds
# torch.distributed.rpc.init_rpc('worker', rank=rank, world_size=world_size)

import torch.distributed as dist
dist.init_process_group(backend='gloo')

import torchvision.models as models
import torch

rn50 = models.resnet50()
rn50.train()
rn50 = torch.nn.parallel.DistributedDataParallel(rn50)

from torch.distributed.rpc import RRef
from torch.distributed.optim import DistributedOptimizer

params = []
for param in rn50.parameters():
    params.append(RRef(param))

dist_optim = DistributedOptimizer(
        torch.optim.SGD,
        params,
        lr=0.05)

loss_func = torch.nn.CrossEntropyLoss()

with torch.distributed.autograd.context() as context_id:
    pred = rn50(torch.randn(50, 3, 224, 224))
    target = torch.randn(50, 1000).softmax(dim=1)
    loss = loss_func(pred, target)
    dist.autograd.backward(context_id, [loss])
    dist_optim.step(context_id)
```

Error:

```
Traceback (most recent call last):
  File "/xxx/torchrun_exp/script.py", line 23, in <module>
    params.append(RRef(param))
RuntimeError: agentINTERNAL ASSERT FAILED at "../torch/csrc/distributed/rpc/rpc_agent.cpp":237, please report a bug to PyTorch. Current RPC agent is not set!
```

Since this is a user-facing error, I've changed `TORCH_INTERNAL_ASSERT` to `TORCH_CHECK` and added a hint about how to resolve the issue. On the other hand, the fact that this was originally `TORCH_INTERNAL_ASSERT` may suggest that the author thought that this should be an internal-only error condition. If there is some other place that should be throwing an exception in this case that is failing, let me know and I can adapt the fix to change that location.

Question for reviewers:
* Is there a good test file where I can add a test for this error condition?

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D32190947

Pulled By: jamesr66a

fbshipit-source-id: 3621d755329fd524db68675c55b1daf20e716d43
2021-11-05 17:31:11 -07:00
efdb17b984 Add meta support to tensor range factories (#67032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67032

This PR adds meta backend support to the `range`, `arange`, `linspace`, and `logspace` operators.

Note that the original PR (#66630) was reverted due to two failing unit tests in the Bionic CI. This revision includes a fix for those tests; otherwise its content is identical to the previous PR.
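
With this support, the factories can run on the meta device, which tracks only shape and dtype without allocating data:

```python
import torch

t = torch.arange(0, 10, device="meta")  # no storage is allocated
print(t.shape, t.dtype, t.device)       # torch.Size([10]) torch.int64 meta
```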

Original commit changeset: 2f9d8d1acbb0
ghstack-source-id: 142487306

Test Plan: Extended the existing tensor creation tests to assert meta backend support.

Reviewed By: zhaojuanmao

Differential Revision: D31834403

fbshipit-source-id: a489858a2a8a38a03234b14408e14d2b208a8d34
2021-11-05 15:36:29 -07:00
9e8016d8c4 Revert D31932215: [pytorch][PR] Don't #define NUM_THREADS
Test Plan: revert-hammer

Differential Revision:
D31932215 (f70e8064f4)

Original commit changeset: ccdf11e249fb

fbshipit-source-id: 4c330aebe9cfb483f02ceb1fdaf5c3b0f8fa6fa1
2021-11-05 15:14:32 -07:00
10411e3561 [quant][fusion] Fix the additional_fuser_method argument for fuse_fx (#67876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67876

Previously we missed this argument when calling obj.convert, so it did not affect the fusion.
This PR fixes that and adds a test for it.

Test Plan:
python test/test_quantization.py TestFuseFx

Imported from OSS

Reviewed By: malfet

Differential Revision: D32191364

fbshipit-source-id: 566bd39461010d70a21de71f611bb929976fe01d
2021-11-05 14:51:15 -07:00
f70e8064f4 Don't #define NUM_THREADS (#67258)
Summary:
PyTorch doesn't compile with the latest `main` branch of cub again. The root cause is that PyTorch defines a macro `NUM_THREADS`, and cub added some code like
```C++
template<...., int NUM_THREADS, ...>
```
and the two definitions conflict with each other.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67258

Reviewed By: albanD

Differential Revision: D31932215

Pulled By: ngimel

fbshipit-source-id: ccdf11e249fbc0b6f654535067a0294037ee7b96
2021-11-05 13:56:11 -07:00
b1ecfc6d45 Add timeouts for GHA jobs for pytorch/pytorch (#67912)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67912

Reviewed By: seemethere

Differential Revision: D32215323

Pulled By: atalman

fbshipit-source-id: 45da7c4bb13c877c9b38bea8615adf75c4a9702d
2021-11-05 12:50:19 -07:00
f6402c469e (torch/elastic) fix scale down bug caused by calling rdzv_handler.shutdown() on premature agent failures (#67749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67749

Fixes: https://github.com/pytorch/pytorch/issues/67742

Test Plan:
Added unittests.

Validated manually:

```
# start agent 0
$ torchrun --rdzv_backend c10d --rdzv_id 123 --rdzv_endpoint localhost:29500 --nnodes 1:2 --nproc_per_node 1 --monitor_interval 1 test.py

# start agent 1
torchrun --rdzv_backend c10d --rdzv_id 123 --rdzv_endpoint localhost:29500 --nnodes 1:2 --nproc_per_node 1 --monitor_interval 1 test.py

# kill agent 0
CTRL+C (SIGINT) or kill -15 (SIGTERM)

# restart it
torchrun --rdzv_backend c10d --rdzv_id 123 --rdzv_endpoint localhost:29500 --nnodes 1:2 --nproc_per_node 1 --monitor_interval 1 test.py
```

Reviewed By: cbalioglu

Differential Revision: D32129005

fbshipit-source-id: db292268250ef6f1e06f5b4c5bd67124d8dfd325
2021-11-05 12:18:46 -07:00
240e8d5cc5 Updated searchsorted functionality (#66818)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60492

Updates the searchsorted API to be more consistent with numpy and adds an OpInfo for searchsorted.
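
For example, tie-breaking with the numpy-style `side` argument (assuming that is among the additions here):

```python
import torch

seq = torch.tensor([1, 3, 3, 5])
vals = torch.tensor([3])
torch.searchsorted(seq, vals)                # tensor([1]): leftmost slot
torch.searchsorted(seq, vals, side="right")  # tensor([3]): past the ties
```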

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66818

Reviewed By: mruberry

Differential Revision: D31745142

Pulled By: samdow

fbshipit-source-id: 0b9600afc3cb0720afb5811212404ee96d2a7d93
2021-11-05 12:13:47 -07:00
f6a4c80a5a Refactor cuDNN Convolution memory format and Conv-Bias-Relu code (#65594)
Summary:
This PR makes several changes:

- Changed function `bool cudnn_conv_use_channels_last(...)` to `at::MemoryFormat cudnn_conv_suggest_memory_format(...)`
- Removed `resize_` in cudnn convolution code. Added a new overloading method `TensorDescriptor::set` that also passes the desired memory format of the tensor.
- Disabled the usage of double + channels_last on cuDNN Conv-Relu and Conv-Bias-Relu. Call `.contiguous(memory_format)` before passing data to cuDNN functions.
- Disabled the usage of cuDNN fused Conv-Bias-Relu in cuDNN < 8.0 version due to a CUDNN_STATUS_NOT_SUPPORTED error. Instead, use the native fallback path.
- Let Conv-Bias-Relu code respect the global `allow_tf32` flag.

According to the cuDNN documentation, double + NHWC is generally not supported.
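
For reference, requesting channels_last from Python looks like this (a generic illustration, not the PR's internals):

```python
import torch

x = torch.randn(2, 3, 8, 8)
# Rearrange the data into NHWC layout before handing it to code paths
# that prefer it.
x_cl = x.contiguous(memory_format=torch.channels_last)
print(x_cl.is_contiguous(memory_format=torch.channels_last))  # True
```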

Close https://github.com/pytorch/pytorch/pull/66968

Fix https://github.com/pytorch/pytorch/issues/55301

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65594

Reviewed By: jbschlosser, malfet

Differential Revision: D32175766

Pulled By: ngimel

fbshipit-source-id: 7ba079c9f7c46fc56f8bfef05bad0854acf380d7
2021-11-05 11:50:55 -07:00
cdd5d16489 [Foreach] Implement L1&L2 norm (#62646)
Summary:
Implement the L1 & L2 norms in the fast path, referencing [nvidia/apex](https://github.com/NVIDIA/apex/blob/master/csrc/multi_tensor_l2norm_kernel.cu).
When `ord` is neither 1 nor 2, the slow path is chosen.
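
A hypothetical call into the private foreach API (the exact signature is an assumption):

```python
import torch

tensors = [torch.randn(3), torch.randn(4, 4)]
l2_norms = torch._foreach_norm(tensors, 2)  # fast multi-tensor path
l1_norms = torch._foreach_norm(tensors, 1)
```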

Related: https://github.com/pytorch/pytorch/issues/58833

cc ptrblck mcarilli ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62646

Reviewed By: malfet

Differential Revision: D32173421

Pulled By: ngimel

fbshipit-source-id: 14b7544601658a979b83509df351e1848ded7675
2021-11-05 11:23:00 -07:00
e7a3bbce89 [nnc] Add support for dynamic shapes in TensorExprKernel (#67861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67861

Previously submitted as https://github.com/pytorch/pytorch/pull/67197.
This got reverted because its failures were hidden by the failures of
another PR.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32178196

Pulled By: navahgar

fbshipit-source-id: cc8a5c68aed360d06289e69645461cfa773e1300
2021-11-05 11:18:19 -07:00
a4a6d056e6 Add ownership to more edge tests (#67859)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66232

This should be the last immediate task. I anticipate test ownership will change over time, but this is the last big item needed to close it out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67859

Reviewed By: soulitzer

Differential Revision: D32210534

Pulled By: janeyx99

fbshipit-source-id: 7fd835d87d9d35d49ec49de1fcfa29b085133e99
2021-11-05 11:01:16 -07:00
9dafb6434b remove use of THGenerateAllTypes, clean up (#67867)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67867

Reviewed By: mruberry

Differential Revision: D32191053

Pulled By: ngimel

fbshipit-source-id: 84eb6c2989495fca5f7b055c4984efe5de94e812
2021-11-05 10:57:04 -07:00
ee7412dd29 autodiff fix for autocast_to_xxx (#67648)
Summary:
Fixes autocast + autodiff issue where `RuntimeError: grad_inputs.size() == node->inputs().size()INTERNAL ASSERT FAILED at "../torch/csrc/jit/runtime/autodiff.cpp":426, please report a bug to PyTorch.`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67648

Reviewed By: cpuhrsch

Differential Revision: D32083227

Pulled By: davidberard98

fbshipit-source-id: edf526cff4ec21874ae35ec730d13c250073e10c
2021-11-05 10:48:39 -07:00
9269080b47 [PyTorchEdge] backport test (#67824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67824

Testing backport of all prod models using model test framework

Ref:
[Create tests at run-time (google test)](https://stackoverflow.com/questions/19160244/create-tests-at-run-time-google-test)

Breaking the list of models into 20 chunks based on a simple hash (the sum of all character values).
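
A minimal sketch of that bucketing (illustrative, not the test's actual code):

```python
def chunk_index(model_name: str, num_chunks: int = 20) -> int:
    # "simple hash": sum of all character values, bucketed into num_chunks
    return sum(ord(c) for c in model_name) % num_chunks
```
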
ghstack-source-id: 142398833

Test Plan:
```
 buck test //xplat/pytorch/mobile/test:test_read_all_mobile_model_configs
Starting new Buck daemon...

Parsing buck files: finished in 7.6 sec
Creating action graph: finished in 0.9 sec
[RE] Metadata: Session ID=[reSessionID-66f5adfe-50d1-4599-9828-3e8115181601]
[RE] Waiting on 0 remote actions. Completed 1008 actions remotely, action cache hit rate: 43.59%.
Downloaded 26/1523 artifacts, 252.60 Kbytes, 96.6% cache miss (for updated rules)
Building: finished in 01:18.6 min (100%) 5532/5532 jobs, 770/5532 updated
  Total time: 01:27.3 min
Testing: finished in 11:21.6 min (41 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/pytorch/mobile/test:test_read_all_mobile_model_configs
PASS    673.8s 41 Passed   0 Skipped   0 Failed   //xplat/pytorch/mobile/test:test_read_all_mobile_model_configs
TESTS PASSED
```

Reviewed By: dhruvbird

Differential Revision: D32068955

fbshipit-source-id: d06c2434a4a69572ab52df31a684e5973b9d551c
2021-11-05 10:41:36 -07:00
02e35ce17b [ONNX] Update onnx function export with comments and clean up (#66817) (#67803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67803

* Addresses comments from #63589

[ONNX] remove torch::onnx::PRODUCER_VERSION (#67107)

Use constants from version.h instead.
This simplifies things since we no longer have to update
PRODUCER_VERSION for each release.

Also add TORCH_VERSION to version.h so that a string is available for
this purpose.

[ONNX] Set `ir_version` based on opset_version. (#67128)

This increases the odds that the exported ONNX model will be usable.
Before this change, we were setting the IR version to a value which may
be higher than what the model consumer supports.
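
A plain export call is therefore enough; the IR version is derived from the requested opset (usage sketch):

```python
import torch

model = torch.nn.Linear(4, 2)
torch.onnx.export(model, torch.randn(1, 4), "model.onnx", opset_version=13)
# ir_version in model.onnx is now chosen to match opset 13
```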

Also some minor clean-up in the test code:
* Fix string replacement.
* Use a temporary file so as to not leave files around in the test
  current working directory.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181306

Pulled By: malfet

fbshipit-source-id: 02f136d34ef8f664ade0bc1985a584f0e8c2b663

Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-11-05 10:35:35 -07:00
ace2183195 [FSDP] Address follow up comments for CPU offload (#67813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67813

Address Shen's comments in
https://github.com/pytorch/pytorch/pull/67249/files
ghstack-source-id: 142379312

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32157545

fbshipit-source-id: 3cc2df6d5fa0d3b9383ed3711e7f79729dbb1dda
2021-11-05 10:34:08 -07:00
823ae3a4ff [forward ad] Also check layout of grad matches that of self for inplace over view (#67816)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67800

Currently when the grad is the same layout as base, we try to assign the same tensor to the forward grad of both the base and the view. However, when the layout of the grad is different from the layout of the view, this triggers a copy to be created, and the tangent of the view (after the inplace) will not have a view relationship with the view of the base.

This PR changes it so that we only do the above optimization when the layout of the grad also matches the layout of self.
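
For orientation, the base/view tangent relationship at the Python level (a generic sketch, not the internal optimization being gated):

```python
import torch
import torch.autograd.forward_ad as fwAD

with fwAD.dual_level():
    base = torch.randn(2, 3)
    dual = fwAD.make_dual(base, torch.randn(2, 3))
    view = dual.transpose(0, 1)
    # The tangent of the view should remain a view of the base's tangent.
    primal, tangent = fwAD.unpack_dual(view)
    print(tangent.shape)  # torch.Size([3, 2])
```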

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67816

Reviewed By: malfet

Differential Revision: D32190021

Pulled By: soulitzer

fbshipit-source-id: b1b2c9b332e83f4df5695ee9686ea76447f9305b
2021-11-05 10:26:24 -07:00
13a69d23b1 Add retry logic for test_multitenancy and documentation for find_free_port (#67775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67775

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D32142749

Pulled By: H-Huang

fbshipit-source-id: 67ab4ede4f4bff96a1ffd41d55b3be0edc82b1ce
2021-11-05 09:05:12 -07:00
33b7790907 Fix conv_transpose3d backward with non-contiguous grad_out (#67829)
Summary:
Many thanks to Forest Yang (meowmix) from the forum for reporting it with a minimal reproduction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67829

Reviewed By: malfet

Differential Revision: D32184786

Pulled By: albanD

fbshipit-source-id: b63dbd3148b5def2109deb2f4612c08f55f59dfb
2021-11-05 08:34:21 -07:00
07a08fb95f Fix typo in LinearLR docs (#67840)
Summary:
The final learning rate should be 0.05, matching the lr passed as the argument to the optimizer, not 0.005.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67840

Reviewed By: jbschlosser

Differential Revision: D32187091

Pulled By: albanD

fbshipit-source-id: 8aff691bba3896a847d7b9d9d669a65f67a6f066
2021-11-05 07:16:15 -07:00
53ebccbe78 Fix warnings produced when running test_optim.py (#67756)
Summary:
Fixes part of https://github.com/pytorch/pytorch/issues/67696 by adding calls to `optimizer.step()` in various places.
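
The pattern being added, roughly (stepping the optimizer first silences the order-of-calls warning):

```python
import torch

opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1)

opt.step()    # step the optimizer first ...
sched.step()  # ... so this no longer warns about the call order
```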

## Notes for reviewers:
- It is not entirely clear which is the right optimizer to step in each case. I have favoured the more explicit approach of creating a set of optimizers and calling step on each of them.
- At the time of writing, the only scheduler without an `optimizer` instance variable is `ChainedScheduler`, which I need to handle specially. I use `hasattr` to do this check. Let me know if this ought to be changed.
- I am opening this PR for review while it only solves part of the issue, as I'd rather get feedback sooner. I think it is fine to fix the issue in several PRs too.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67756

Reviewed By: jbschlosser

Differential Revision: D32187864

Pulled By: albanD

fbshipit-source-id: fd0d133bcaa3a24588e5a997ad198fdf5879ff5a
2021-11-05 07:12:13 -07:00
b098264f22 Revert D32063662: [pytorch][PR] TST Adds device transfer into module info tests
Test Plan: revert-hammer

Differential Revision:
D32063662 (da59bd1d13)

Original commit changeset: 0868235a0ae7

fbshipit-source-id: a4f775874faa88be0eb5272dedf3bbc8194ebde6
2021-11-05 07:07:39 -07:00
bb8978f605 Revert D32175963: Converting hardswish to structured kernels with metatensor support
Test Plan: revert-hammer

Differential Revision:
D32175963 (57335a9ee3)

Original commit changeset: f4d749c6aeaf

fbshipit-source-id: 6d68a96cf872c2d7b518c061875b9336bca0043a
2021-11-05 07:04:40 -07:00
4d5338228f Revert D32175960: Moving parts of the Shape Registry into a common file
Test Plan: revert-hammer

Differential Revision:
D32175960 (d04389e6f0)

Original commit changeset: 2e30115ca554

fbshipit-source-id: 27f9889c535e4f7c21c50b2468e1e6650e952d4f
2021-11-05 07:04:37 -07:00
38af37f409 Revert D32175958: Adding Custom Rules to Device Propagation
Test Plan: revert-hammer

Differential Revision:
D32175958 (853298481b)

Original commit changeset: 26a9ef41e10a

fbshipit-source-id: adcc70687b5b454f358b5446bed2c06d04e61435
2021-11-05 07:04:35 -07:00
b1ac7f51a1 Revert D32175957: Adding custom testing based on opinfos input for ops with custom rules.
Test Plan: revert-hammer

Differential Revision:
D32175957 (b8e165e841)

Original commit changeset: 1cb51a7b6cbb

fbshipit-source-id: 29fd0750d9981758436c55eea2de40cdaddfb9be
2021-11-05 07:04:33 -07:00
0c8569bec9 Revert D32175959: Merging the implementations of ClearProfiling
Test Plan: revert-hammer

Differential Revision:
D32175959 (f1754319e3)

Original commit changeset: b335dacce709

fbshipit-source-id: 23d1f75d47f15effc9806bd6e5228007d521b0b3
2021-11-05 07:03:18 -07:00
2f68878a05 [Static Runtime] Add a comment on clients taking ownership of managed output tensors (#67554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67554

This change adds a comment on clients taking ownership of managed output tensors to remind SR developers of how and why that matters.

Test Plan: N/A

Reviewed By: swolchok

Differential Revision: D32013468

fbshipit-source-id: bcc13055c329c61677bdcc76411fe8db44bb2cee
2021-11-04 22:20:49 -07:00
ba9d9d488e Implement padding with slice layer (#67888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67888

Implement padding with a slice layer. The steps are:
1. Reverse-slice and pad with zeros:
   [1, 2] => [2, 1, 0 ... 0]
2. Transpose to restore the original order, completing the pre-pad:
   [2, 1, 0 ... 0] => [0 ... 0, 1, 2]
3. Continue with the post-pad:
   [0 ... 0, 1, 2] => [0 ... 0, 1, 2, 0 ... 0]

Test Plan: buck test mode/dev-nosan caffe2/test/fx2trt/converters:test_pad

Reviewed By: 842974287

Differential Revision: D32160739

fbshipit-source-id: dbbc04d916e23551e3ce9be480283377e9a38b34
2021-11-04 21:25:01 -07:00
daaad47d9c Allow torch::deploy unity embed xar file of any size (#67814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67814

There was previously a limitation on the size of the xar file we can embed into the binary. The payload (the xar file here) is added to the .data section by default by the 'ld -b binary -r' command (which section the payload goes into is hardcoded in ld, by the way; see the code pointer [here](https://github.com/bminor/binutils-gdb/blob/binutils-2_32/bfd/binary.c#L80)). When we link the object file containing the payload with the rest of the executable, we get relocation-out-of-range errors if the overall size of the .text, .data, .bss, etc. sections exceeds 2G. Some relocation entries use 32-bit signed integers, hence the 2G limit.

To solve the issue and mitigate the risk, we designed a mechanism that puts the payload in a customized payload section (.torch_deploy_payload.unity here). The payload section does not participate in relocation or symbol resolution, so in theory it can be as large as the disk allows. Since we don't relocate the payload section, the start/end/size symbols are no longer available or valid, so we have to parse the ELF file ourselves to figure them out.

The mechanism can be used to embed interpreter.so as well. interpreter.so is currently 0.5G, which would limit the other .text/.data/.bss sections of the executable to at most 1.5G. Using this mechanism in this diff keeps interpreter.so from consuming any of that budget. We could also use this mechanism to ship python scripts with our binary rather than freeze them beforehand. These use cases are not handled in this diff.

This diff also improves the experience for simple use cases that do not depend on extra shared libraries in the XAR file (other than the shared libraries for the python extensions themselves). This is mainly to fix the stress test right now, but it also makes other simple cases easier.
ghstack-source-id: 142483327

Test Plan:
# Verify the relocation out of range issue is fixed
Add //caffe2:torch as a dependency to the macro build_unity(name="example", …) in torch/csrc/deploy/unity/TARGETS and run 'buck run mode/opt :unity_demo'; without this diff, you are expected to get relocation errors like:
```
ld.lld: error:
caffe2/c10/util/intrusive_ptr.h:325:(.text._ZN11ska_ordered8detailv317sherwood_v3_tableISt4pairIN3c106IValueES4_ES4_NS3_6detail11DictKeyHashENS0_16KeyOrValueHasherIS4_S5_S7_EENS6_14DictKeyEqualToENS0_18KeyOrValueEqualityIS4_S5_SA_EESaIS5_ESaINS0_17sherwood_v3_entryIS5_EEEE15emplace_new_keyIS5_JEEES2_INSH_18templated_iteratorIS5_EEbEaPSF_OT_DpOT0_+0x4E9): relocation R_X86_64_32S out of range: 2345984168 is not in [-2147483648, 2147483647]; references c10::UndefinedTensorImpl::_singleton
>>> defined in /data/sandcastle/boxes/fbsource/fbcode/buck-out/opt/gen/caffe2/c10/c10#platform009-clang,static/libc10.a(../c10#compile-UndefinedTensorImpl.cpp.o44c44c4c,platform009-clang/core/UndefinedTensorImpl.cpp.o)
```

With the diff, the error above is resolved.

# Pass Stress Test

Also pass existing unit tests for unity.

buck test mode/opt //caffe2/torch/csrc/deploy/unity/tests:test_unity_sum -- --exact 'caffe2/torch/csrc/deploy/unity/tests:test_unity_sum - UnityTest.TestUnitySum' --run-disabled --jobs 18 --stress-runs 10 --record-results

buck test mode/opt //caffe2/torch/csrc/deploy/unity/tests:test_unity_simple_model -- --exact 'caffe2/torch/csrc/deploy/unity/tests:test_unity_simple_model - UnityTest.TestUnitySimpleModel' --run-disabled --jobs 18 --stress-runs 10 --record-results

# Verify debug sections are not messed up

Verified that debug sections are not messed up and GDB still works:
`gdb ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/unity/unity_demo`

```
b main
run
l
c
```

Reviewed By: suo

Differential Revision: D32159644

fbshipit-source-id: a133513261b73551a71acc257f4019f7b5af34a8
2021-11-04 20:52:57 -07:00
5a48868d39 [qnnpack] fix benchmarks after an API update (#67768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67768

We don't need to pass so many padding args after removing support for asymm padding from qnnpack

Test Plan: it builds

Reviewed By: jshen

Differential Revision: D32082204

fbshipit-source-id: 2bfe4c135ad613f0cc267e7e3ab6357731f29bc2
2021-11-04 20:17:05 -07:00
f1754319e3 Merging the implementations of ClearProfiling (#67575)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67575

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175959

Pulled By: Gamrix

fbshipit-source-id: b335dacce709a64e3d5779f9c6e9569f86e10748
2021-11-04 19:02:08 -07:00
b8e165e841 Adding custom testing based on opinfos input for ops with custom rules. (#67500)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67500

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175957

Pulled By: Gamrix

fbshipit-source-id: 1cb51a7b6cbb75bf3841e3c4caedf88aa94168fe
2021-11-04 19:02:06 -07:00
853298481b Adding Custom Rules to Device Propagation (#66973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66973

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D32175958

Pulled By: Gamrix

fbshipit-source-id: 26a9ef41e10a171be6a8779a4e6014e2e7e3f2c1
2021-11-04 19:02:04 -07:00
d04389e6f0 Moving parts of the Shape Registry into a common file (#66948)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66948

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175960

Pulled By: Gamrix

fbshipit-source-id: 2e30115ca554816166fedddbcdeffbe189eb19a6
2021-11-04 19:02:02 -07:00
57335a9ee3 Converting hardswish to structured kernels with metatensor support (#66899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66899

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175963

Pulled By: Gamrix

fbshipit-source-id: f4d749c6aeaf064084be72361607ea4f3f6bc91d
2021-11-04 19:02:00 -07:00
ec8a71f9ac Dtype Analysis for Unary and Binary ops with Metatensors (#66898)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66898

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175961

Pulled By: Gamrix

fbshipit-source-id: 72721259b900e5a311b6bcb5c350366ba420b734
2021-11-04 19:00:50 -07:00
4b084bc832 Benchmarks for various fusers (#67622)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67622

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32171063

Pulled By: bertmaher

fbshipit-source-id: 40d3a7adcc52aba3b051e382ec5ec4ee7e43d81b
2021-11-04 18:57:17 -07:00
31fc9d6539 Introduce version control for tensorrt converter decorator (#67886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67886

Similar to what we have in the torch2trt tensorrt_converter, introduce version-based enablement for fx2trt converters. Upgrading to TRT 8.2 will introduce new op converters as well as deprecate old ones.
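
A hypothetical sketch of version-gated converter registration; the names (`CONVERTERS`, `tensorrt_converter`, `TRT_VERSION`) are illustrative only:

```python
CONVERTERS = {}
TRT_VERSION = (8, 2)  # stand-in for the installed TensorRT version

def tensorrt_converter(op, min_version=(0, 0)):
    def register(fn):
        if TRT_VERSION >= min_version:
            CONVERTERS[op] = fn  # enabled only on new-enough TensorRT
        return fn
    return register

@tensorrt_converter("acc_ops.relu", min_version=(8, 2))
def relu_converter(network, target, args, kwargs, name):
    ...
```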

Test Plan: pass existing unit test

Reviewed By: 842974287

Differential Revision: D32183581

fbshipit-source-id: 6419acada296d24e882efa9fca25eca6349153e4
2021-11-04 17:39:15 -07:00
f5daa9f76b [iOS] Enable ARC for CMake build (#67884)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67884

Test Plan: Imported from OSS

Reviewed By: husthyc

Differential Revision: D32191532

Pulled By: xta0

fbshipit-source-id: a295004f8e7f1b0f5a4ab12ffd9b37c36b80226b
2021-11-04 16:50:46 -07:00
c2ceba8ada [PyTorchEdge] Move all serialize/deserialize files to a separate target (#66805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66805

{F672465642}

DGW:
```
buck query 'allpaths(//xplat/caffe2:torch_mobile_core, //xplat/caffe2:torch_mobile_interpreter)' --output-format dot_compact | pastry
bunnylol dgw paste_id

```

Test Plan:
buck builds to pass

```
buck build fbsource//fbandroid/mode/opt @//fbandroid/mode/messenger //fbandroid/apps/messenger:messenger_staticdi_dextr_splitarsc_dlstr_xzs_for_perftest_redex_optimizedtestableresources_postprocessed_resign //fbandroid/apps/messenger:messenger_staticdi_dextr_splitarsc_dlstr_xzs_for_perftest#unstripped_native_libraries

buck build //xplat/caffe2:torch_mobile_coreAndroid#android-armv7,shared

buck build //xplat/caffe2:torch_commonAndroid#android-armv7,shared

```

DGW:
```
buck query 'allpaths(//xplat/caffe2/fb/runtime:only_flatbuffer_test, //xplat/caffe2:miniz)' --output-format dot_compact | pastry
P464671429: https://www.internalfb.com/intern/paste/P464671429/

bunnylol dgw P464671429
```

loader is decoupled from miniz

```
buck query 'allpaths(//xplat/caffe2/fb/runtime:flatbuffer_loader, //xplat/caffe2:miniz)' --output-format dot_compactdigraph result_graph {
}
```

Reviewed By: iseeyuan

Differential Revision: D31532862

fbshipit-source-id: 51e6880e78e1cafe20c8d90e98037edc3c1b6b11
2021-11-04 15:55:52 -07:00
b0c05297f9 [Static Runtime] Arena allocate StorageImpls for managed tensors (#66130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66130

We're reusing backing storage for these tensors, which is only safe because they have non-overlapping lifetimes. Accordingly, it seems that they can also share their StorageImpl.

ghstack-source-id: 142427752

Test Plan:
benchmarked ctr_mobile_feed local and local_ro:

Using recordio inputs for model 302008423_0

```
swolchok@devbig032 ~/f/fbcode> env MKL_NUM_THREADS=1 OMP_NUM_THREADS=1  > environment^C
swolchok@devbig032 ~/f/fbcode> sudo ~/fbsource2/fbcode/scripts/bertrand/noise/denoise-env.sh \
                                 /tmp/ptvsc2_predictor_benchNov1ArenaAllocateStorageImpls \
                               --scripted_model=/data/users/swolchok/ctr_mobile_feed_q3_2021/302008423_0.predictor.disagg.local \
                               --method_name=local.forward --pt_cleanup_activations=1 \
                               --pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=2 --warmup_iters=2 \
                                      --num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 --repetitions=5 --recordio_use_ivalue_format=1 --recordio_inputs=/data/users/swolchok/ctr_mobile_feed_q3_2021/302008423_0.local.inputs.recordio

Stable
========================================
I1101 14:19:16.473964 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 20.0131. Iters per second: 49.9673
I1101 14:20:12.193130 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 20.0155. Iters per second: 49.9612
I1101 14:21:07.761898 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9751. Iters per second: 50.0624
I1101 14:22:03.218066 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9104. Iters per second: 50.2249
I1101 14:22:58.723256 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.956. Iters per second: 50.1102
I1101 14:22:58.723306 2748837 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 19.974, standard deviation: 0.043643

ArenaAllocateStorageImpls
========================================
I1101 14:08:57.070914 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9771. Iters per second: 50.0572
I1101 14:09:52.605121 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.924. Iters per second: 50.1907
I1101 14:10:48.098287 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9353. Iters per second: 50.1624
I1101 14:11:43.645395 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9723. Iters per second: 50.0694
I1101 14:12:39.171636 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9673. Iters per second: 50.0819
I1101 14:12:39.171685 2695478 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 19.9552, standard deviation: 0.0239318

difference: 0.0188 (0.09%), which is less than 1 standard deviation

Stable, local_ro
========================================
I1101 14:26:10.796161 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25991. Iters per second: 793.708
I1101 14:26:12.194727 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.26862. Iters per second: 788.26
I1101 14:26:13.591312 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.26549. Iters per second: 790.207
I1101 14:26:14.982439 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25943. Iters per second: 794.01
I1101 14:26:16.377033 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25995. Iters per second: 793.68
I1101 14:26:16.377094 2787930 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 1.26268, standard deviation: 0.00414788

ArenaAllocateStorageImpls, local_ro
========================================
I1101 14:26:45.875073 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20987. Iters per second: 826.536
I1101 14:26:47.207271 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20827. Iters per second: 827.633
I1101 14:26:48.533766 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20023. Iters per second: 833.174
I1101 14:26:49.850610 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.19206. Iters per second: 838.884
I1101 14:26:51.172356 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.19958. Iters per second: 833.622
I1101 14:26:51.172411 2790009 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 1.202, standard deviation: 0.00722754

Difference: 0.06 usec/iter (4.8%), which is much more than 1 standard deviation

```

we can see that this is a large relative improvement on local_ro, but no effect on local.

Reviewed By: hlu1

Differential Revision: D31357486

fbshipit-source-id: 229c003677da76e89c659d0e0639002accced76e
2021-11-04 15:43:39 -07:00
01809731bc [Static Runtime] Cache managed tensor Storages (#66638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66638

See comments in code explaining what we're doing here.
ghstack-source-id: 142427750

Test Plan:
Ran ptvsc2_predictor_bench on ctr_mobile_feed local and local_ro net before/after this change on a devserver with turbo off.

Results:

```
stable, local_ro:
========================================
I1014 16:13:52.713300 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.68012. Iters per second: 373.118
I1014 16:14:00.961875 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66156. Iters per second: 375.719
I1014 16:14:09.163097 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.6449. Iters per second: 378.086
I1014 16:14:17.425621 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66661. Iters per second: 375.008
I1014 16:14:25.711349 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.67375. Iters per second: 374.006
I1014 16:14:25.711390 151733 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 2.66539, standard deviation: 0.0134423

stable, local:
========================================
I1014 15:08:28.547081 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.42772. Iters per second: 155.576
I1014 15:08:48.276582 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.3643. Iters per second: 157.127
I1014 15:09:07.978683 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.3566. Iters per second: 157.317
I1014 15:09:27.875543 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.42044. Iters per second: 155.752
I1014 15:09:47.558079 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.34902. Iters per second: 157.505
I1014 15:09:47.558120 3979345 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 6.38361, standard deviation: 0.037421

cache storages, local_ro:
========================================
I1014 16:15:42.292997 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66604. Iters per second: 375.088
I1014 16:15:50.622402 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.68683. Iters per second: 372.186
I1014 16:15:58.901475 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.67028. Iters per second: 374.493
I1014 16:16:07.156373 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66317. Iters per second: 375.492
I1014 16:16:15.474292 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.68394. Iters per second: 372.587
I1014 16:16:15.474334 160496 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 2.67405, standard deviation: 0.0106982

cache storages, local:
========================================
I1014 20:53:43.113400 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.3811. Iters per second: 156.713
I1014 20:54:02.829102 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.36039. Iters per second: 157.223
I1014 20:54:22.885171 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.47333. Iters per second: 154.48
I1014 20:54:42.768963 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.41404. Iters per second: 155.908
I1014 20:55:02.624423 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.4042. Iters per second: 156.147
I1014 20:55:02.624460 1657168 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 6.40661, standard deviation: 0.0427168
```

Looks like this diff is neutral or a slight regression, but it is a stepping stone on the way to the following diff.

Reviewed By: hlu1

Differential Revision: D31326711

fbshipit-source-id: a6e0185f24a6264b1af2a51b69243c310d0d48d5
2021-11-04 15:42:22 -07:00
56dda833ff Small updates to RELEASE.md (#65489)
Summary:
- Combine the `xla` and `builder` branch pinning steps and link them to a PR that does it correctly
- Update the example PR for the version bump, as a few files have changed
- Delete the FaceHub step, as it is no longer necessary after a recent update

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65489

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D31120498

Pulled By: malfet

fbshipit-source-id: e1a9db2b03243c8d28eeed9888c3653e4460ad07
2021-11-04 15:39:40 -07:00
d5d342b237 Sparse CSR CUDA: Support mixed memory format input for triangular_solve (#66401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66401

This PR fixes the case where the result and input tensors have different strides.
cuSPARSE from CUDA 11.3.1 has a bug: it doesn't use the correct strides when
writing the result. This is "fixed" on the PyTorch side by copying the input
tensor to a tensor with the same strides as the result tensor.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D32177966

Pulled By: cpuhrsch

fbshipit-source-id: 118437409df147f04dce02763aff9bfd33f87c63
2021-11-04 15:34:42 -07:00
a20a64af4e Increased tolerance for test_zero_model_parallel tests (#67765)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67764

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67765

Reviewed By: malfet

Differential Revision: D32171621

Pulled By: mrshenli

fbshipit-source-id: 8c34f4714289cb41824f3a18822a28ed670fa0a6
2021-11-04 15:17:45 -07:00
c541c69e89 Fix minor typo in contributing.md (#67855)
Summary:
No issue number; this is a minor change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67855

Reviewed By: malfet

Differential Revision: D32186689

Pulled By: driazati

fbshipit-source-id: 7cda19f66ff1312296d8310922bb0d221df81e46
2021-11-04 14:38:48 -07:00
8bed46ef38 [WIP][LTC] Upstream class Shape (#67672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67672

This commit upstreams class Shape from the lazy_tensor_staging branch.

Test Plan: WIP.

Reviewed By: malfet

Differential Revision: D32095478

Pulled By: alanwaketan

fbshipit-source-id: 61611b12fc079b195833b5b22a6cf73c0935b8b9
2021-11-04 14:12:03 -07:00
e8ac8c005d [NOOP][clangformat][codemod] Enable CLANGFORMAT (#67854)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67854

Test Plan: Visual inspection. Sandcastle.

Reviewed By: zertosh

Differential Revision: D32173077

fbshipit-source-id: 10ab4b0afa18c7be4fab3e3564d9b479a7a48cb5
2021-11-04 14:07:57 -07:00
938bab0bfd [PyTorch] Add int version of vectorized PrefixSum to Benchmark (#67865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67865

- Add int version of vectorized PrefixSum
- Use unaligned load/store instructions
- Add exclusive scan version. "exclusive" means that the i-th input element is not included in the i-th sum. For details see https://en.cppreference.com/w/cpp/algorithm/exclusive_scan
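
To make the inclusive/exclusive distinction concrete (plain Python, not the vectorized kernel):

```python
from itertools import accumulate

xs = [3, 1, 4, 1, 5]
inclusive = list(accumulate(xs))  # [3, 4, 8, 9, 14]
exclusive = [0] + inclusive[:-1]  # [0, 3, 4, 8, 9]: i-th sum excludes xs[i]
```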

Test Plan:
```
buck build mode/opt-clang //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench
OMP_NUM_THREADS=1 numactl -m 0 -C 5 \
./buck-out/opt/gen/caffe2/benchmarks/cpp/tensorexpr/tensorexpr_bench --benchmark_filter=PrefixSumBench
```

For full benchmark results, see P465274613

```
PrefixSumBench/LocalInt/64                            57 ns         56 ns   12414048 GB/s=9.06239G/s
PrefixSumBench/LocalInt/256                          221 ns        221 ns    3160853 GB/s=9.28635G/s
PrefixSumBench/LocalInt/1024                         818 ns        817 ns     857922 GB/s=10.0235G/s
PrefixSumBench/LocalInt/4096                        3211 ns       3210 ns     217614 GB/s=10.2093G/s
PrefixSumBench/LocalInt/16384                      12806 ns      12804 ns      54805 GB/s=10.2364G/s
PrefixSumBench/LocalInt/65536                      51115 ns      51079 ns      13741 GB/s=10.2643G/s
PrefixSumBench/LocalInt/262144                    205974 ns     205912 ns       3401 GB/s=10.1847G/s
PrefixSumBench/LocalInt/1048576                   829523 ns     828859 ns        845 GB/s=10.1207G/s
PrefixSumBench/LocalIntAVX2/64                        45 ns         45 ns   15568113 GB/s=11.3549G/s
PrefixSumBench/LocalIntAVX2/256                      208 ns        208 ns    3371174 GB/s=9.86913G/s
PrefixSumBench/LocalIntAVX2/1024                     893 ns        892 ns     783154 GB/s=9.18629G/s
PrefixSumBench/LocalIntAVX2/4096                    3618 ns       3613 ns     193834 GB/s=9.06838G/s
PrefixSumBench/LocalIntAVX2/16384                  14416 ns      14411 ns      48564 GB/s=9.09543G/s
PrefixSumBench/LocalIntAVX2/65536                  57650 ns      57617 ns      12156 GB/s=9.09952G/s
PrefixSumBench/LocalIntAVX2/262144                230855 ns     230612 ns       3035 GB/s=9.09386G/s
PrefixSumBench/LocalIntAVX2/1048576               924265 ns     923777 ns        758 GB/s=9.08077G/s
PrefixSumBench/LocalIntAVX512/64                      23 ns         23 ns   24876551 GB/s=22.0697G/s
PrefixSumBench/LocalIntAVX512/256                     95 ns         95 ns    7387386 GB/s=21.556G/s
PrefixSumBench/LocalIntAVX512/1024                   435 ns        435 ns    1609682 GB/s=18.8425G/s
PrefixSumBench/LocalIntAVX512/4096                  1815 ns       1815 ns     385462 GB/s=18.0561G/s
PrefixSumBench/LocalIntAVX512/16384                 7479 ns       7476 ns      93660 GB/s=17.5335G/s
PrefixSumBench/LocalIntAVX512/65536                30171 ns      29879 ns      23430 GB/s=17.5468G/s
PrefixSumBench/LocalIntAVX512/262144              125805 ns     125631 ns       5570 GB/s=16.6929G/s
PrefixSumBench/LocalIntAVX512/1048576             504216 ns     503983 ns       1384 GB/s=16.6446G/s
PrefixSumBench/ExclusiveScanIntAVX512/64              23 ns         23 ns   30058295
PrefixSumBench/ExclusiveScanIntAVX512/256            101 ns        101 ns    7398498
PrefixSumBench/ExclusiveScanIntAVX512/1024           435 ns        434 ns    1403877
PrefixSumBench/ExclusiveScanIntAVX512/4096          1979 ns       1978 ns     354016
PrefixSumBench/ExclusiveScanIntAVX512/16384         7828 ns       7819 ns      89551
PrefixSumBench/ExclusiveScanIntAVX512/65536        31206 ns      31192 ns      22408
PrefixSumBench/ExclusiveScanIntAVX512/262144      130106 ns     130023 ns       5388
PrefixSumBench/ExclusiveScanIntAVX512/1048576     525515 ns     524976 ns       1244
```

Reviewed By: navahgar, swolchok

Differential Revision: D32011740

fbshipit-source-id: 7962de710bd588291dd6bf0c719f579c55f7c063
2021-11-04 14:00:19 -07:00
641ba36a4e fix annotation for Demultiplexer (#65998)
Summary:
cc SsnL VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65998

Reviewed By: bdhirsh

Differential Revision: D32145926

Pulled By: ejguan

fbshipit-source-id: 60be3126fb9e73b8631b5040676264504e926707
2021-11-04 13:44:02 -07:00
da59bd1d13 TST Adds device transfer into module info tests (#65488)
Summary:
Follow up to  https://github.com/pytorch/pytorch/issues/61935

This PR adds device to device transfer test into `ModuleInfo`.

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65488

Reviewed By: mruberry

Differential Revision: D32063662

Pulled By: jbschlosser

fbshipit-source-id: 0868235a0ae7e5b6a3e4057c23fe70753c0946d2
2021-11-04 12:50:33 -07:00
3d4a6ff15d Revert D32154788: Move Concat Linear out of Optimize Numerics
Test Plan: revert-hammer

Differential Revision:
D32154788 (ea94dde573)

Original commit changeset: faa6465c89b3

fbshipit-source-id: 0dcaa65268b68ed01e6a5bc7b73ade1f51163b33
2021-11-04 12:20:02 -07:00
86aea79217 Revert D32154786: Fix Freezing Docs Parameters
Test Plan: revert-hammer

Differential Revision:
D32154786 (db15a7c0b3)

Original commit changeset: d8a2b4f39ff4

fbshipit-source-id: 657e3974a8e0ca71790adc1b031a87b7c497ea25
2021-11-04 12:20:00 -07:00
279af1a668 Revert D32154787: Formatted with Black
Test Plan: revert-hammer

Differential Revision:
D32154787 (08d630b9a6)

Original commit changeset: 6a95691c4ad9

fbshipit-source-id: 2dbcf2395071433731683f685a0351fa8604d620
2021-11-04 12:18:37 -07:00
08d630b9a6 Formatted with Black (#67792)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67792

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32154787

Pulled By: Gamrix

fbshipit-source-id: 6a95691c4ad9d997071bb4ffc00b5eab30f90b81
2021-11-04 11:32:26 -07:00
db15a7c0b3 Fix Freezing Docs Parameters (#67201)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67201

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32154786

Pulled By: Gamrix

fbshipit-source-id: d8a2b4f39ff477f5131c02fe8c0b1a25339ce158
2021-11-04 11:32:24 -07:00
ea94dde573 Move Concat Linear out of Optimize Numerics (#67196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67196

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32154788

Pulled By: Gamrix

fbshipit-source-id: faa6465c89b3676d6b1ff7c20a677738a7fbdf88
2021-11-04 11:30:39 -07:00
6f0a1f2b8d Only set sccache_epilogue to run on build job exits (#67798)
Summary:
Fixes:
* https://github.com/pytorch/pytorch/issues/65431

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67798

Reviewed By: malfet

Differential Revision: D32174810

Pulled By: boyuantan

fbshipit-source-id: 072fdc042b56e541a877074120d41645c98e41f5
2021-11-04 11:11:02 -07:00
618bab593c .github: Output expected vs. actual (#67703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67703

This script failed on me in CI without actually telling me what was wrong,
so I'm adding some more output here to show the actual vs. the expected
result.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D32112898

Pulled By: seemethere

fbshipit-source-id: dfc9a82c709d52e0dde02d1e99a19eecc63c5836
2021-11-04 11:02:43 -07:00
90d311b268 [RPC] Add exception logging to constValue() (#67802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67802

In RPC C++ code, we might sometimes call constValue() when the future actually has an exception, and in unittests we want to assert on the exception. What happens is that we get a message basically saying "!eptr_" which indicates there is some exception but we don't know what it is.

This diff simply adds logging for the exception and notes that `value` should be preferred over `constValue` when the future can hold an exception. The contract of `constValue` (throwing when `eptr_` is set) still holds; it is just enhanced with additional logging.
ghstack-source-id: 142375391

Test Plan: Added UT

Reviewed By: mrshenli

Differential Revision: D32156552

fbshipit-source-id: 4dd5e73b92173209074c104a4b75c2021e20de4b
2021-11-04 10:04:09 -07:00
7c739e1ab9 Resubmit #67161 (#67735)
Summary:
Skip building extensions on Windows, following https://github.com/pytorch/pytorch/pull/67161#issuecomment-958062611

Related issue: https://github.com/pytorch/pytorch/issues/67073

cc ngimel xwang233 ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67735

Reviewed By: bdhirsh

Differential Revision: D32141250

Pulled By: ngimel

fbshipit-source-id: 9bfdb7cf694c99f6fc8cbe9033a12429b6e4b6fe
2021-11-04 09:59:30 -07:00
8b0c2c18eb Fix pretrained=True for test_pt_onnx_trt (#67818)
Summary:
Addresses https://github.com/pytorch/pytorch/pull/66312#issuecomment-960357403

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67818

Reviewed By: malfet

Differential Revision: D32161208

Pulled By: janeyx99

fbshipit-source-id: 076e52ddc8718c74eb2941e867d92bfa4fe70f80
2021-11-04 09:49:42 -07:00
af1bd88fc4 Allow scalars for aliased binary ops {multiply, subtract, divide} (#65937)
Summary:
https://github.com/pytorch/pytorch/issues/65868 pointed out that the long-form aliases of some binary ops (`multiply`, `subtract`, `divide`) don't match the behavior of `mul`, `sub`, and `div` when it comes to handling scalar inputs. This PR adds the missing registration in `python_arg_parser.cpp` to resolve this.
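
A sketch of the behavior change as described (the pre-fix failure mode is per the linked issue):

```python
import torch

t = torch.tensor([4.0, 6.0])
t.sub(1)       # the short form always accepted a Python scalar
t.subtract(1)  # the alias previously could reject it; now both are equivalent
```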

CC ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65937

Reviewed By: malfet

Differential Revision: D32156580

Pulled By: ngimel

fbshipit-source-id: b143cf7119a8bb51609e1b8734204edb750f0210
2021-11-04 09:36:45 -07:00
bd8feb33d4 Update distributed contributing guide to show how to run one test in test_distributed_spawn (#67801)
Summary:
Running one test in test_distributed_spawn is a bit confusing but possible. Add documentation to the CONTRIBUTING.md for this.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67801

Reviewed By: mrshenli

Differential Revision: D32157700

Pulled By: rohan-varma

fbshipit-source-id: a1d10f2fb5f169b46c6d15149bf949082d9bd200
2021-11-04 08:54:31 -07:00
4262c8913c Remove native_functions.yaml dependency from TensorTopK.cu (#66794)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66794

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D31856104

Pulled By: dagitses

fbshipit-source-id: 2b9c0e1072455c5019c6f681faa3de848b3dae46
2021-11-04 08:32:06 -07:00
927da4d32f Remove native_functions.yaml dependency from Sort.cu (#66793)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66793

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31856100

Pulled By: dagitses

fbshipit-source-id: 1469ce1deb4124f2a9e160a8e3298d56ac3f6561
2021-11-04 08:30:40 -07:00
61ed9285dd Automated submodule update: tensorpipe (#67845)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: d2aa3485e8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67845

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D32170821

fbshipit-source-id: 1958e824a9f02c5178fa5d4a73a171dedefc540c
2021-11-04 08:24:05 -07:00
cfd998c197 Remove ProcessGroup RPC backend placeholder as part of 1.11 (#67363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67363

The ProcessGroup RPC backend is deprecated. In 1.10 it threw an error to the user to be more user friendly. This PR now removes it completely.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D32138321

Pulled By: H-Huang

fbshipit-source-id: b4f700d8f1b1d46ada7b5062d3f754646571ea90
2021-11-04 07:57:58 -07:00
8e1ead8e4d Fix the kl_div docs (#67443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67443

Fixes https://github.com/pytorch/pytorch/issues/57459

After discussing the linked issue, we resolved that `F.kl_div` computes
the right thing, consistent with the rest of the losses in
PyTorch.

To avoid any confusion, these docs add a note discussing how the PyTorch
implementation differs from the mathematical definition and the reasons
for doing so.

These docs also add an example that may further help understanding the
intended use of this loss.
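
A minimal usage sketch (illustrative, not taken from the docs themselves): the key point of the added note is that `input` is expected to be log-probabilities while `target` is given as probabilities by default.

```
import torch
import torch.nn.functional as F

# F.kl_div computes target * (log(target) - input) pointwise, so `input`
# must already be log-probabilities; `target` stays in probability space
# unless log_target=True is passed.
inp = torch.log_softmax(torch.randn(3, 5), dim=1)
tgt = torch.softmax(torch.randn(3, 5), dim=1)
loss = F.kl_div(inp, tgt, reduction="batchmean")
```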

cc brianjo mruberry

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D32136888

Pulled By: jbschlosser

fbshipit-source-id: 1ad0a606948656b44ff7d2a701d995c75875e671
2021-11-04 07:09:08 -07:00
04fe4382ec Automated submodule update: tensorpipe (#67769)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: caa2ccb394

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67769

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D32138256

fbshipit-source-id: dfe4c73ae25c8f362f2917dd7594bdcd418c2a0d
2021-11-04 01:13:19 -07:00
b8d365ca3a ci fix (#67826)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67826

Reviewed By: Chillee

Differential Revision: D32164770

Pulled By: mruberry

fbshipit-source-id: c1de7e6db6d0cb1761388f1ea0178dbff3fe6dc8
2021-11-04 00:16:47 -07:00
1baed45c6b [fbcode][static runtime] out-variant for quantized::linear_dynamic_fp16 (#67663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67663

Mostly follows the example of quantized::linear (D28428734 (4d7abdbdad)) to enable an out-variant for quantized::linear_dynamic_fp16.

The motivation: during the MP tab CTR PyTorch model migration, we observed that the quantized::linear_dynamic_fp16 operator has the highest cost but does not yet have an out-variant enabled: https://fburl.com/phabricator/b5juus2d

Test Plan:
buck build mode/opt caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench

  sudo watch -n 20 /usr/local/fbprojects/dynamoserver/bin/turboDriver disable

  MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench -- --scripted_model=/home/bwen/models/991103061_4/991103061_4.predictor --pt_inputs=/home/bwen/models/991103061_4/pt_inputs --method_name=forward --pt_cleanup_activations=1 --pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=1000 --warmup_iters=1000 --num_threads=1 --repetitions=3 --do_profile=1 --do_benchmark=1 --set_compatibility=1 --compare_results=1 --pt_enable_static_runtime 2>&1 | pastry

before: P465201159

  0.929067 ms.     31.808%. quantized::linear_dynamic_fp16 (16 nodes)
  0.921679 ms.    31.7324%. quantized::linear_dynamic_fp16 (16 nodes)
  0.919127 ms.    31.7404%. quantized::linear_dynamic_fp16 (16 nodes)

after: P465203015

  0.90898 ms.    31.0205%. quantized::linear_dynamic_fp16 (16 nodes, out variant)
  0.9127 ms.      30.62%. quantized::linear_dynamic_fp16 (16 nodes, out variant)
  0.879148 ms.    31.0161%. quantized::linear_dynamic_fp16 (16 nodes, out variant)

unit test logic refers https://fburl.com/code/vv0rry13

  buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest

Reviewed By: hlu1

Differential Revision: D32001168

fbshipit-source-id: 873d9f77434b9c4bafb298c871173f9a560dd2a3
2021-11-03 22:39:04 -07:00
99c7a9f09d fix bfloat16 autocast skip (#67822)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67822

Reviewed By: mruberry

Differential Revision: D32162605

Pulled By: ngimel

fbshipit-source-id: eb5ccf6c441231e572ec93ac8c2638d028abecad
2021-11-03 21:02:37 -07:00
2486061c72 [JIT] make x (+ or -) 0 and x (* or /) 1 peepholes type promotion aware (#67688)
Summary:
Some of the "no-ops" are not actually no-ops because they can change the dtype

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67688

Reviewed By: davidberard98

Differential Revision: D32104601

Pulled By: eellison

fbshipit-source-id: ccb99179a4b30fd20b5a9228374584f2cdc8ec21
2021-11-03 20:11:46 -07:00
88d86de7d8 Add lint to ensure all test files have headers with ownership info (#66826)
Summary:
UPDATE: CI should be green now with the added files.

This should fail for now, but will pass when all action for https://github.com/pytorch/pytorch/issues/66232 is done.

Example failure run: https://github.com/pytorch/pytorch/runs/4052881947?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66826

Reviewed By: seemethere

Differential Revision: D32087209

Pulled By: janeyx99

fbshipit-source-id: ad4b51e46de54f23aebacd592ee67577869f8bb6
2021-11-03 18:21:49 -07:00
2766662ca9 [PyTorch][2/N] Basic implementation of ShardedEmbeddingBag using ShardedTensor. (#67188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67188

This diff/PR is trying to implement the ShardedEmbeddingBag using the ShardedTensor.

We support both row-wise and column-wise sharding of the embedding bag. The detailed logic can be found in the comment.

Several caveats:
1. Only the sharding of one weight is supported now.
2. We support a limited set of input params for the op; support for more params is on the way.
3. We only support chunk sharding for now.
4. We only support a single local shard per rank for now.

Some other changes include:
1. Refactor the ShardedEmbedding code so that the common logic can be reused.
2. Fix tiny typos and a corner case in the API `get_chunked_dim_size`, where it would return -1 if we set dim_size = 5, split_size = 2, idx = 3. (This is a valid case because when chunks = 4 and dim_size = 5, the split_size is 2.)
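
A minimal standalone sketch of the corrected corner case (hypothetical helper written for illustration; the real function lives in the sharding spec utilities):

```
# With dim_size=5 and split_size=2 there are 4 chunks of sizes [2, 2, 1, 0],
# so idx=3 must yield 0 rather than -1.
def get_chunked_dim_size(dim_size, split_size, idx):
    return max(min(dim_size, split_size * (idx + 1)) - split_size * idx, 0)

assert [get_chunked_dim_size(5, 2, i) for i in range(4)] == [2, 2, 1, 0]
```
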
ghstack-source-id: 142325915

Test Plan: Unit test and CI

Reviewed By: pritamdamania87

Differential Revision: D31749458

fbshipit-source-id: ed77e05e4ec94ef1a01b1feda8bbf32dc5d5da1b
2021-11-03 17:39:18 -07:00
fd77fff0b1 [FSDP] customizable backend in test (#67135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67135

Add ability to use env var backend for quicker testing (and gloo2 in
the future)
ghstack-source-id: 142274304

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31878285

fbshipit-source-id: 80ae7107cd631a1a15ebc23262b27d8192cfe4b6
2021-11-03 15:45:52 -07:00
83e8612d11 Clean up test autograd (#67413)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066

This PR:
 - cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
 - tests related to an operator are better colocated
 - see the tracker for details

What to think about when moving tests to their correct test suite:
 - naming: make sure it's not too generic
 - parametrization: sometimes we need to add/remove a device/dtype parameter
 - whether the test can be merged with existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413

Reviewed By: jbschlosser, albanD

Differential Revision: D32031480

Pulled By: soulitzer

fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
2021-11-03 15:26:09 -07:00
ca445645f9 Revert D31902471: [nnc] Add support for dynamic shapes in TensorExprKernel
Test Plan: revert-hammer

Differential Revision:
D31902471 (15a3c374e2)

Original commit changeset: d2729a38ba1a

fbshipit-source-id: 4c05de82e626bbf744df84fd2b914b66fd165a19
2021-11-03 14:48:12 -07:00
603116a6ae [Core ML][easy] Assign missing properties to the executor (#67737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67737

As title says
ghstack-source-id: 142277212

Test Plan:
- buck test pp-ios
- circleci

Reviewed By: hanton

Differential Revision: D32123661

fbshipit-source-id: eff3068669f8fdc573dc81b04bcc20ef153d8c4a
2021-11-03 14:15:53 -07:00
fddfb81dd0 Add BF16 type to _autocast_to_full_precision (#67707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67707

https://github.com/pytorch/pytorch/pull/63939/files has added FP16 support to torchscript.

This adds the BF16 dtype when doing the full-precision conversion.

Test Plan: Unit test. Also tested BF16 locally on A100 using MLP model.

Reviewed By: idning

Differential Revision: D32027152

fbshipit-source-id: b2a5ff2b22ea1e02306b0399f2b39b8493be4f45
2021-11-03 14:06:50 -07:00
05e17e7ff6 Add API usage logging for several other RPC APIs. (#67722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67722

ghstack-source-id: 142259452

Test Plan: waitforbuildbot

Reviewed By: jaceyca, fduwjj

Differential Revision: D32118872

fbshipit-source-id: 041ab5601221b1846c56ce4bb63364bec9ad28b0
2021-11-03 14:02:00 -07:00
5fd93fb5f8 broaden retries on TestHub (#67779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67779

Not all flaky failures from this test are URLErrors; I think we should
err on the side of being expansive with retries here.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D32145434

Pulled By: suo

fbshipit-source-id: 3c3274b2080681fcafb3ea6132e420605f65c429
2021-11-03 13:48:58 -07:00
89b02fc70b [StaticRuntime][Easy] Correct typos in test_static_runtime (#67739)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67739

Test Plan:
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```

Reviewed By: mikeiovine

Differential Revision: D32125879

fbshipit-source-id: bd989e5088edff87624b858bd9045dfe9da3fbe7
2021-11-03 13:24:46 -07:00
4d601a1c36 codegen: Split up source, header and Declarations.yaml generation (#67497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67497

This allows more of the code-generation to happen in parallel, whereas
previously all codegen was serialized.

Test Plan: Imported from OSS

Reviewed By: dagitses, mruberry

Differential Revision: D32027250

Pulled By: albanD

fbshipit-source-id: 6407c4c3e25ad15d542aa73da6ded6a309c8eb6a
2021-11-03 13:20:54 -07:00
fe91906ad7 Remove Declarations.yaml dependency from gen_autograd (#67496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67496

gen_autograd.py doesn't use `Declarations.yaml` any more, and removing
the dependency allows it to run in parallel with
`tools/codegen/gen.py`.

Test Plan: Imported from OSS

Reviewed By: dagitses, ejguan

Differential Revision: D32027251

Pulled By: albanD

fbshipit-source-id: 2cc0bbe36478e6ec497f77a56ab8d01c76145703
2021-11-03 13:19:24 -07:00
9b1caca185 [SR] Macro to clean up c10::Symbol maps in passes (#67484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67484

Maps from `c10::Symbol -> c10::Symbol` can be hard to parse when `fromQualString` is scattered everywhere. I've been annoyed by this issue many times when rebasing, and have even messed up `FuseListUnpack` a few times.

Introduce a macro to make it easier to see what maps to what.

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D32004451

fbshipit-source-id: 1086254c8403a0880d014512c439edbefa6fa015
2021-11-03 12:57:07 -07:00
0eaa01ead1 [SR] Add EliminateTrivialEquallySplit graph pass (#67166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67166

This optimization is not really the same thing as `FuseListUnpack`, and mixing the logic in that pass is confusing and error-prone. It should really be its own pass.

It's slower since we have to do another pass over the graph, but this is not perf critical code; readability is more important.

Test Plan: Unit tests: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D31887458

fbshipit-source-id: 289e281d512435861fccfe19f017751ad015688c
2021-11-03 12:57:05 -07:00
6cc6a5fd9d Fix a bug in TorchBench ondemand CI. (#67743)
Summary:
Use the main branch when TorchBench branch is not specified.

RUN_TORCHBENCH: soft_actor_critic

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67743

Reviewed By: seemethere

Differential Revision: D32142663

Pulled By: xuzhao9

fbshipit-source-id: 160227835543b8e55c970025073839bf0f03aa81
2021-11-03 12:55:52 -07:00
f455030931 Adding a docstring for memoryless in observer args (#67690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67690

see title [skip ci]

Test Plan:
python setup.py develop

Imported from OSS

Reviewed By: ejguan

Differential Revision: D32107512

fbshipit-source-id: da5668339716d44720672f7b71a991b23530461e
2021-11-03 12:46:44 -07:00
98be5216e2 Revert D32104006: [pytorch][PR] Added forward derivatives for neg, diag, inverse, linalg_eig
Test Plan: revert-hammer

Differential Revision:
D32104006 (88c61b8d06)

Original commit changeset: 1f6ace09ee3e

fbshipit-source-id: f9f950b4177e1fe29b9059f4b5dfb9c8c67f479a
2021-11-03 12:40:00 -07:00
6df0d7d502 [lint] add basic lintrunner compatibility (#67110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67110

Adds support for using lintrunner with:
- clang-format
- clang-tidy
- flake8
- mypy

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D32145555

Pulled By: suo

fbshipit-source-id: 2150348e26fba4ae738cd0b9684b2889ce0f1133
2021-11-03 12:35:28 -07:00
89c4e8c22b [NOOP][clangformat][codemod] Enable CLANGFORMAT for some folders in caffe2/* (#67746)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67746

Test Plan: Visual inspection. Sandcastle.

Reviewed By: zertosh

Differential Revision: D31986646

fbshipit-source-id: 91885c20c3cead3853c49abb9fe0a94a67f33cc8
2021-11-03 12:23:14 -07:00
a5b57c9433 Avoid prematurely casting GEMM parameters alpha, beta to scalar_t (#67633)
Summary:
stas00 uncovered an issue where certain half-precision GEMMs would produce outputs that looked like the result of strange rounding behavior (e.g., `10008.` in place of `10000.`). ptrblck suspected that this was due to the parameters being downcasted to the input types (which would reproduce the problematic output). Indeed, the GEMM and BGEMM cublas wrappers are currently converting the `alpha` and `beta` parameters to `scalar_t` (which potentially is reduced precision) before converting them back to `float`. This PR changes the "ARGTYPE" wrappers to use `acc_t` instead and adds a corresponding test.
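
A sketch of the symptom on a CUDA device (values chosen so the exact product is representable in fp16; illustrative, not the original reproducer):

```
import torch

# Every entry of the product is 64 * 12.5 * 12.5 = 10000 exactly, and
# 10000 is representable in fp16, so a value like 10008. here would
# indicate the accumulation was forced into reduced precision.
a = torch.full((64, 64), 12.5, dtype=torch.half, device="cuda")
b = torch.full((64, 64), 12.5, dtype=torch.half, device="cuda")
print((a @ b)[0, 0])  # expected: tensor(10000., dtype=torch.float16)
```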

CC ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67633

Reviewed By: mruberry

Differential Revision: D32076474

Pulled By: ngimel

fbshipit-source-id: 2540d9b9d0195c17d07d1161374fb6a5850779d5
2021-11-03 12:01:04 -07:00
3f33ada8d5 .github: Forward fix generating GHA workflows (#67777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67777

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D32143785

Pulled By: seemethere

fbshipit-source-id: fb129244bdd46ffda05ed51b16183395152d7296
2021-11-03 11:36:27 -07:00
15a3c374e2 [nnc] Add support for dynamic shapes in TensorExprKernel (#67197)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67197

Test Plan: Imported from OSS

Reviewed By: eellison, ZolotukhinM

Differential Revision: D31902471

Pulled By: navahgar

fbshipit-source-id: d2729a38ba1ac607ff07f516ed56fbd9085715dc
2021-11-03 11:24:17 -07:00
88c61b8d06 Added forward derivatives for neg, diag, inverse, linalg_eig (#67339)
Summary:
See also discussion in https://github.com/pytorch/pytorch/issues/10223, starting from [this](https://github.com/pytorch/pytorch/issues/10223#issuecomment-949499666) comment

The formulas for the derivatives are taken from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf.

As indicated, the method linalg_eig_jvp should be used instead of linalg_eig_jvp_eigenvalues and linalg_eig_jvp_eigenvectors in the future. Due to a codegen limitation, this is not yet possible.

CC albanD Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67339

Reviewed By: ejguan

Differential Revision: D32104006

Pulled By: albanD

fbshipit-source-id: 1f6ace09ee3e737b99520543b30550601809ceb5
2021-11-03 11:21:54 -07:00
a23814577b Overload TestCase not vanilla TestCase for some elastic tests (#67700)
Summary:
Addresses a bit of https://github.com/pytorch/pytorch/issues/66903

Fixes it so that https://github.com/pytorch/pytorch/issues/66207 can be properly disabled

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67700

Reviewed By: H-Huang

Differential Revision: D32116908

Pulled By: janeyx99

fbshipit-source-id: 205ff68a7408609cfced2357fd99f41949ef6390
2021-11-03 11:14:52 -07:00
201f7d330a Remove duplicate check in distributions arg validation (#67741)
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/66800. (Duplicate of https://github.com/pytorch/pytorch/issues/67725 against pytorch/pytorch so as to trigger TorchBench)

https://github.com/pytorch/pytorch/issues/61056 added a more verbose error message for distributions failing argument validation. However, it did not replace the earlier error check as was originally intended and was flagged by xuzhao9 as being the potential cause of a perf regression in `test_eval[soft_actor_critic-cuda-eager]`.

xuzhao9: Is there a way for me to check if this resolves the perf issue you mentioned?

cc VitalyFedyunin ngimel

Note that existing tests already check for the error message and should verify that the removed lines are redundant.

RUN_TORCHBENCH: soft_actor_critic

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67741

Reviewed By: neerajprad

Differential Revision: D32135675

Pulled By: xuzhao9

fbshipit-source-id: 37dfd3ff53b95017c763371979ab3a2c302a72b9
2021-11-03 10:41:41 -07:00
1ffd43cf0c generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit migrated to GHA (#67695)
Summary:
In scope of https://github.com/pytorch/pytorch/issues/67301. Main changes:
* generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit deleted from CircleCI
* pytorch_android_gradle_custom_build_single removed since it is no longer used
* generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit added to GHA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67695

Reviewed By: malfet, seemethere, ejguan

Differential Revision: D32115620

Pulled By: b0noI

fbshipit-source-id: 113d48303c090303ae13512819bac2f069a2913f
2021-11-03 10:29:37 -07:00
4a106e41e9 [fx2trt] Add torch.nn.function.pad support for fx2trt (#67498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67498

Add acc_ops.pad and a converter for it. We want to try padding the convolution channel dimension to get better int8 performance.

This one only supports padding the last two dimensions, though. Starting from TensorRT 8.2, it's suggested to use the Slice layer to do padding, but this is nice to have for older-version support.
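
In eager terms, the supported pattern looks like this (sketch):

```
import torch
import torch.nn.functional as F

# Padding the last two dimensions of a 4-D NCHW tensor; the pad tuple
# is (w_left, w_right, h_top, h_bottom).
x = torch.randn(1, 3, 8, 8)
y = F.pad(x, (1, 1, 2, 2))
print(y.shape)  # torch.Size([1, 3, 12, 10])
```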

Test Plan: buck test mode/dev-nosan caffe2/test/fx2trt/converters:test_pad

Reviewed By: wushirong

Differential Revision: D32006072

fbshipit-source-id: 96c3aa2aec2d28345d044a88bee2f46aba5cca0e
2021-11-03 10:21:08 -07:00
383c1f51b1 [nnc] Fixed handling of 0-sized tensors in cat (#67734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67734

The implementation of the `aten::cat` op in NNC has to ignore tensors that are 0-sized in any dimension.
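
The eager-mode behavior the lowering must match (sketch):

```
import torch

a = torch.randn(2, 3)
b = torch.randn(0, 3)  # 0-sized along the concatenation dimension
print(torch.cat([a, b], dim=0).shape)  # torch.Size([2, 3]); b contributes nothing
```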

Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.CatWithEmptyInputs'`

Reviewed By: ZolotukhinM

Differential Revision: D32122171

fbshipit-source-id: 90c697813bc504664673cdc262df6e7ce419c655
2021-11-03 10:16:16 -07:00
31cf3d6aad Fix adaptive_max_pool2d for channels-last on CUDA (#67697)
Summary:
Fix https://github.com/pytorch/pytorch/issues/67239

The CUDA kernels for `adaptive_max_pool2d` (forward and backward) were written for contiguous output. If outputs are non-contiguous, first create a contiguous copy and let the kernel write output to the contiguous memory space. Then copy the output from contiguous memory space to the original non-contiguous memory space.
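
A sketch of an input that exercises the non-contiguous path (requires CUDA; illustrative shapes):

```
import torch
import torch.nn.functional as F

# A channels-last input produces a channels-last output, which is
# non-contiguous in the default sense and now goes through a contiguous
# staging copy inside the kernel wrapper.
x = torch.randn(2, 3, 8, 8, device="cuda").to(memory_format=torch.channels_last)
out = F.adaptive_max_pool2d(x, (4, 4))
print(out.is_contiguous(memory_format=torch.channels_last))  # True
```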

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67697

Reviewed By: ejguan

Differential Revision: D32112443

Pulled By: ngimel

fbshipit-source-id: 0e3bf06d042200c651a79d13b75484526fde11fe
2021-11-03 09:47:29 -07:00
ff5c61a74e [TensorExpr] Add lowering for aten::max (reduction). (#66519)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66519

Differential Revision:
D31590853
D31590853

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: a702621621f681d7f5392912e8a77ca124e14170
2021-11-03 09:44:09 -07:00
00afe9ba7b [TensorExpr] Add lowering for aten::embedding. (#66518)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66518

Differential Revision:
D31590855
D31590855

Test Plan: Imported from OSS

Reviewed By: pbelevich

Pulled By: ZolotukhinM

fbshipit-source-id: aace0a87b1649330dae44182f7873aca27160d64
2021-11-03 09:44:07 -07:00
008a58d226 [TensorExpr] Add lowering for aten::conv1d. (#66517)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66517

Differential Revision:
D31590856
D31590856

Test Plan: Imported from OSS

Reviewed By: pbelevich

Pulled By: ZolotukhinM

fbshipit-source-id: c05a37d8741acd0606c2adb8d6cfeb1f57bc8aa0
2021-11-03 09:44:05 -07:00
d58ef2bbff [TensorExpr] Fix lowering for aten::softmax for the case when dtype parameter is None. (#66516)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66516

Differential Revision:
D31590858
D31590858

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 0aeee7a5be64b3b9c8fa00aacb1a94031a7e25d1
2021-11-03 09:42:48 -07:00
ea4d983885 Modify "gemm" code to enable access to "sbgemm_" routine in OpenBLAS (#58831)
Summary:
OpenBLAS recently added support for bfloat16 GEMM, so this change has PyTorch call out to OpenBLAS for that, like it does for single and double precision

Our goal is to try to enable PyTorch to make calls to "sbgemm" in OpenBLAS.

We are prepared (if it is your preference) to add fences to the code to limit this change to the Power architecture,
but our first instinct is that anyone on any architecture whose OpenBLAS library enables access to sbgemm
should be able to use this code. (But again, as we are just starting to modify PyTorch, we respect your guidance!)

(there is no issue number related to this)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58831

Reviewed By: albanD

Differential Revision: D29951900

Pulled By: malfet

fbshipit-source-id: 3d0a4a638ac95b2ff2e9f6d08827772e28d397c3
2021-11-03 08:53:27 -07:00
05d1dcc14c Split channels_last test cases for tensor conversion OpInfos (#67368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67368

This PR adds an additional test variant for the tensor conversion
functions (bfloat16, char, long, ...) that tests channels_last. This is
because some backends (mostly just functorch right now) don't have
channels-last handling and may want to test that separately from the
more general case of these operations.
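
The property the new variant checks, stated in eager terms (sketch):

```
import torch

# Tensor conversion ops are expected to preserve a channels_last layout:
x = torch.randn(2, 3, 4, 4).to(memory_format=torch.channels_last)
y = x.bfloat16()
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```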

Test Plan: - wait for tests

Reviewed By: mruberry

Differential Revision: D31972959

Pulled By: zou3519

fbshipit-source-id: 68fea46908b2cdfeb0607908898bb8f9ef25b264
2021-11-03 07:39:41 -07:00
92a85ecbab add a quantized hardsigmoid inplace variant (#65740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65740

fp32 hardsigmoid supports inplace. This PR adds the inplace support to the quantized
hardsigmoid function, to make the signatures match.
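
A sketch of the now-matching signatures (the quantized functional path and its inplace keyword are assumptions based on the API of this era):

```
import torch
import torch.nn.quantized.functional as qF

x = torch.quantize_per_tensor(torch.randn(4), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
y = qF.hardsigmoid(x, inplace=True)  # mirrors the fp32 F.hardsigmoid signature
```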

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qhardsigmoid
```

Reviewed By: supriyar

Differential Revision: D31992282

Pulled By: vkuzo

fbshipit-source-id: f6be65d72954ab8926b36bb74a5e79d422fbac90
2021-11-03 07:35:31 -07:00
e32d7f7525 ATen | Fix potential crash if MTLCreateSystemDefaultDevice return nil (#66859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66859

`MTLCreateSystemDefaultDevice` can return `nil`. If that happens then inside `createDeviceInfo`, we'll crash trying to convert the `nullptr` from `device.name.UTF8String` into a `std::string`.

Let's fix it by returning early in setup if there's no Metal device. But also make `createDeviceInfo` safe if we do pass in `nil`.

Test Plan: * CircleCI

Reviewed By: xta0

Differential Revision: D31759690

fbshipit-source-id: 74e878ab5b8611250c4843260f1d2e4eab22cdaf
2021-11-03 03:03:45 -07:00
510336499b [PyTorch][Static Runtime] Separate overlap checks for easier debugging (#66637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66637

We can give more information when verify_no_memory_overlap would fail by separating the DCHECK.
ghstack-source-id: 142226105

Test Plan: fitsships

Reviewed By: d1jang

Differential Revision: D31517151

fbshipit-source-id: 8cbc324c27f6b4db4489d1bd469d37b1d8ae6ce1
2021-11-02 23:59:04 -07:00
3db536e55e add jit_trace_module python binding (#67425)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67425

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31998564

Pulled By: Krovatkin

fbshipit-source-id: f7e38c8c3f560f2c4e5ed62e1acae2c100efebd4
2021-11-02 23:55:23 -07:00
a8757cdd70 type inputs (#67424)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67424

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31998565

Pulled By: Krovatkin

fbshipit-source-id: 8a2b8b3f13a361fe8fce7c7c930bbfd357ef8ac1
2021-11-02 23:55:21 -07:00
d352587210 add a few convenience helpers to removeAllXXX to Block and Node (#67423)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67423

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31998566

Pulled By: Krovatkin

fbshipit-source-id: ed435d5c35e44ab2676c47b43d6e2aa8e79d9ab2
2021-11-02 23:54:02 -07:00
7f3326a6d2 [FSDP] CPU offload resubmit (#67249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67249

Implements CPU offload for model parameters in FSDP.

- A CPU offload class with only the offload_params attribute is created.
- If this is specified in the FSDP ctor, model parameters are moved back to CPU after sharding in __init__.
- In the forward pass, during lazy init, p._local_shard gets set to p.data so it is on CPU. We pin_memory here.
- In the forward pass, in _rebuild_full_params, we move p.data back to self.compute_device if necessary. Note that we don't use the device of p._full_param_padded because we don't always have this attr, but when we do it is always the same as compute_device.
- The same logic as above applies to the beginning of the backwards pass.
- At the end of fwd and the end of bwd, `_use_param_local_shard` ensures the parameters are offloaded to CPU again by pointing p.data to p._local_shard, which is always on CPU (see the sketch below).
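
A minimal usage sketch (import path and class names follow the later public API and are an assumption for this point in the stack):

```
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import CPUOffload
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("gloo")  # e.g. launched under torchrun
# Parameters are kept on CPU between passes and moved to the compute
# device only while forward/backward actually needs them.
model = FSDP(nn.Linear(8, 8), cpu_offload=CPUOffload(offload_params=True))
```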

Regarding tests:
- We test 3 different types of init: 1) CUDA the model before wrapping with FSDP, 2) CUDA the model after wrapping with FSDP, 3) never CUDA the model.
- Case 1 is always supported. Case 2 is not supported with CPU offload and throws an error during the fwd pass. Case 3 is only supported with CPU offload at the moment.
- Verifies all params are offloaded to CPU after init.
- Verifies all params are offloaded to CPU after forward and backward.
- Note that there is an issue with verifying exact parity when CPU offloading, but it appears to be related to transferring the model back and forth between CPU and CUDA. More details in https://github.com/pytorch/pytorch/pull/66961
ghstack-source-id: 141851903

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31911085

fbshipit-source-id: 3ddf73c070b55ce383e62251868d609004fc30e7
2021-11-02 23:27:34 -07:00
06d1be2447 [NOOP][clangformat][codemod] Enable CLANGFORMAT for caffe2/caffe2/* (#67624)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67624

Test Plan: Visual inspection. Sandcastle.

Reviewed By: malfet

Differential Revision: D31986628

fbshipit-source-id: c872bded7325997a2945dbf5d4d052628dcb3659
2021-11-02 22:14:04 -07:00
e86a5a3a1a [Static Runtime] Add PyTorchPredictor::predict_managed_result to return managed output tensors (#65598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65598

This change adds `PyTorchPredictor::predict_managed_result` to enable Static Runtime to return managed output tensors, allocated and owned by Static Runtime to accelerate inference workloads.

- `PyTorchPredictor::predict_managed_result` does only meaningful work for the overridden `PyTorchStaticRuntimePredictor::predict_managed_result`. For other subclasses, it returns a simple object that just wraps the returned `Ivalue`.

- When `manage_output_tensors` is enabled, a `StaticRuntime` cannot be reentered until its return value gets deallocated by calling `StaticRuntime::deallocateOutputTensors`. Currently an instance of `StaticRuntime` gets immediately pushed back to `static_runtime_pool` to be reentered again, and this cannot be done when `manage_output_tensors` is enabled. `PyTorchStaticRuntimePredictorManagedResult` makes sure to delay pushing a `StaticRuntime` instance back to the pool only after `StaticRuntime::deallocateOutputTensors` is called on the runtime instance.

- When `manage_output_tensors` is enabled, `PyTorchStaticRuntimePredictor::predict_managed_result` returns the prediction result, whose backing memory is managed by an instance of `StaticRuntime`. The lifetime of any value reachable from `PyTorchStaticRuntimePredictorManagedResult.get()` is expected to end before `PyTorchStaticRuntimePredictorManagedResult` gets destructed. As explained above, `PyTorchPredictorManagedResult`'s destruction pushes the runtime instance that returned the result back to `static_runtime_pool` to be reused again.

- The current API design of adding `predict_managed_result` instead of forcing `operator()` to return `PyTorchPredictorManagedResult` was motivated by the fact that `manage_output_tensors` will be selectively enabled just for a few models. In case `manage_output_tensors` becomes a commonly used feature we should revisit this API design to merge them together.

Reviewed By: hlu1

Differential Revision: D31149323

fbshipit-source-id: 5ca026188077232d6a49a46759124a978439d7b2
2021-11-02 22:10:26 -07:00
18955d3564 Raise warning when calling collectives on non-member group objects (#67639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67639

Due to BC considerations, we cannot directly error out, as that
might break existing applications. Raise warnings first to improve
debuggability.
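
A sketch of the pattern that now warns (assumes a 3-process job; launch details elided):

```
import torch
import torch.distributed as dist

dist.init_process_group("gloo")       # e.g. torchrun --nproc_per_node=3
group = dist.new_group(ranks=[0, 1])  # rank 2 is not a member
t = torch.ones(1)
# On rank 2 this used to be silently skipped; it now emits a warning first.
dist.all_reduce(t, group=group)
```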

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D32075151

Pulled By: mrshenli

fbshipit-source-id: 5680d420f5f6cd3f74a36616c03350e8a976b363
2021-11-02 20:04:07 -07:00
54241a9cfa [quant][fx] Add support for fused modules in _convert_do_not_use (#67245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67245

Add support for fused modules in the new convert path, including linear-relu, conv{1-3}d-relu and their QAT versions;
also tested with TRT (conv2d-relu and linear-relu)

Test Plan:
```
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_linear_relu_module
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_conv_relu_module
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31919724

fbshipit-source-id: 7e5c96eba30706f7989da680aa3443159847bdfd
2021-11-02 19:21:54 -07:00
91971dfc2a [BE] [GHA] Use aws ecr get-login-password (#67709)
Summary:
Replacing `aws ecr get-login` with `aws ecr get-login-password`, per https://docs.aws.amazon.com/cli/latest/userguide/cliv2-migration.html#cliv2-migration-ecr-get-login

Follow up after the similar change in CircleCI: https://github.com/pytorch/pytorch/pull/58308

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67709

Reviewed By: seemethere, janeyx99

Differential Revision: D32119319

Pulled By: malfet

fbshipit-source-id: 0cd0d8f4d81e9981a5f8fbf9b812a9167fd48135
2021-11-02 19:06:50 -07:00
16ee6409ee Changed value constraint of exponential dist (#67184)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67183.
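
A sketch of the effect (that the support constraint now admits zero is an assumption based on the linked issue):

```
import torch
from torch.distributions import Exponential

d = Exponential(rate=torch.tensor(1.0))
# 0.0 lies in the support of the exponential distribution, so this
# should pass argument validation; log p(0) = log(rate) = 0 here.
print(d.log_prob(torch.tensor(0.0)))  # tensor(0.)
```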

cc fritzo neerajprad alicanb nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67184

Reviewed By: ejguan

Differential Revision: D32114661

Pulled By: neerajprad

fbshipit-source-id: ea23e59f38a23a7b0bab4fbbd98ae3feba468b9c
2021-11-02 17:44:56 -07:00
885da61d7d [PG NCCL] Disable NCCL health check (#67668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67668

This adds an env var to enable the NCCL health check; when left unspecified, the check is not run. Unit tests that need to test this functionality have the env variable set. Please see the internal diff for more details.

Test Plan: CI

Reviewed By: yuguo68, mrshenli

Differential Revision: D32089763

fbshipit-source-id: dff5664a5e607f711515cd1042089ca769914fbb
2021-11-02 16:21:59 -07:00
0b2f68eadf Remove special FX OpInfo list (#67520)
Summary:
Most of the failing tests fail because the test doesn't work with Python functions (only builtins like `torch.add`).

I added a check for that and ported the remaining skips into the `skips` field.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67520

Reviewed By: ZolotukhinM

Differential Revision: D32046856

Pulled By: Chillee

fbshipit-source-id: 05fa3e3c40fa6cc4f776e0c24f667629b14afd25
2021-11-02 16:01:46 -07:00
96e3d1a76c Remove native_functions.yaml dependency from Sorting.cu (#66621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66621

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D31856099

Pulled By: dagitses

fbshipit-source-id: d9c2b6b45099e49c7beaae5888140de350d23696
2021-11-02 14:46:29 -07:00
7deb1726ea Remove native_functions.yaml dependency from ScanKernels.cu (#66620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66620

This splits the Tensor-dependent code out into a cpp file.

A slight complicating factor is that `scan_dim` uses `copy_` to handle
non-contiguous out arguments, so I've moved that code into the caller,
which does introduce some duplication, though it's only ~10 extra lines
in total.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D31856106

Pulled By: dagitses

fbshipit-source-id: 91bb4ce5e7c6487e3ea0d5ec4d9f7a625d8ef978
2021-11-02 14:45:17 -07:00
9e97ccbd7a .github: Migrate iOS workflows to GHA (#67645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67645

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32104367

Pulled By: seemethere

fbshipit-source-id: 08ff043ed5d0b434322f1f3f20dce2a4f5fa88c1
2021-11-02 14:38:43 -07:00
a831713786 [PyTorch Edge] Use Integer Subtraction (Instead of Float) in Non-FBGEMM Dequantization (#67115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67115

This matches what FBGEMM does (https://fburl.com/code/vjrdn6tj, https://fburl.com/code/btkdn24l)

Benchmark Mobile Vision Transformer Model Results (as described in D31066997 and config from rebasing onto v4 of D31869106):

This diff (v18):
- NET latency: 109.866
- https://our.intern.facebook.com/intern/aibench/details/536304563225483

This diff before using vsubl (v14 but rebased onto v22 of D31205883, the previous diff in this stack)
- NET latency: 115.887
- https://our.intern.facebook.com/intern/aibench/details/906978557243297

Before this diff (v22 of D31205883):
- NET latency: 116.449
- https://our.intern.facebook.com/intern/aibench/details/870678436773989

ghstack-source-id: 142166375

Test Plan: Phabricator tests pass; running quantized_test on a Pixel 3a and running the mobile vision transformer model (as described in D31066997) both work

Reviewed By: kimishpatel

Differential Revision: D31483135

fbshipit-source-id: fbef00cad6087b49900d21c3dd3b6fd432f64e94
2021-11-02 14:28:03 -07:00
23bd3cf5b2 [PyTorch Edge] Parallelize Quantize and Dequantize Tensor (#65845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65845

Benchmarking of Non-Parallelized and Parallelized quantization/dequantization for various devices and input sizes done in this notebook:
https://www.internalfb.com/intern/anp/view/?id=1204834&scroll_cell=17&checkpoint_id=432447238302644

For example:
- {F671713127}
- {F671713209}
- {F671713238}
- {F671713253}

When run on Partially Quantized Mobile Vision Transformer Model (as described in D31066997:

Before this diff (on D31444248 v7):
- [120.907ms](https://our.intern.facebook.com/intern/aibench/details/945891590820680)

With this diff (v19):
- Threshold = 2^16: [118.086ms](https://our.intern.facebook.com/intern/aibench/details/436376817372377)
- Threshold = 2^20: [118.361ms](https://our.intern.facebook.com/intern/aibench/details/617543354077290)

ghstack-source-id: 142166374

Test Plan:
Same as previous diff (D31066997)

All tests pass

Also, set numel to 2^21 in quantized_test TestArmVectorizedAndParallelQuantizeDequantize (https://www.internalfb.com/diff/D31066997?dst_version_fbid=596325738080019&transaction_fbid=219437170135898) and the tests passed

Reviewed By: kimishpatel

Differential Revision: D31205883

fbshipit-source-id: 9ed0b11a376734feaf228074a24b8eb79d5270a3
2021-11-02 14:28:01 -07:00
92cfda1785 [PyTorch Edge] Clean up Quantize Tensor code (#66220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66220

- Pass pointers rather than tensors to ```quantize_tensor_arm``` to allow for using ```__restrict__``` and to make parallelization easier (as in the next diff on this stack D31205883)
- Replace ```auto``` with actual types
- Replace raw cast with reinterpret_cast<...>
- All of these changes make the code structure similar to that of Dequantize
ghstack-source-id: 142166376

Test Plan: same as D31066997 (all tests pass)

Reviewed By: kimishpatel

Differential Revision: D31444248

fbshipit-source-id: 6a31d090082047263403f415911c199519987595
2021-11-02 14:27:59 -07:00
16c62a6dc9 [PyTorch Edge] Optimize Dequantize Tensor with Intrinsics (#65844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65844

When run on [Partially Quantized Mobile Vision Transformer Model](https://www.internalfb.com/diff/D30648171), with config from rebasing onto v4 of D31869106

Before:
[AIBench Run (128ms)](https://www.internalfb.com/intern/aibench/details/309792316534505)
[Perf Report](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/model_perf_1635881079420.html)

After:
[AIBench Run (117ms)](https://www.internalfb.com/intern/aibench/details/20433505461364)
[Perf Report](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/model_perf_1635881527831.html)

Total events spent on at::native::dequantize_quantized reduced from 1.97 Billion to 0.97 Billion (~50% Reduction)
ghstack-source-id: 142166373

Test Plan:
To run quantized_test
- Clone open source repo
- Set ANDROID_NDK and ANDROID_SDK
- Build with ```BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_LITE_INTERPRETER=0  ANDROID_ABI=arm64-v8a ./scripts/build_android.sh  -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON```
- Move ```build_android/bin/quantized_test``` to devserver
- Use one world to connect to android device (ex. ```one_world android device pixel-3a```)
- In another terminal: Make quantized_test executable (```chmod +x quantized_test```), copy it to android device (```adb push quantized_test /data/local/tmp```), and run it (```adb shell /data/local/tmp/quantized_test```)

Results:
{F676102702}

Also ```buck test mode/dev //caffe2/aten:quantized_test``` passes

To test performance on [Partially Quantized Mobile Vision Transformer Model](https://www.internalfb.com/diff/D30648171) with AI Bench:
- Save this config file: P466124028 (for example: D31869106)
- Before or after the changes in this diff, run ```buck run aibench:run_bench -- -b benchmark_mobile_vision_transformer_model_config.json --platform android/arm64 --framework pytorch --remote --devices Pixel-3a-11-30 --force_profile```

Reviewed By: kimishpatel

Differential Revision: D31066997

fbshipit-source-id: 9067e683e0181aa13a2b636b68ac4fe5a4b2e618
2021-11-02 14:26:42 -07:00
9cef2033f3 Modify decorator for acc op converters (#67636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67636

Modify the decorator to denote whether an acc op converter supports explicit/implicit batch dims. This info will be used by trt_splitter when determining whether a node can be split into the acc graph.
This can prevent us from splitting a node into the acc module only to later find that no proper converter exists for the node, which would fail the lowering process.

Test Plan: unit test

Reviewed By: 842974287

Differential Revision: D31998477

fbshipit-source-id: 6789ebef4a76f9a0c1ab3edf8e846a5b6143326b
2021-11-02 13:35:40 -07:00
5ad169b7cc Adding in Wrap functions for FSDP from Fairscale (#67292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67292

as title

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed/fsdp:wrap --keep-going

Reviewed By: rohan-varma

Differential Revision: D31936404

fbshipit-source-id: b7ebead9a649766aec83e5630c2ce1386ad33e11
2021-11-02 13:30:41 -07:00
8f63cfda14 [LiteInterpreter] Specify Loader to yaml.load (#67694)
Summary:
The Loader argument became mandatory in PyYAML 6, but it has been accepted since PyYAML 3 (see the sketch below)

Unblock migration to newer runtime
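
A minimal cross-version call (sketch):

```
import yaml

# PyYAML 6 made Loader a required argument of yaml.load; passing it
# explicitly has worked since PyYAML 3, so this form runs on both.
data = yaml.load("a: 1", Loader=yaml.SafeLoader)
print(data)  # {'a': 1}
```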

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67694

Reviewed By: seemethere

Differential Revision: D32106043

Pulled By: malfet

fbshipit-source-id: 35246b97a974b168c066396ea31987b267534c7f
2021-11-02 12:52:57 -07:00
b00206d473 [vulkan] Use 3D textures for everything (#67647)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67647

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D32102196

Pulled By: SS-JIA

fbshipit-source-id: ded1835386a0640181f69c190a2294d298311e26
2021-11-02 12:29:26 -07:00
0ee8473af7 [SR][easy] Fix FuseListUnpack 0-use corner case (#67165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67165

We previously skipped the optimization if `value_out->uses().size() > 1`. But it's possible that the number of uses is 0. In that case, it's not safe to access `value_out->uses()[0]`.

This is not causing any problems in production right now since we don't have any dead code before running this pass. But we should handle this case correctly to make the pass more robust.

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D31887416

fbshipit-source-id: d30a5824e8bd1cda1debdc16524db3fb0da312f9
2021-11-02 12:17:16 -07:00
6b1d8e5bb2 Revert D31861962: [qnnpack] Remove redundant fp16 dependency
Test Plan: revert-hammer

Differential Revision:
D31861962 (4061239fdd)

Original commit changeset: e1425c7dc3e6

fbshipit-source-id: 418f8173c19b9541316443e1ab4ec39062561b5e
2021-11-02 11:55:07 -07:00
3e218dbd27 [PyTorch] Capture function args from schema by reference (#65951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65951

Profiling shows that we do a bunch of heap allocations to copy Argument structs in append_operator. Capturing by reference here should be safe as long as the schema object the args come from outlives the operator function.

IMPORTANT: Reviewers (or automated tests if we're lucky) need to
confirm that the above is true or we're going to have fun
use-after-free bugs.
ghstack-source-id: 142065422

Test Plan:
AIBench run for speech model on MilanBoard

control: https://www.internalfb.com/intern/aibench/details/485570882988661 (mean 906 ms)
test: https://our.intern.facebook.com/intern/aibench/details/620835625995669 (mean 818 ms)

So almost a 10% improvement in the wall time metric?

Reviewed By: iseeyuan

Differential Revision: D31319988

fbshipit-source-id: 7da56357420df500df344f49007e070ebb1bc581
2021-11-02 11:12:04 -07:00
33d62266f2 [PyTorch][easy] Avoid allocating OperatorName strings in append_operator (#66134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66134

No reason to do the comparison the old way when we could do it this way and avoid copying into std::string.
ghstack-source-id: 142065423

Test Plan: AIBench Milan run shows neutral to slight regression, but I think we should probably just make this change anyway.

Reviewed By: dhruvbird

Differential Revision: D31319669

fbshipit-source-id: dde329a4f2c4054f275eb98fb6556f5341e7533a
2021-11-02 11:10:52 -07:00
2644725937 [SR] Migrate gather_ranges_to_dense to new FuseListUnpack (#67164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67164

Migrated both the variadic and non-variadic versions.

This diff is part of the effort to migrate all ops used in `FuseListUnpack` to `FuseListUnpackV2`. The original version of `FuseListUnpack` is problematic for a few reasons:

* You have to complicate the op implementation with an `is_fused` check, resulting in messier code. It is easier to reason about two ops, fused (out variant) and unfused (native).
* The original version of `FuseListUnpack` is buggy. It assumes that the `ListUnpack` node occurs immediately after the fusion candidate, which is not necessarily true.

This diff finishes the migration, so the original version of `FuseListUnpack` is removed

Test Plan:
Unit tests: `buck test caffe2/benchmarks/static_runtime/...`

**Accuracy Test**
Done at the top of this diff stack.

Reviewed By: hlu1

Differential Revision: D31887386

fbshipit-source-id: 9d44c813667a75bce13dce62bf98e6109edea6ba
2021-11-02 11:04:59 -07:00
82f7f8d471 [PyTorch] Adopt IValue::toTupleRef() where obvious (#65505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65505

Generated with

`fastmod -m 'toTuple\(\)(\s*)->' 'toTupleRef()${1}.'`

, followed by

`fastmod '(std::move\(.*)toTupleRef\(\).' '${1}toTuple()->'`

to unbreak 2 callsites.
ghstack-source-id: 142065835

Test Plan: CI

Reviewed By: gchanan

Differential Revision: D31131025

fbshipit-source-id: 54457ae5bbeb38db9c7f196d469b98521c3d3f34
2021-11-02 10:22:18 -07:00
eb1b8a2160 pytorch_android_gradle_custom_build_single migrated from Circle to GHA. (#67577)
Summary:
In scope of https://github.com/pytorch/pytorch/issues/67301. Main changes:
* pytorch_android_gradle_custom_build_single removed from CircleCI (however, the template is still there since it is used by another similar workflow, pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit, which will be migrated next)
* new GHA workflow added: pytorch_android_gradle_custom_build_single

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67577

Reviewed By: malfet, mruberry

Differential Revision: D32087709

Pulled By: b0noI

fbshipit-source-id: f9581558ddc1453b63264bf19fe5a4c245b7c007
2021-11-02 10:21:03 -07:00
d9bac7c316 [PyTorch] Add IValue::toTupleRef() (#65504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65504

We should be able to borrow a Tuple from an IValue without incurring refcount bumps.
ghstack-source-id: 142065833

Test Plan:
Added test coverage.

Profiled static runtime on the local_ro net for ctr_mobile_feed. Inclusive time spent in VarTupleUnpack decreased about 0.3%, which roughly matches the 0.36% of runtime that was previously spent in IValue::toTuple().

Reviewed By: hlu1

Differential Revision: D31130570

fbshipit-source-id: afa14f46445539e449068fd908d547b8da7f402c
2021-11-02 10:16:25 -07:00
7cd62621fb [PyTorch] Adopt faster Tuple::create (#65381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65381

The previous diff adds a way to make Tuples of size 3 or less
more efficiently. This diff makes it easier to hit that path and
updates a bunch of callsites to hit it.
ghstack-source-id: 142065832

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D31069538

fbshipit-source-id: d04da3709594ed68ab1c0a1471f8cffd8d001628
2021-11-02 10:10:31 -07:00
9e71ea292d Fix test_init_pg_and_rpc_with_same_socket by retrying on addr in use error (#67638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67638

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32074698

Pulled By: H-Huang

fbshipit-source-id: 6b980fcdac4b0f1edfe086d0deb99be371a73900
2021-11-02 09:42:47 -07:00
4061239fdd [qnnpack] Remove redundant fp16 dependency (#67281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67281

`qnnpack/operator.h` introduces a dependency on the external library fp16 via `qnnpack/requantization.h`.
Including `qnnpack/operator.h` in `pytorch_qnnpack.h` makes objects that really don't require fp16 depend on it indirectly, because they include `pytorch_qnnpack.h`.
This was causing some test and bench targets to fail to build for local and android/arm64 (the only two tried) using CMake.

This diff moves `qnnpack/operator.h` from `pytorch_qnnpack.h` to `qnnpack_func.h`, and explicitly add `qnnpack/operator.h` in `src/conv-prepack.cc`.

Test Plan: Ran all the tests for local on my devserver, and arm64 on Pixel3a.

Reviewed By: kimishpatel

Differential Revision: D31861962

fbshipit-source-id: e1425c7dc3e6700cbe3e46b64898187792555bb7
2021-11-02 09:29:55 -07:00
cd51d2a3ec Adding OpInfo for logical_or, logical_and, logical_xor (#67178)
Summary:
This PR addresses https://github.com/pytorch/pytorch/issues/54261.

This adds OpInfos for binary logical element-wise operators. This is my first OpInfo PR to PyTorch; I'm looking forward to suggestions and any feedback.

cc: mruberry krshrimali

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67178

Reviewed By: jbschlosser

Differential Revision: D32057889

Pulled By: mruberry

fbshipit-source-id: 7e670260af6b478dba9d6e8d77de4df1b6d0b5d1
2021-11-01 20:27:45 -07:00
c65f332da4 torch::deploy unity and its demo (#67134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67134

This diff demos torch::deploy unity, which builds the model, the dependencies, and the runtime as a unity!

The end user only needs to replace the python_binary rule with the build_unity rule to define the Python application. Under the hood, we build the Python application (an xar file), build the torch deploy runtime, and then embed the Python application (the xar file) into the torch deploy runtime.

When starting the torch::deploy runtime, the xar will be written to the filesystem and extracted. We put the extracted path on Python's sys.path so all the model files and all the Python dependencies can be found!

As a demo, the model here is just a simple Python program using numpy and scipy. But theoretically, it can be as complex as we want.

I'll check how bento_kernel works. Maybe we can learn from bento_kernel to simplify things a bit.
ghstack-source-id: 142085742

Test Plan:
```
#build
buck build mode/opt unity:unity

# make sure the path exists before we start torch::deploy runtime
# Otherwise the dynamic loader will just skip this non-existing path
# even though we create it after the runtime starts.
mkdir -p /tmp/torch_deploy_python_app/python_app_root

#run
LD_LIBRARY_PATH=/tmp/torch_deploy_python_app/python_app_root ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/unity/unity
```

Reviewed By: suo

Differential Revision: D31816526

fbshipit-source-id: 8eba97952aad10dcf1c86779fb3f7e500773d7ee
2021-11-01 19:32:49 -07:00
ec6b472e0a [vulkan] Add prepacking for conv2d_transpose (#67358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67358

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D31970903

Pulled By: SS-JIA

fbshipit-source-id: 128deb40dc14fb97aa61af9cddab4540b630359e
2021-11-01 17:59:32 -07:00
152f665dee Inserted check for PyObject_IsInstance in THPVariableCheck (#67588)
Summary:
Inserted a check on the return value of PyObject_IsInstance to capture the case in which it raises an exception and returns -1. When this happens, THPVariable_Check now throws a python_error to signal the exception.

Fixes https://github.com/pytorch/pytorch/issues/65084

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67588

Reviewed By: mruberry

Differential Revision: D32064776

Pulled By: albanD

fbshipit-source-id: 895c7682e0991ca257e27f9638a7462d83707320
2021-11-01 16:53:54 -07:00
c4bf196334 Strided masked reduction: mean (2nd try) (#67088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67088

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #67088

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32070264

Pulled By: cpuhrsch

fbshipit-source-id: 08a91550dd24fb0f51abf06591a0e26186c4f9f9
2021-11-01 16:12:07 -07:00
53e6aca8b3 [Pytorch Edge] Make More Classes Selective (#67397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67397

Expand selectivity coverage to classes created outside of TORCH_LIBRARY.

ghstack-source-id: 142076940

Test Plan: Model unit tests, manually run some models on prod apps.

Reviewed By: dhruvbird, bdhirsh

Differential Revision: D31978965

fbshipit-source-id: 708901b47a9838ac54c78788028d0e18c1e378c0
2021-11-01 15:12:30 -07:00
45d5b3248b Fixed C++ BatchNorm pretty_print() with optional momentum (#67335)
Summary:
Summary: Inserted a check for the momentum to print "None" in case it is not defined. See https://github.com/pytorch/pytorch/issues/65143

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67335

Test Plan:
The code below now prints `torch::nn::BatchNorm2d(128, eps=1e-05, momentum=None, affine=true, track_running_stats=true)` without generating errors.
```
torch::nn::BatchNorm2d m(torch::nn::BatchNormOptions(128).momentum(c10::nullopt));
std::cerr << *m << "\n";
```
Fixes https://github.com/pytorch/pytorch/issues/65143

Reviewed By: mruberry

Differential Revision: D32067820

Pulled By: ngimel

fbshipit-source-id: f40f9bbe090aa78e00f6c3a57deae393d946b88d
2021-11-01 14:45:33 -07:00
234bd6dc56 [quantized] Add bilinear quantized grid_sample (#66879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66879

This adds a quantized implementation for bilinear grid_sample. Bicubic interpolation cannot be supported as easily, since we rely on the linearity of quantization to operate on the raw values, i.e.

f(q(a), q(b)) = q(f(a, b)) where f is the linear interpolation function.
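
A toy numeric check of this property (a sketch, not the actual kernel). For affine quantization q(x) = round(x/scale) + zp, interpolating the raw integer values agrees with quantizing the interpolation of the dequantized values:
```
import torch

scale, zp, w = 0.1, 128, 0.25
a = torch.quantize_per_tensor(torch.randn(4), scale, zp, torch.quint8)
b = torch.quantize_per_tensor(torch.randn(4), scale, zp, torch.quint8)

# interpolate on raw integer values (what the quantized kernel can do cheaply)
lerp_raw = (1 - w) * a.int_repr().float() + w * b.int_repr().float()
# interpolate on dequantized values, then map back to the quantized domain
lerp_deq = (1 - w) * a.dequantize() + w * b.dequantize()
requant = lerp_deq / scale + zp

print(torch.allclose(lerp_raw, requant))  # True: both paths agree
```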
ghstack-source-id: 141321116

Test Plan: test_quantization

Reviewed By: kimishpatel

Differential Revision: D31656893

fbshipit-source-id: d0bc31da8ce93daf031a142decebf4a155943f0f
2021-11-01 14:44:26 -07:00
0cbfd466d2 Remove ProcessGroup from TensorPipeAgent initialization (#66708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66708

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31762735

Pulled By: H-Huang

fbshipit-source-id: 9f3879fca6b8258f7e6171b14d2c1d6cce21627d
2021-11-01 14:15:27 -07:00
ba369ea053 check to ensure profiler_edge is only added when use_kineto is on (#67494)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67494

Reviewed By: jbschlosser

Differential Revision: D32031142

Pulled By: mcr229

fbshipit-source-id: 8267f0e02c5bed0fbc4956af6935a551bedb27ef
2021-11-01 13:42:14 -07:00
76f57cd442 [CODEOWNERS] Remove @neginraoof (#67631)
Summary:
She no longer works on the ONNX exporter

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67631

Reviewed By: malfet

Differential Revision: D32070435

Pulled By: msaroufim

fbshipit-source-id: d741a15bd7a916745aa7f2f3d9bb1dc699553900
2021-11-01 13:26:38 -07:00
e80cb08cc8 [jit][shape_prop] Fix jit registration of unpack_sizes ops for prepacked (#66737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66737

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31703587

Pulled By: IvanKobzarev

fbshipit-source-id: ccebe5ffc4fa959e3fa63afab1058d94e9df9dd9
2021-11-01 12:43:10 -07:00
251278d385 [skip ci] set more tests with owners for distributed and elastic (#67583)
Summary:
It turns out my lint doesn't work on CI all the time because of shell differences. I'm working on a new more comprehensive lint in https://github.com/pytorch/pytorch/pull/66826 and it'd be nice if these could be cleared first.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67583

Reviewed By: H-Huang, mruberry

Differential Revision: D32045155

Pulled By: janeyx99

fbshipit-source-id: ecfe9f008310c28e3b731e246c2b2ed0106d03b1
2021-11-01 12:26:03 -07:00
4d99bc839b Remove TH/THC Storage functions for unused dtypes (#67480)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67466

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67480

Reviewed By: mruberry

Differential Revision: D32023494

Pulled By: ngimel

fbshipit-source-id: 8827e1d6e765fee7219b5ee9888a1a3e3c5fbe89
2021-11-01 11:45:20 -07:00
a122ba776a Fix less_than_lowest warnings (#67422)
Summary:
Fixes useless comparison against zero warnings for Half.h

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67422

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31951939

fbshipit-source-id: 3e9940adda2d57b4d9b122f3862706c673f9ef4b
2021-11-01 11:19:55 -07:00
da29655797 Disable miopen test for convolution on mobile (#66564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66564

Mobile thinks that we are segfaulting in _convolution, and this
is the most recent substantive change to this function.  I think
it's pretty unlikely to have caused the crash, but if we don't have
any better ideas we should try this.
ghstack-source-id: 141972758

Test Plan: ship it and see if it resolves the error report

Reviewed By: kimishpatel

Differential Revision: D31598633

fbshipit-source-id: c34f4b0b7b8529e21fd019c886ad8d68ffe286b0
2021-11-01 10:22:40 -07:00
885a8e53ba replace onlyOnCPUAndCUDA with onlyNativeDeviceTypes (#65201)
Summary:
Reference https://github.com/pytorch/pytorch/issues/53849

Replace `onlyOnCPUAndCUDA` with `onlyNativeDeviceTypes`, which includes `cpu`, `cuda`, and `meta`.
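
A minimal usage sketch, assuming the standard device-type test harness:
```
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, onlyNativeDeviceTypes)
from torch.testing._internal.common_utils import TestCase, run_tests

class TestExample(TestCase):
    @onlyNativeDeviceTypes  # was @onlyOnCPUAndCUDA; now also runs for meta
    def test_something(self, device):
        self.assertIn(device.split(':')[0], ('cpu', 'cuda', 'meta'))

instantiate_device_type_tests(TestExample, globals())

if __name__ == '__main__':
    run_tests()
```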

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65201

Reviewed By: mrshenli

Differential Revision: D31299718

Pulled By: mruberry

fbshipit-source-id: 2d8356450c035d6a314209ab51b2c237583920fd
2021-11-01 09:22:34 -07:00
39ad7b670e [SR] Native implementation for aten::squeeze (#67441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67441

Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like TupleUnpack). We should improve op coverage where possible.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31992093

fbshipit-source-id: 88191c13d229ffeac4e5b17b78e25f51d3f7f23e
2021-11-01 08:22:57 -07:00
00da7b9a3b Set test owner for vmap (#67582)
Summary:
More leftover actions from https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67582

Reviewed By: zou3519

Differential Revision: D32045160

Pulled By: janeyx99

fbshipit-source-id: 92ae9a533285b05b44bd04bb6127061c6fddd689
2021-11-01 07:22:48 -07:00
9cdd1d7e48 Docs module check (#67440)
Summary:
Add check to make sure we do not add new submodules without documenting them in an rst file.
This is especially important because our doc coverage only runs for modules that are properly listed.

temporarily removed "torch" from the list to make sure the failure in CI looks as expected. EDIT: fixed now

This is what a CI failure looks like for the top level torch module as an example:
![image](https://user-images.githubusercontent.com/6359743/139264690-01af48b3-cb2f-4cfc-a50f-975fca0a8140.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67440

Reviewed By: jbschlosser

Differential Revision: D32005310

Pulled By: albanD

fbshipit-source-id: 05cb2abc2472ea4f71f7dc5c55d021db32146928
2021-11-01 06:24:27 -07:00
0d7cf825fc [SR] Drop support for aten::__is__ and aten::__isnot__ (#67550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67550

`aten::__is__` and `aten::__isnot__` are extremely problematic for a large number of SR graph optimizations.

Some examples:

- Removing ops that are no-ops in the forward pass like `aten::detach`. This would normally be trivial, but `is` introduces corner cases like this:
```
def forward(x):
    y = x.detach()
    return x is y
```
We get `False` before optimizations. But after optimizations, the test becomes `x is x`, and we get `True`.

- `ReplaceWithCopy`: the pass that replaces ops like `aten::to` with an out variant that copies its input. The following graph returns `True` before optimizations, but `False` afterwards
```
def forward(x):
    y = x.to(x.dtype)
    return x is y
```

- And many more, `FuseListUnpack` can break too

Since the ops are not used by 99.99% of users, rejecting them so we don't have to think about this is not a big deal.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: d1jang

Differential Revision: D32022584

fbshipit-source-id: d135938edb2299c9b8f9511afac2bf568578879e
2021-11-01 04:45:14 -07:00
7fbcf79684 [tensorexpr][nnc] Support quantization (#66676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66676

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31676329

Pulled By: IvanKobzarev

fbshipit-source-id: 288b41ff4ed603dfaacb465f296997f14bb23c22
2021-10-31 22:49:30 -07:00
97f29bda59 Relaxes tolerance on ROCm test_noncontiguous_samples_matmul (#67593)
Summary:
This test is narrowly failing intermittently. See https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.3.1-py3.6-test1/7736//console for an example. Relevant snippet:

```
12:28:43 ======================================================================
12:28:43 FAIL [0.104s]: test_noncontiguous_samples_matmul_cuda_float32 (__main__.TestCommonCUDA)
12:28:43 ----------------------------------------------------------------------
12:28:43 Traceback (most recent call last):
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1422, in wrapper
12:28:43     method(*args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1422, in wrapper
12:28:43     method(*args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
12:28:43     result = test(self, **param_kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
12:28:43     return test(*args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 920, in only_fn
12:28:43     return fn(self, *args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1041, in wrapper
12:28:43     fn(*args, **kwargs)
12:28:43   File "test_ops.py", line 262, in test_noncontiguous_samples
12:28:43     self.assertEqual(actual_grad, expected_grad)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
12:28:43     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
12:28:43 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 1 element(s) (out of 10) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 1.2278556823730469e-05 (-1.458460807800293 vs. -1.4584730863571167), which occurred at index 7.
```

Setting an absolute tolerance of 1e-4, which is what this PR does, should make the test pass consistently.
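
For illustration, the reported worst-case difference (~1.23e-05) fails the default fp32 tolerances but passes with atol=1e-4 (using the public torch.testing.assert_close as a stand-in for the internal assertEqual):
```
import torch

actual = torch.tensor(-1.458460807800293)
expected = torch.tensor(-1.4584730863571167)

# with rtol=1.3e-06, atol=1e-05 this raises: 1.23e-05 > 1e-05 + rtol * |expected|
torch.testing.assert_close(actual, expected, rtol=1.3e-6, atol=1e-4)  # passes
```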

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67593

Reviewed By: ngimel

Differential Revision: D32050986

Pulled By: mruberry

fbshipit-source-id: f15bc8c4516be0a859afcfa76d52334c0b2c58a5
2021-10-31 04:26:31 -07:00
d0662f2f76 Add adaptive_max_pool OpInfo (#67405)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67405

Reviewed By: mruberry

Differential Revision: D32044712

Pulled By: ngimel

fbshipit-source-id: 4619d134d18359601801c029dd5be3f59b91626d
2021-10-30 21:19:58 -07:00
e01279cc2e Disable reduced precision reductions for fp16 GEMMs (#67578)
Summary:
It appears that most NVIDIA architectures (well, at least there haven't been many reports of this issue) don't do reduced precision reductions (e.g., reducing in fp16 given fp16 inputs), but this change attempts to ensure that a reduced precision reduction is never done. The included test case currently fails on Volta but passes on Pascal and Ampere; setting this flag causes the test to pass on all three.
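
A usage sketch; the description does not name the flag, so the attribute below is an assumption about where such a toggle would live:
```
import torch

# assumption: the new toggle is exposed on torch.backends.cuda.matmul;
# setting it to False forces full-precision (fp32) accumulation
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False

a = torch.randn(512, 512, device="cuda", dtype=torch.half)
b = torch.randn(512, 512, device="cuda", dtype=torch.half)
c = a @ b  # the GEMM no longer reduces in fp16 even though inputs are fp16
```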

CC stas00 ngimel ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67578

Reviewed By: mruberry

Differential Revision: D32046030

Pulled By: ngimel

fbshipit-source-id: ac9aa8489ad6835f34bd0300c5d6f4ea76f333d1
2021-10-30 21:14:11 -07:00
510e3026a9 [numpy] add torch.argwhere (#64257)
Summary:
Adds `torch.argwhere` as an alias to `torch.nonzero`

Currently, `torch.nonzero` actually provides functionality equivalent to `np.argwhere`.

From NumPy docs,
> np.argwhere(a) is almost the same as np.transpose(np.nonzero(a)), but produces a result of the correct shape for a 0D array.
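
A small example of the equivalence:
```
import torch
import numpy as np

t = torch.tensor([[0, 1], [2, 0]])
print(torch.argwhere(t))       # tensor([[0, 1], [1, 0]])
print(torch.nonzero(t))        # identical output; argwhere aliases nonzero
print(np.argwhere(t.numpy()))  # matches NumPy's behavior
```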

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64257

Reviewed By: qihqi

Differential Revision: D32049884

Pulled By: saketh-are

fbshipit-source-id: 016e49884698daa53b83e384435c3f8f6b5bf6bb
2021-10-30 15:26:11 -07:00
a95c94f075 [fx2trt] fix acc_tracer when run against module that contains ScriptModule submodules (#67567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67567

- Fix an issue to allow it to work against modules that contains ScriptModule submodules.
- Fix a bug where `getattr(base_class, method_name)` could raise KeyError

Test Plan: linter; CI;

Reviewed By: 842974287

Differential Revision: D31956070

fbshipit-source-id: 1114937f380af437fd6d36cd811ef609d7faefe7
2021-10-30 15:13:45 -07:00
b24c34426f Add OpInfo for torch.unique and torch.unique_consecutive (#67529)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67529

Reviewed By: pbelevich

Differential Revision: D32045941

Pulled By: saketh-are

fbshipit-source-id: fefea1ddabcd3c4b40e9374b991410626437cdb4
2021-10-30 08:33:41 -07:00
aa16de517d Revert D31984694: [pytorch][PR] make TORCH_(CUDABLAS|CUSOLVER)_CHECK usable in custom extensions
Test Plan: revert-hammer

Differential Revision:
D31984694 (d4493b27ee)

Original commit changeset: 0035ecd13980

fbshipit-source-id: c85689007719c9e4a930b0a8a32d481a501d3c14
2021-10-30 03:51:18 -07:00
4a2bbc619d move functionalize fallback out of aten/core (#67564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67564

moves the functionalize fallback out of aten/core and into aten, which should fix the issue described at https://fb.workplace.com/groups/163556484490704/permalink/1029416141238063/. I'm still not clear on why this didn't fail anything in CI / sandcastle on the initial diff: D31942093 (0032fa7725)
ghstack-source-id: 141959891

Test Plan: Locally, running `buck build mode/opt //sigrid/feed/prediction_replayer:fully_remote_replayer_main`

Reviewed By: zou3519

Differential Revision: D32027585

fbshipit-source-id: 2d86c4a6b3a73b00ee0ccee2f89a54704ed83e00
2021-10-29 21:40:49 -07:00
c00806beda Add skipXLA and expectedFailureXLA decorator (#66857)
Summary:
Add skipXLA and expectedFailureXLA decorator and relevant test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66857

Reviewed By: ngimel

Differential Revision: D32039856

Pulled By: mruberry

fbshipit-source-id: 3c99d5e06c1c7684d1f798c11c783bd6ebea9899
2021-10-29 19:53:36 -07:00
69adbc8778 Fix splitter_base and add unit test for trt splitter (#67569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67569

splitter_base assumes that the first subgraph after the split must be a CPU subgraph if a CPU node exists. This is wrong; the starting subgraph should be determined by which subgraph contains the node with zero dependencies.
Also adds a unit test for the splitter.

Reviewed By: yinghai

Differential Revision: D32012549

fbshipit-source-id: e2639ccd7774b4295ca05c2ddbefff9726702b3f
2021-10-29 18:51:59 -07:00
d4493b27ee make TORCH_(CUDABLAS|CUSOLVER)_CHECK usable in custom extensions (#67161)
Summary:
Make `TORCH_CUDABLAS_CHECK` and `TORCH_CUSOLVER_CHECK` available in custom extensions by exporting the internal functions called by the both macros.

Rel: https://github.com/pytorch/pytorch/issues/67073

cc xwang233 ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67161

Reviewed By: jbschlosser

Differential Revision: D31984694

Pulled By: ngimel

fbshipit-source-id: 0035ecd1398078cf7d3abc23aaefda57aaa31106
2021-10-29 17:27:07 -07:00
ad89d994c9 [Static Runtime] Support recordio format input for benchmark (#67530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67530

Currently `ptvsc2_predictor_bench` only uses the first input of a given recordio file even when the recordio file contains many inputs.

This change extends `StaticRuntime::benchmark` to accept multiple input entries so that we can benchmark more extensibly and realistically using all the inputs in the recordio file.

Test Plan:
Tested `ptvsc2_predictor_bench` with / without this change executing the following command:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/home/djang/ads/adfinder/ctr_mobilefeed/302008423/302008423_0.predictor.disagg.local  --recordio_inputs=/home/djang/ads/adfinder/ctr_mobilefeed/302008423/302008423.local.inputs.recordio --pt_enable_static_runtime=1 --compare_results=0 --iters=1 --warmup_iters=1 --num_threads=1 --do_profile=1 --method_name=local.forward --set_compatibility --do_benchmark=1 --recordio_use_ivalue_format=1
```

Reviewed By: hlu1

Differential Revision: D31947382

fbshipit-source-id: 4188271613aad201f8cad5f566e0dfed26680968
2021-10-29 14:38:14 -07:00
2cac92f470 [SR] Migrate sigrid_transforms_torch_bind to new FuseListUnpack (#67163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67163

Migrated both the variadic and non-variadic versions.

This diff is part of the effort to migrate all ops used in `FuseListUnpack` to `FuseListUnpackV2`. The original version of `FuseListUnpack` is problematic for a few reasons:

* You have to complicate the op implementation with an `is_fused` check, resulting in messier code. It is easier to reason about two ops, fused (out variant) and unfused (native).
* The original version of `FuseListUnpack` is buggy. It assumes that the `ListUnpack` node occurs immediately after the fusion candidate, which is not necessarily true.

Test Plan:
Unit tests: `buck test caffe2/benchmarks/static_runtime/...`

**Accuracy Test**
Done at the top of this diff stack.

**Performance**
Everything seems to be about the same plus or minus some noise.

* Baseline (D31947382 with some errors correct locally, the version of the op here is fused and variadic): P464964343
* This diff, fused_variadic: P464960645
* Variadic transformation disabled, fused (caught and fixed a schema error here): P464961561
* List unpack fusion disabled, variadic: P464962661
* Both variadic and fusion passes disabled: P464963342

The predictions match with the JIT interpreter for all ops.

Reviewed By: hlu1

Differential Revision: D31887300

fbshipit-source-id: 25a7b4e35eed21ca8b2c98297513425cf17f461a
2021-10-29 14:25:10 -07:00
289b0f7b04 Resent the reverted PR: Add register_frozenpython.cpp to the torch::deploy interpreter library in the OSS build (#67303)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67303

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D32016061

Pulled By: shunting314

fbshipit-source-id: 9460c90dd4f630f4c81dbfbbd772446ddffbabd0
2021-10-29 14:10:43 -07:00
ba74b03b0d Back out "[sharded_tensor] simplify init_from_local_shards API"
Summary: Original commit changeset: 6e97d95ffafd

Test Plan: unit test

Reviewed By: wanchaol

Differential Revision: D32023341

fbshipit-source-id: 2a9f7b637c0ff18700bcc3e44466fffcff861698
2021-10-29 14:01:07 -07:00
5c77ccefe0 Resolves #67227 documentation issue (#67379)
Summary:
Changed "Chi2" in the docstring to a more intuitive "Chi-squared"

Fixes https://github.com/pytorch/pytorch/issues/67227

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67379

Reviewed By: jbschlosser

Differential Revision: D32023761

Pulled By: ngimel

fbshipit-source-id: b514b49726f616914871a9a831aa10e12e4be90b
2021-10-29 13:47:38 -07:00
66202b7f8d [Pytorch Edge] Expose runtime operators versioning (#67385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67385

As part of the expanded operator versioning effort, we are going to start looking at this variable and what's stored locally in the model file.
ghstack-source-id: 141782717

Test Plan: unit test

Reviewed By: cccclai

Differential Revision: D31976654

fbshipit-source-id: 255a23cff7c4f4039089de23b4da95772be48324
2021-10-29 13:42:59 -07:00
60a80c5bbd [jit] Move ModuleIndex operator to selective build. (#67483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67483

Move ModuleIndex operator to selective build candidates.
ghstack-source-id: 141953898

Test Plan: eyes

Reviewed By: qihqi

Differential Revision: D32003895

fbshipit-source-id: 635c2bc37cd30a98f4a1e182fd6534eb9f1c4a69
2021-10-29 13:31:35 -07:00
12ede84dbb [jit][edge] Enable lite interpreter to correctly handle INTERFACE_CALL instruction. (#65972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65972

ghstack-source-id: 141842336

Test Plan: buck test mode/dev //caffe2/test:mobile -- --exact 'caffe2/test:mobile - test_stacktrace_interface_call (mobile.test_lite_script_module.TestLiteScriptModule)'

Reviewed By: qihqi

Differential Revision: D31326147

fbshipit-source-id: 338ff4ce8ddc9502ffe0add49057b33b52a24955
2021-10-29 13:13:32 -07:00
d6b15bfcbd [jit][edge] Load interface methods to corresponding ClassTypes. (#65971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65971

ghstack-source-id: 141842335

We should be able to load methods into their ClassTypes. Right now the mobile runtime only loads data members into ClassTypes, but not methods. To support interface calls, we inject methods into ClassTypes when the methods are loaded.

Test Plan: existing tests should all pass.

Reviewed By: qihqi

Differential Revision: D31326146

fbshipit-source-id: fb1dbea619910ef1f8fa26146da3ebab348fe902
2021-10-29 12:48:57 -07:00
6259601c8a Set test owners for tests with unknown owners (#67552)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67552

Reviewed By: jbschlosser

Differential Revision: D32028248

Pulled By: janeyx99

fbshipit-source-id: a006f7026288b7126dba58b31cac28e10ce0fed6
2021-10-29 12:42:01 -07:00
c19cda5782 [skip ci] Add test owners for a special hi-pri class of tests (#67553)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

This change does require some context: there were several suggestions regarding what to do about this group of tests: tests that are core and crucial to all of PyTorch and are too broad to be owned by one team.
1. Let's add a "module: core" and put people behind it! This idea sounds appealing unless you are one of the people backing the label. From talking to albanD among others, this idea of putting all these core tests on the shoulder of a few people or one team isn't super fair and I have not yet found anyone willing to take on this job.
2. Taking advantage of the fact that we already have a triaging oncall that takes turns triaging issues, we can leave these tests essentially unlabeled and allow the oncall to triage these tests. Since these tests are crucial to PyTorch, we'll add the "high priority" label to mark them different from other unowned tests (see https://github.com/pytorch/pytorch/issues/67552).
3. I _could_ still create an unbacked label "module: core" and attribute these tests there, but I don't like the idea of creating a facade that the tests are "triaged" to a label when no one is actually taking a look.

Now we could potentially break these tests down into smaller files so that each piece _could_ be owned by a team, but 1. I don't know if this is currently feasible and 2. This approach does not prevent that from happening in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67553

Reviewed By: albanD

Differential Revision: D32025004

Pulled By: janeyx99

fbshipit-source-id: 1fb1aa4c27e305695ab6e80ae3d02f90519939c0
2021-10-29 12:17:21 -07:00
fcba8018c2 Update codeowners for sphinx conf (#67548)
Summary:
Add a codeowner for the conf file to ensure allowlist modification is monitored.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67548

Reviewed By: jbschlosser

Differential Revision: D32023929

Pulled By: albanD

fbshipit-source-id: 63f18cdd725cc60993a6c0a9e3529ed95845e0bb
2021-10-29 10:50:15 -07:00
69f86ecd3a Sparse CSR CUDA: add torch.add with all inputs sparse (#63948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63948

This PR adds `torch.add(a, b, alpha=None, out=out)` variant with `a, b,
out` all being sparse CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation, the result tensor has 32-bit indices. Input
tensors can have both 64-bit and 32-bit indices tensors.

Fixes https://github.com/pytorch/pytorch/issues/59060
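
A minimal sketch of the new variant (assumes a CUDA build with cuSPARSE available):
```
import torch

crow = torch.tensor([0, 2, 4])
col = torch.tensor([0, 1, 0, 1])
a = torch.sparse_csr_tensor(crow, col, torch.tensor([1., 2., 3., 4.]), (2, 2), device="cuda")
b = torch.sparse_csr_tensor(crow, col, torch.tensor([5., 6., 7., 8.]), (2, 2), device="cuda")

out = torch.add(a, b, alpha=2.0)  # computes a + 2*b; a, b, out are all sparse CSR
print(out.to_dense())
```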

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31909731

Pulled By: cpuhrsch

fbshipit-source-id: 656f523e3947fec56b2f93c474fb6fd49f0360ca
2021-10-29 10:43:05 -07:00
285d5a55b9 Add API usage to torch.RPC (#67515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67515

Adding API usage to torch.rpc to better understand usage of this API.
ghstack-source-id: 141877028

Reviewed By: rohan-varma

Differential Revision: D32011465

fbshipit-source-id: 34d006ece307ae4a90fbcc6cb44fc0b7edca611e
2021-10-29 10:38:41 -07:00
ddc9bd335b Adds reference vs. noncontiguous OpInfo test (#67434)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63341.

This PR adds a new test, `test_noncontigous_samples`, that runs ops forward and backward and compares their outputs and grads between "normal" contiguous SampleInputs and noncontiguous SampleInputs. This test should preclude the need for noncontiguous SampleInputs going forward.

The test was added by generalizing the `.numpy()` transform on SampleInputs to support a new `.noncontiguous()` transform and copying forward/backward patterns from other tests in test_ops.py. It also discovered that many SampleInputs were incorrectly reusing tensors, so those have been revised. SampleInputs creating noncontiguous tensors for testing have also been altered to no longer do so.
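
A rough sketch of the idea behind the noncontiguous transform; `noncontiguous_like` is a hypothetical helper, not the PR's API:
```
import torch

def noncontiguous_like(t):
    # allocate twice the elements along the last dim, then stride over
    # every other one: same values as t, but noncontiguous
    expanded = torch.repeat_interleave(t, 2, dim=-1)
    return expanded[..., ::2]

x = torch.randn(3, 4)
nc = noncontiguous_like(x)
assert torch.equal(nc, x) and not nc.is_contiguous()
```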

In addition, this test discovered the following high priority silent correctness issues:

- https://github.com/pytorch/pytorch/issues/67432
- https://github.com/pytorch/pytorch/issues/67517
- https://github.com/pytorch/pytorch/issues/67513
- https://github.com/pytorch/pytorch/issues/67512
- https://github.com/pytorch/pytorch/issues/67470

It also identified the following issues:
- https://github.com/pytorch/pytorch/issues/67539

The pow OpInfo also incorrectly specified that pow supported the bool datatype, and this has been fixed. Its SampleInputs were written in a way that made requests for boolean SampleInputs return type-promoting inputs that never actually tried to compute pow in bool.

This PR suggests we should add the following guidance for writing SampleInputs:

- ensure that all SampleInputs are independent of each other (don't reuse tensors)
- ensure that all SampleInput tensors have no grad or backward functions (no autograd history) -- they should be leaves
- prefer keeping sample inputs simple where possible, a good set of handwritten samples that test interesting cases may be better than an exhaustive but hard to read and maintain programmatic enumeration
- keep code readable by using functools.partial and writing simple inline helpers; break up large statements into a more readable series of smaller statements; especially don't write complicated generator expressions with a `for` at the end!

fyi kshitij12345 krshrimali pmeier anjali411 saketh-are zou3519 dagitses

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67434

Reviewed By: ngimel

Differential Revision: D32014557

Pulled By: mruberry

fbshipit-source-id: b17e19adc1d41e24441f0765af13d381fef5e3c1
2021-10-29 09:55:56 -07:00
16d937b0df Fix strided _conv_double_backward() with 3D input / weight (#67283)
Summary:
Removes the 3D special case logic in `_convolution_double_backward()` that never worked.

The logic was never called previously since `convolution()` expands input / weight from 3D -> 4D before passing them to backends; backend-specific backward calls thus save the 4D version to pass to `_convolution_double_backward()`.

The new general `convolution_backward()` saves the original 3D input / weight, uncovering the bug.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67283

Reviewed By: anjali411

Differential Revision: D32021100

Pulled By: jbschlosser

fbshipit-source-id: 0916bcaa77ef49545848b344d6385b33bacf473d
2021-10-29 09:48:53 -07:00
bf31995194 Add OpInfo for nn.functional.cosine_embedding_loss (#67465)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67465

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32001920

Pulled By: ejguan

fbshipit-source-id: 82e547b5f0057b4ecc61e6f3be56bf038db179d1
2021-10-29 09:11:23 -07:00
bcd301a457 Add OpInfo for nn.functional.ctc_loss (#67464)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67464

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32001919

Pulled By: ejguan

fbshipit-source-id: f277a8e9c9887ed62e871e8a0c8549e853e34356
2021-10-29 09:11:21 -07:00
e2e20e79fb Add OpInfo for nn.functional.poisson_nll_loss (#67371)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67371

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31973173

Pulled By: ejguan

fbshipit-source-id: 3cbb21d292b95039f7c7d1f4caa300f3d619740a
2021-10-29 09:11:18 -07:00
8b8fb4f4e6 Add OpInfo for nn.functional.gaussian_nll_loss (#67376)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67376

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31974040

Pulled By: ejguan

fbshipit-source-id: d6abac78a378d2763ca2fd465e64dea9985840f2
2021-10-29 09:11:16 -07:00
1d900ee22f Add OpInfo for nn.functional.hinge_embedding_loss (#67381)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67381

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31976354

Pulled By: ejguan

fbshipit-source-id: 09068bb3d1bba665517254dd8a2dab9abd78b0e2
2021-10-29 09:11:14 -07:00
c6a6c09383 Add OpInfo for torch.nn.functional.gaussian_nll_loss (#67356)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67356

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31970077

Pulled By: ejguan

fbshipit-source-id: 91bd9c5202b49f79ef83795196c2773fbe8a9afd
2021-10-29 09:09:48 -07:00
2e156f649e Sort output of *NativeFunctions.h (#67046)
Summary:
This ensures deterministic output, allowing systems like ccache to be
more effective.

cc ezyang bhosmer bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67046

Reviewed By: VitalyFedyunin

Differential Revision: D31896114

Pulled By: bdhirsh

fbshipit-source-id: d29ef0cf6c7e3408b104c5239b620eaa24327088
2021-10-29 09:03:39 -07:00
f95ed474ac Norms Op Info (#67442)
Summary:
Adds op infos for group_norm, instance_norm, and local_response_norm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67442

Reviewed By: mruberry

Differential Revision: D31992225

Pulled By: samdow

fbshipit-source-id: 5bf3e21cff2a39ca3e47dbe13db7671c617aaad1
2021-10-29 08:36:07 -07:00
d58f209326 add dequantize support for fp16 + cuda (#67234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67234

Extends the dequantize fp16 function to also work on CUDA,
and adds a test.

Test Plan:
```
python test/test_quantization.py TestQuantizedTensor.test_dequantize_fp16_cuda
python test/test_quantization.py TestQuantizedTensor.test_dequantize_fp16_cpu
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31915330

fbshipit-source-id: 622d47464fae26bf02f295ff56df63a3bf80b786
2021-10-29 07:58:38 -07:00
99282126dc pytorch quantization: document the custom module APIs (#67449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67449

Adds a description of what the current custom module API does
and API examples for Eager mode and FX graph mode to the main
PyTorch quantization documentation page.

Test Plan:
```
cd docs
make html
python -m http.server
// check the docs page, it renders correctly
```

Reviewed By: jbschlosser

Differential Revision: D31994641

Pulled By: vkuzo

fbshipit-source-id: d35a62947dd06e71276eb6a0e37950d3cc5abfc1
2021-10-29 05:22:17 -07:00
acdc754918 [quant][graphmode][fx] Add support for ObservationType.OUTPUT_SHARE_OBSERVE_WITH_INPUT in backend_config_dict (#67210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67210

`OUTPUT_SHARE_OBSERVE_WITH_INPUT` is an observation type for operators whose output shares the same observer/fake_quant instance as their input. When quantized, these ops can take a quantized Tensor as input and output a quantized Tensor with the same quantization parameters (scale/zero_point, etc.) as the input.
Using cat as an example in this PR. Other ops can be added later gradually (together with tests).

Test Plan:
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_cat

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31907243

fbshipit-source-id: 2c7af4a456deb5e6597b0b9cd4e32c5fcdec580b
2021-10-29 02:37:48 -07:00
2bb20c0e48 [quant] Move test file to fx2trt folder (#67129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67129

Since the tests depend on an experimental feature (fx2trt), we'll move them to the fx2trt folder.

Test Plan:
python test/fx2trt/test_quantize_fx.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31877123

fbshipit-source-id: 5a98a257c4806c1911cfc2616d5ad98d715234c4
2021-10-28 23:58:44 -07:00
5e46a4f6bd Fixes to make trt timing_cache really work (#67524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67524

We have some loose ends to tie to make timing cache really work. This diff fixes them.

Reviewed By: wushirong

Differential Revision: D32012021

fbshipit-source-id: 1e93c76d48a3740a02613e1f19222ed92cca9deb
2021-10-28 23:09:24 -07:00
96c868217c [deploy] fix TypedStorage serialization (#67499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67499

Since https://github.com/pytorch/pytorch/pull/62030 was landed, storages produced when loading from a pickle are of type TypedStorage. We weren't catching this in our deploy serialization, causing tensors to actually get pickled instead of the storages being shared across interpreters.

Since this is technically still correct, it wasn't caught by any of our tests until someone tried to pass a really big tensor and started OOMing.
ghstack-source-id: 141869521

Test Plan: added unit test

Reviewed By: shunting314

Differential Revision: D32004075

fbshipit-source-id: ef5a80cd3cb1dff0b6b4c1b6c95923e4faab7d50
2021-10-28 22:33:04 -07:00
4052393af8 Revert D31450501: Wextra caffe2/
Test Plan: revert-hammer

Differential Revision:
D31450501 (7c2d3e6d32)

Original commit changeset: 702728fdb3c5

fbshipit-source-id: 486b8e872c38415706288f7f389d7cb1ea5eb0a9
2021-10-28 20:43:28 -07:00
18807273cb Fix Ads build broken due to comparison type mismatch (#67526)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67526

Build error P465285570 due to D31942093 (0032fa7725)

(Note: this ignores all push blocking failures!)

Test Plan: build passes after fix

Reviewed By: jbschlosser

Differential Revision: D32013247

fbshipit-source-id: b60a65afd7a5a2d3249150fbc2ac52374d62a591
2021-10-28 20:42:13 -07:00
26241994b2 Remove the argument strip_doc_string of export() method entirely. (#66615) (#67278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67278

Remove the argument strip_doc_string of export() method entirely.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962512

Pulled By: malfet

fbshipit-source-id: 168ad3f157a80d1edd7a9053783b3f3deb2ecf43

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-10-28 19:25:07 -07:00
43d51254bf Deprecate the argument _retain_param_name of export() method entirely. (#66617) (#67277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67277

Remove the argument _retain_param_name of export() method entirely.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962514

Pulled By: malfet

fbshipit-source-id: 8ac5e3a4a7821cc580951a7f167fd20069116350

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-10-28 19:25:05 -07:00
40920185ac [ONNX] Remove the argument enable_onnx_checker of export() method entirely. (#66611) (#67276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67276

[ONNX] Remove argument _retain_param_name from torch.onnx.export() function.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962520

Pulled By: malfet

fbshipit-source-id: 86ee15f525261c0da74175e47dd74eeb169ac47f

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-10-28 19:25:03 -07:00
609da98154 [ONNX] Update value name copying logic for onnx (#66170) (#67275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67275

Specifically targets the symbolic functions that directly return the input as output. The old logic overrides the value name with the output value name. But since the input is unmodified and unchanged, it is more logical to keep its original input name, especially for cases where inputs come directly from model parameters.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962517

Pulled By: malfet

fbshipit-source-id: 9cb4a2bb55aa08dd1ce8fdec24e7cfb11d7ea97c

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-28 19:23:55 -07:00
7c2d3e6d32 Wextra caffe2/ (#67319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67319

Test Plan: Sandcastle

Reviewed By: pbelevich

Differential Revision: D31450501

fbshipit-source-id: 702728fdb3c5b00510ec637ff65bb2c6949fcc4e
2021-10-28 19:02:34 -07:00
d8bde98f36 Workaround the bug of TRT which creates extra outputs (#67327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67327

Under certain conditions, TRT will create extra outputs, which seems more like a bug. If we don't capture those hidden outputs, we won't allocate memory to host them, and TRT will end up writing to illegal memory. This diff addresses the issue by capturing the hidden outputs and allocating proper memory for them.

Reviewed By: jianyuh, wushirong, 842974287

Differential Revision: D31955379

fbshipit-source-id: c9faaf91ed45bec8e0bc4e0bea812a0a3f2abad0
2021-10-28 18:43:59 -07:00
fc82ad186a Add Initial NNC Dynamic Shapes Flow (#66136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136

FOR REVIEWERS: this is ready to review; the test failures come from somewhere else in the stack.

Takes in a TensorExprGraph of static shapes and generalizes the input shapes
to symbolic dimensions. Dimensions of value 1 will be preserved, otherwise
dimensions with the same value will be bucketed to the same symbolic shape.

E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)`

From there, runs symbolic shape inference on the graph, and creates a
versioning if in the graph with prim::TensorExprDynamicGuard checking whether
the inputs at runtime match the generalized symbolic shapes that are inputs
to the TE kernel. The computation to calculate all symbolic dimensions is
inlined into the if block with the TE kernel. All symbolic dimension Value*s
are appended to the end of the TE kernel Graph/Node inputs, and the Node is
augmented with an integer list attr `symbolic_shape_inputs` that gives the
mapping from Value* -> symbolic shape int64_t value. For more lengthy IR
examples and a walkthrough, look at ShapeAnalysisTest.DynamicShapesFusion in
`test_shape_analysis`. Returns True on success, False on failure; it can fail
if shape propagation fails to propagate the number of dims or if complete
shapes on inputs are not set.

Example transformation
```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor = aten::tanh(%x.1)
  %5 : Tensor = aten::erf(%4)
  %6 : Tensor = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor = aten::cat(%7, %3)
  %9 : Tensor = aten::hardswish(%8)
  %10 : Tensor = aten::mul(%9, %z)
  return (%9)
```
->

```
  graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp)
  %5 : Tensor = prim::If(%4)
    block0():
      %15 : int[] = aten::size(%x_inp)
      %16 : int[] = aten::size(%y_inp)
      %17 : int = prim::Constant[value=1]()
      %18 : int = prim::Constant[value=0]()
      %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10
      %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10
      %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10
      %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29
      %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3)
      -> (%3)
    block1():
      %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp)
      -> (%14)
  return ()
  with prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu),
        %SS_5 : int,
        %SS_4 : int,
        %SS_3 : int,
        %SS_2 : int):
    %3 : int = prim::Constant[value=0]()
    %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1)
    %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4)
    %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1)
    %7 : Tensor[] = prim::ListConstruct(%5, %6)
    %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3)
    %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8)
    %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z)
    return (%9)
```

Test Plan: Imported from OSS

Reviewed By: navahgar, anjali411

Differential Revision: D31797466

Pulled By: eellison

fbshipit-source-id: b508d2f5baef6e8e4020955ab1d4bc4b9c7bdfdd
2021-10-28 17:09:03 -07:00
2661507488 Adding support for Symbolic Shapes in Inplace Ops #65642 (#65729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65729

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31961857

Pulled By: Gamrix

fbshipit-source-id: bfb1e8a66be254638e8e93ade091ab9df6029e8c
2021-10-28 16:49:10 -07:00
d0bc01fac2 ci: Migrate hardcoded docker builds to GHA (#67455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67455

Migrates docker builds that don't have dependent jobs within the pytorch
repository to our new GHA docker build job

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D31997671

Pulled By: seemethere

fbshipit-source-id: 9d6f58fa8ea8731cf12457fe64dc65e70f3d9f25
2021-10-28 14:50:05 -07:00
6696c59af4 Adding optimizer attribute to SequentialLR (#67406)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67318 :)
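
A minimal sketch of the fix's effect, assuming the standard SequentialLR signature:
```
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
warmup = torch.optim.lr_scheduler.ConstantLR(opt, factor=0.5, total_iters=2)
decay = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)
sched = torch.optim.lr_scheduler.SequentialLR(opt, schedulers=[warmup, decay], milestones=[2])

assert sched.optimizer is opt  # the attribute this change adds
```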

cc albanD, datumbox

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67406

Reviewed By: jbschlosser

Differential Revision: D31997873

Pulled By: albanD

fbshipit-source-id: f579fb886d049a545673fd92ef5892fcf501bcc6
2021-10-28 14:43:40 -07:00
354363b57a [SR] Native implementation for aten::size (#67346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67346

Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like TupleUnpack). We should improve op coverage where possible.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: d1jang

Differential Revision: D31965159

fbshipit-source-id: 86a69c395f401c4a4c55daa4c5fe80764383c8e5
2021-10-28 14:18:17 -07:00
9f01937caf [PyTorch][easy] Deduplicate memory planner creation code (#67265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67265

Avoid repeating this initialization code.
ghstack-source-id: 141585971

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D31933368

fbshipit-source-id: 6342ae9bb82c4d152a427bad142470c3d162de69
2021-10-28 14:13:43 -07:00
82c356505f Revert D31894777: [pytorch][PR] Replace issue templates with new issue forms
Test Plan: revert-hammer

Differential Revision:
D31894777 (62feadd76f)

Original commit changeset: fbd39f7ed4ca

fbshipit-source-id: 4698ff5fe8629f9ad0249425a369c6f0bd89c891
2021-10-28 13:52:43 -07:00
afb8434440 [SR] Native implementation for aten::view (#67341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67341

Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like `TupleUnpack`). We should improve op coverage where possible.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31962589

fbshipit-source-id: 3107fb169c1b02fb2bafbb355c005669b5fa8435
2021-10-28 13:37:46 -07:00
60472594e1 [jit][edge] Implement torch::jit::Function for mobile function. (#65970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65970

ghstack-source-id: 141842338

mobile::Function should inherit from jit::Function because, for interface call support, we need an abstract jit::Function type stored in the corresponding ClassTypes, so that we can look up methods there. Previously mobile::Function was implemented separately, which prevented this. Since we got rid of all the unneeded virtual methods from jit::Function, we can inherit from torch::jit::Function without too much cost.

NOTE that torch::jit::Function is already in dependency because we need it to support custom class call. We should be able to use Function uniformly without looking into whether it's a builtin function or mobile::Function.

Test Plan: no behavior change.

Reviewed By: iseeyuan, mrshenli

Differential Revision: D31326148

fbshipit-source-id: 36caeaf3c8c5f54c23a1a7c8c9e2fd6e78b19622
2021-10-28 13:33:30 -07:00
5ef62c88a9 [jit] Replace get_executor() with call() in abstract Function interface. (#65969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65969

ghstack-source-id: 141759210

Test Plan: no behavior change.

Reviewed By: anjali411

Differential Revision: D31326151

fbshipit-source-id: 201f6dc4c23fdb2531f6b8c73d26127f9e212de4
2021-10-28 13:11:29 -07:00
8363da3f92 [SR][C2][easy] Benchmarks report # of ops (#67436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67436

This information is useful for comparing static runtime to c2

Reviewed By: d1jang

Differential Revision: D31991571

fbshipit-source-id: eb83bc4564b05d56fb9a550863eea3f6312f3f6c
2021-10-28 13:03:09 -07:00
b8f07689f2 [ROCm] Enable frexp support for ROCm builds (#67226)
Summary:
The frexp function has been enabled in ROCm code.  Updating PyTorch
to enable this functionality.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67226

Reviewed By: jbschlosser

Differential Revision: D31984606

Pulled By: ngimel

fbshipit-source-id: b58eb7f226f6eb3e17d8b1e2517a4ea7297dc1d5
2021-10-28 12:42:09 -07:00
0795735351 [jit] Clean up unneeded virtual methods from Function interface. (#65968)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65968

tryToGraphFunction() should cover all cases and is more composable than
ad-hoc virtual methods.
ghstack-source-id: 141759214

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D31326154

fbshipit-source-id: 692a35df424f7d4f777a96489c4cbb24b3ae7807
2021-10-28 12:28:48 -07:00
bd5e6fe5ac Skip complex128 dtype for test_addmm_sizes_all_sparse_csr Windows test (#67453)
Summary:
Windows CUDA 11.1 periodic CI is failing. See https://github.com/pytorch/pytorch/pull/63511#issuecomment-953940183.
I don't understand though why periodic-win-vs2019-cuda11.1-py3 was triggered on the PR, but no test from `test_sparse_csr.py` were run https://github.com/pytorch/pytorch/runs/3975200820?check_suite_focus=true.

cc nikitaved pearu cpuhrsch IvanYashchuk mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67453

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D31997574

Pulled By: cpuhrsch

fbshipit-source-id: ae8bfb6da865014f39e6ad5675eb17e5a4d39744
2021-10-28 12:24:46 -07:00
5b8b2382d1 Mark mv as CompositeExplicitAutograd (#67373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67373

From the implementation of mv, it's decomposed into addmv. So it should
be a CompositeExplicitAutograd op.
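
Illustrating the decomposition numerically (not the actual dispatcher path):
```
import torch

A = torch.randn(3, 4)
x = torch.randn(4)

# mv decomposes into addmv with a zeroed accumulator: beta=0 leaves only A @ x
out = torch.addmv(torch.zeros(3), A, x, beta=0, alpha=1)
assert torch.allclose(torch.mv(A, x), out)
```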

Test Plan: It shouldn't change any behaviors. So, CI.

Reviewed By: bdhirsh

Differential Revision: D31973265

Pulled By: alanwaketan

fbshipit-source-id: 3b6850f08e6f671b908a177f148cc6194baa8a13
2021-10-28 11:59:00 -07:00
f3aae62942 Port tril and triu to structured kernels (#67055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67055

This PR ports `tril` and `triu` operations to structured kernels.
ghstack-source-id: 141797608

Test Plan: Extended the existing unit tests.

Reviewed By: wanchaol

Differential Revision: D31844638

fbshipit-source-id: 03ea4aa2410b042cafc3c5397e777a9ca5351b39
2021-10-28 11:45:58 -07:00
4a1f73ccb3 [qnnpack] Remove asymmetrical padding parameters in qnnpack (#67102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67102

Getting rid of the top/bottom and left/right distinction, replacing them with height and width. These parameters are widely used in qnnpack and are always passed together but never differ. PyTorch doesn't support asymmetrical paddings either, so I see no potential use for this.
ghstack-source-id: 141334544

Test Plan: qnnpack unit tests

Reviewed By: kimishpatel

Differential Revision: D31863370

fbshipit-source-id: aa57490399e23d6139b2ad7b745139752acd7848
2021-10-28 11:40:19 -07:00
4e873d6799 Formatting changes (#66257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66257

Used `clang-format -i` for these two files.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31762737

Pulled By: H-Huang

fbshipit-source-id: e94e301d0b013dbb8f2cef19ff140bac5811738f
2021-10-28 11:36:00 -07:00
cee4e8f35d Add FlexiBLAS build support per #64752 (#64815)
Summary:
To enable building torch+dependencies, set WITH_BLAS=flexi BLAS=FlexiBLAS

Fixes https://github.com/pytorch/pytorch/issues/64752

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64815

Reviewed By: jbschlosser

Differential Revision: D31997745

Pulled By: albanD

fbshipit-source-id: db208d59002f5896608a03132616400f09d972aa
2021-10-28 11:28:00 -07:00
55b7387e45 Timing cache for TensorRT (#67214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67214

This is a draft for creating a timing cache for TensorRT.

Reviewed By: yinghai, 842974287

Differential Revision: D31783757

fbshipit-source-id: 211ab68df0832120fa637304e4a7ece80d26f9b1
2021-10-28 11:21:51 -07:00
0032fa7725 Add a Functionalization pass in core (#64432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64432

Original PR description + feedback here: https://github.com/pytorch/pytorch/pull/63048

I've addressed all of the feedback in the original PR and made some pretty large changes, listed below.

**Table of Contents**
- Starting points
- List of the main changes from the original PR
- Next Steps
- Example codegen output (for a view, mutation, and view+mutation op)

**Starting Points**

A good place to start when looking through the PR:
* Alban mentioned that this is a useful mental model (thanks Ed for originally making this clear to me). Semantically, the pass currently does THREE things, which are all needed by functorch - all fused together into one big pass.
  * (a) alias removal, which replaces {view} calls with {view}_copy calls, and manually tracks aliasing information, so that when one tensor is mutated, we re-apply the same mutation to all of the aliases. This is the bulk of the work - once this is done, the next 2 things are trivial to implement.
  * (b) mutation removal, which is easy to do once we know that there are no aliases. Every mutation `a.add_(b)` becomes `a.replace_(a.add(b))`
  * (c) reapplying views: all of the `{view}_copy` calls are replaced with `{view}` calls again. This is an optimization that we can make specifically for functorch (and strided backends), that only care about mutation removal and not alias removal
  * XLA and Vulkan only want (a), or (a) + (b). Later, we'll want to split this out so that you can actually opt into different versions of this logic.
  * There is currently no {view}_copy replacement, because the pass just <replace views with copies> and <replace copies with views> steps have been combined. Later, we'll want to actually implement {view}_copy variants of each view operator, probably with codegen.
* documentation breadcrumb 1, in `FunctionalTensorWrapper.cpp`: https://github.com/pytorch/pytorch/pull/64432/files#diff-a0bac99bf205dba5b94cb64fc2466d3d55d991887572f9cd6a02e27b3a91dd60R59 (you might have to expand the `FunctionalTensorWrapper.cpp` file, which GitHub closes by default because it's large)
* documentation breadcrumb 2, in `FunctionalTensorWrapper.h`: https://github.com/pytorch/pytorch/pull/64432/files#diff-c945c71a4ccac65871f24a912e8904f9a5088b24a32e636727ea9c8fe920708aR12
* Reading through the codegen output at the bottom of this description.
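
As a toy illustration of steps (a) and (b) above, here is what the pass does to a tiny program, written as pure-Python pseudocode (a sketch of the semantics, not the pass's actual output):
```
import torch

def f(a):
    b = a.view(-1)  # b aliases a
    b.add_(1)       # mutation observed through the alias
    return b

# after (a) alias removal and (b) mutation removal, semantically:
def f_functionalized(a):
    b = a.view(-1).clone()  # {view} -> {view}_copy: b no longer aliases a
    b = b.add(1)            # b.add_(1) -> b = b.add(1)
    # the pass tracked that a and b were aliased, so a's current value is
    # recomputed from the mutated b before a is read again
    a = b.view(a.shape)
    return b
```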

**Main changes from the original PR**

(1)  I use lambdas instead of a giant enum to handle all of the different views.

This results in less boilerplate per view op (and more stuff that can be codegen'd). Every `ViewMeta` object now contains a `forward` and `reverse` lambda, that knows how to replay the view and its inverse. This makes the actual code that executes the replaying logic a lot less boilerplate-y (see `Alias::sync_update_operations` and `FunctionalTensorWrapper::sync_`)

(2) Every tensor during the functionalization pass is always wrapped in a `FunctionalTensorWrapper`.

This is potentially unnecessary for Vulkan/XLA, and will have a mild perf impact, but for now this PR just targets the functorch use case. I previously had a complicated design (a `FunctionalTensorImplBase` class) to avoid needing the wrapper for XLA, but it had some subtleties that are gonna require more thought to fix, so I'm pushing that off for now.

(3) `FunctionalTensorWrapper` objects accurately report stride information.

It's a little annoying to do this though, because the logic that calculates stride info for each view isn't easily separated from the actual view kernels in core, `at::native::{view}`. I do this by adding logic in each `at::functionalization::{view}` kernel to call the reference implementation `at::native::{view}`. I don't do anything with the output aside from taking its size/stride/storage_offset to set the actual output tensor's size/stride/storage_offset correctly. There's another annoying part to this: I'm pretty sure that we want to pass in the actual *wrapper* tensors directly into the native kernels, not their inner unwrapped values. But there are some `at::native::{view}` kernels that call other tensor methods, which re-invokes the dispatcher, calling functionalization/functorch kernels that try to do the unwrapping.

To do this, right now I have an `AutoDispatchDirectlyToNative` guard that ensures that any tensor methods called inside of the `at::native::{view}` op always redispatch straight to the CPU kernel (which will be another `at::native::` kernel). This feels kind of heavy-handed, but I'm not sure of a better way to do it.

(4) `FunctionalTensorWrapper` objects accurately report aliasing information.

There's a new `FunctionalStorageImpl` class (subclass of `StorageImpl`) that allows tensors in the functionalization pass to accurately alias storage. If two tensors `a` and `b` in a functionalized program are views of one another, then `a.storage.is_alias_of(b.storage)` should return true. I added this in a pretty similar way to how meta tensors allocate storage, although I don't pass in an actual allocator (I think this is fine because you should never resize a functional tensor's storage).
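
Concretely, the invariant looks like this (a sketch; it assumes the functionalization pass is active, so both tensors are `FunctionalTensorWrapper`s backed by the same `FunctionalStorageImpl`):

```
#include <ATen/ATen.h>

void check_aliasing() {
  at::Tensor a = at::ones({4});
  at::Tensor b = a.view({2, 2});  // b is a view of a
  // Inside a functionalized program, both wrappers share one FunctionalStorageImpl:
  TORCH_INTERNAL_ASSERT(a.storage().is_alias_of(b.storage()));
}
```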

One thing I'm not sure about: should `FunctionalTensorWrapper` set `storage_access_should_throw_` (a) always, (b) never, or (c) only if its wrapped tensor has it set?

Right now I have it not set, mostly because calling the reference view functions (`at::native::{view}`) requires looking at the storage. But that means that if you try to access storage from python in a functionalized program, you'll get silent garbage instead of an error. Related question: are we planning on exposing meta tensor storage to python in the future (even though it contains garbage)?

(5) better docs :)

**View operator coverage**

(6) The functionalization pass now gets math-composite view ops for free.

I didn't add the `Functionalize` dispatch key to the composite set, because I don't want composite ops like `torch.ones` to get decomposed before hitting the functionalization pass. Instead, I added codegen to manually register the `at::native::` kernels of composite view ops. This is a little hairy, because the names of the `at::native::` kernels aren't easily accessible. They're stored in a `Dict[DispatchKey, BackendIndex]`. I made a best-effort attempt to get each view kernel's name, basically by assuming that every view op has either a composite or cpu implementation.
There's also a hardcoded list of composite view ops in `gen_inplace_or_view_type.py`, but it looks like it's wrong. This is probably worth rationalizing later; for now I created a new list of the "complete" set of composite view ops, and preserved the old set by hardcoding the delta between the two sets.

(7) I've added codegen for ops that are both views AND mutations, like `transpose_()` (why do we even have these 😢).

From some light testing, it looks like they work correctly, with one caveat: I had a hard time ensuring that functorch programs that mutate their inputs using ops like `transpose_()` preserve the input mutations after the program finishes running. For now (in my corresponding functorch branch) I emit a warning when this happens and just don't preserve the mutation.

(8) I added `{view}_inverse` implementations for every view op, in `FunctionalInverses.cpp`.

These are needed to take mutations made to views and replay them back onto the base. To reduce boilerplate, the codegen generates function declarations for each `{view}_inverse` function, so you get a nice compiler error when someone eventually adds a new view op.
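
For intuition, the `{view}_inverse` for transpose might look something like this (a sketch using the signature from the codegen output below; the real implementation lives in `FunctionalInverses.cpp`):

```
#include <ATen/ATen.h>

// Sketch: transpose covers all of base, so its inverse just un-transposes the
// mutated view to recover the new base. Ops that only cover a subset of base
// (slice/select/diagonal/...) additionally need `base` to fill in the rest.
at::Tensor transpose_inverse_sketch(
    const at::Tensor& base,
    const at::Tensor& mutated_view,
    int64_t dim0,
    int64_t dim1) {
  (void)base;  // unused for transpose
  return mutated_view.transpose(dim0, dim1);
}
```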

The only view ops currently not supported are (a) as_strided, and (b) the sparse view ops (values()/indices()).

I can add support for as_strided, but it needs an `as_strided_inverse()` function. That will look really similar to the `as_strided_backward()` function in FunctionsManual.cpp, but it has some noticeable differences: we basically want an `as_strided_embed` for autograd and an `as_strided_scatter` for functionalization. We will also probably need them to be primitives w.r.t. autograd, since the current implementation for autograd uses view().copy_() calls that XLA won't be able to handle. I'm wondering if anyone has any objections; otherwise I can make those changes (which will require writing backward formulas for `as_strided_embed` and `as_strided_scatter`).

I did a bunch of manual testing that all looks pretty good, but it's definitely not fully tested. Ed pointed out that once XLA uses this pass (or at least once there's a POC), we can just run the existing xla view test suite. Hopefully that delay is okay - if it's not, maybe we can think about using OpInfos similar to how functorch uses them for testing.

Note: there's some duplication with autograd's view code. Every `{view}_inverse` implementation is really similar to the implementation for that view listed in `derivatives.yaml`. There are some major differences though:
* the autograd implementations over those backwards functions (like `permute_backwards()`, in `FunctionsManual.cpp`) internally call other view ops. For functionalization, we want them to (eventually) call `{view}_copy` operators.
* For view ops that take a subset of the original storage, like `slice/select/diagonal/as_strided()`, the autograd backward functions fill the "spaces" in the inverse call with zeroes. For functionalization, we want to fill them with the value of `base` at those positions (see the sketch after this list). It looks like this currently applies to 6 total ops (since we can ignore composites):
  * select
  * slice
  * diagonal
  * as_strided
  * split
  * split_with_sizes
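
Here's a sketch of that zero-fill vs. base-fill difference for `slice` (illustrative names only; the autograd version mirrors `slice_backward`, the functionalization version mirrors the `slice_scatter` semantics added earlier in this stack):

```
#include <ATen/ATen.h>

// Autograd-style inverse: positions outside the slice are filled with zeros.
at::Tensor slice_inverse_autograd(
    const at::Tensor& grad, at::IntArrayRef base_sizes,
    int64_t dim, int64_t start, int64_t end, int64_t step) {
  auto result = at::zeros(base_sizes, grad.options());
  result.slice(dim, start, end, step).copy_(grad);
  return result;
}

// Functionalization-style inverse: positions outside the slice keep base's values.
at::Tensor slice_inverse_functionalization(
    const at::Tensor& base, const at::Tensor& mutated_view,
    int64_t dim, int64_t start, int64_t end, int64_t step) {
  auto result = base.clone();
  result.slice(dim, start, end, step).copy_(mutated_view);
  return result;
}
```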
A nice end state would probably be for the autograd + functionalization codegen to both look at the same yaml (either `derivatives.yaml`, or something else), and automatically generate the right thing. I didn't leave that in scope for this PR though.

**Current State + Next Steps**

There are a bunch of followups after this PR eventually lands. Roughly in order:
* Use the current pass to register problematic composite ops in functorch. Also, nested `functionalize()` calls aren't supported yet (I mostly just need to remove some debug asserts and test it).
* Work on freeing up dispatch key space by deduplicating the `{backend}`/`Autograd{backend}`/`Sparse{backend}`/`Quantized{backend}` keys
* Once we have more dispatch keys, split up this pass into 3 pieces - it's currently fused, and doesn't do the right thing for vulkan/XLA. Specifically, all of the `{view}` calls in the current pass's view-replay logic should turn into `{view}_copy` calls that vulkan/XLA know how to implement, and there will be separate passes for (a) removing mutations, and (b) turning `{view}_copy` calls back into `{view}` calls. For Vulkan, we eventually want a pass that ONLY removes aliasing and view calls, and doesn't remove mutations. We can also probably give the 2 new passes user dispatch keys to save dispatch key space, if they'll only be used by functorch anyway.
* Do a deeper dive on perf for the vulkan/xla use cases. There are several areas to improve perf with varying levels of effort required. The simplest one that I'll probably do regardless is to codegen the out-of-place kernels instead of using a boxed fallback. Getting a POC working for xla will also be useful to test the view operator coverage.

**Example Codegen Output**

View Op:
```
::std::vector<at::Tensor> split_Tensor(c10::DispatchKeySet ks, const at::Tensor & self, int64_t split_size, int64_t dim) {

      auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self);
      ::std::vector<at::Tensor> out;
      {
        at::AutoDispatchBelowFunctionalize guard;
        auto tmp_output = at::redispatch::split(ks & c10::after_func_keyset, self_, split_size, dim);
        out = at::functionalization::impl::wrapFunctionalTensor(tmp_output);
        // I'm fusing the [alias removal], [mutation removal], [add views back] passes together.
        // Later, we'll want to turn them into separate passes (since e.g. vulkan only cares about alias removal).
      }

      at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta(
        [split_size, dim](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor {
          return base.split(split_size, dim)[mutated_view_idx];
        },
        [split_size, dim](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor {
          return at::functionalization::impl::split_inverse(base, mutated_view, mutated_view_idx, split_size, dim);
        }
      );
      at::functionalization::impl::set_view_meta(out, self, view_meta);

      at::AutoDispatchDirectlyToNative native_guard;
      ::std::vector<at::Tensor> reference_tensor_output = at::native::split(self, split_size, dim);
      at::functionalization::impl::set_strides(out, reference_tensor_output);
      return out;

}
```

Mutation Op:
```
at::Tensor & add__Tensor(c10::DispatchKeySet ks, at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) {

      at::functionalization::impl::sync(self);
      at::functionalization::impl::sync(other);
      auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self);
      auto other_ = at::functionalization::impl::unwrapFunctionalTensor(other);
      at::Tensor tmp_output;
      {
          at::AutoDispatchBelowFunctionalize guard;
          // The functionalization pass explicitly doesn't pass out= parameters to the redispatch
          tmp_output = at::redispatch::add(
            ks & c10::after_func_keyset, self_, other_, alpha);
      }

      self.replace_(tmp_output);
      at::functionalization::impl::maybe_add_update(self);
      return self;
}
```

View + Mutation Op:
```
at::Tensor & transpose_(c10::DispatchKeySet ks, at::Tensor & self, int64_t dim0, int64_t dim1) {

      at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta(
        [dim0, dim1](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor {
          return base.transpose(dim0, dim1);
        },
        [dim0, dim1](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor {
          return at::functionalization::impl::transpose_inverse(base, mutated_view, dim0, dim1);
        }
      );
      at::functionalization::impl::mutate_view_meta(self, view_meta);
      // See  Note [Propagating strides in the functionalization pass]
      // Directly update the sizes/strides/storage_offset fields on self using the inplace call.
      // I need the guard because I don't want the at::native kernel to end up calling more functionalization/functorch kernels.
      // Its only job is to directly compute the output size/stride/storage_offset metadata.
      at::AutoDispatchDirectlyToNative native_guard;
      at::native::transpose_(self, dim0, dim1);
      return self;

}
```

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942093

Pulled By: bdhirsh

fbshipit-source-id: b95598dae35dd1842fa8b1d8d1448332f3afaadf
2021-10-28 10:51:17 -07:00
b0a8ca2cb5 add tags for inplace view ops in native_functions.yaml (#65412)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65412

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942094

Pulled By: bdhirsh

fbshipit-source-id: 1f7f6ea7df13e9f91b81ed64088e35e471800aa8
2021-10-28 10:51:15 -07:00
03f3a0331b add slice/select/diagonal_scatter variants as primitive ops (#64430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64430

The functionalization pass needs `{view}_scatter` versions of the slice/select/diagonal ops in order to correctly propagate mutations from a view to its base. On top of that, the implementations need to be primitive w.r.t. autograd, because they look something like `...slice().copy_()`, and the functionalization pass can't use views + mutations inside of its own alias-removal machinery!
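
As a reference point, the semantics of e.g. `diagonal_scatter` can be written like this (a sketch only; the actual op is implemented as a primitive precisely so that this view + `copy_` pattern is never exposed):

```
#include <ATen/ATen.h>

// Reference semantics: return a copy of `base` with its diagonal replaced by `src`.
at::Tensor diagonal_scatter_ref(
    const at::Tensor& base, const at::Tensor& src,
    int64_t offset = 0, int64_t dim1 = 0, int64_t dim2 = 1) {
  auto result = base.clone();
  result.diagonal(offset, dim1, dim2).copy_(src);
  return result;
}
```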

I added some basic tests that I tried to base off of existing tests for views (particularly around testing the derivative formulas), but I'm wondering if I should add something more comprehensive.

Also, as_strided fits into this category - the functionalization pass will need an `as_strided_scatter` op that's primitive w.r.t. autograd. I didn't add it for now, because it'll involve duplicating a bunch of logic from the current `as_strided_backward()` function, and also writing a derivative formula that I wasn't sure how to write :)

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942092

Pulled By: bdhirsh

fbshipit-source-id: c702a57c2748a7c771c14e4bcc3e996b48fcc4c8
2021-10-28 10:51:12 -07:00
665c148e42 move some codegen utilities into utils.py (#63094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63094

This PR:
- Moves `FileManager` and its dependencies (`assert_never` and other imports) to `utils.py`, and updates all of the call-sites with the fresh imports
- Passes the list of NativeFunction objects into `gen_trace_type` directly, instead of requiring the function to regenerate it (we already have it)

The purpose of the reshuffling is to avoid circular dependencies in the next PR, where I add codegen for the functionalization pass, which gets called from `gen.py` (but depends on some stuff from the autograd codegen - in particular, the list of view ops).

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942096

Pulled By: bdhirsh

fbshipit-source-id: 36118facae61f25f8922bb43ad2818c80b53504e
2021-10-28 10:49:17 -07:00
b100a9ea82 Back out "Make fb::sigrid_hash_compute_multipler_shift return a std::tuple<int64_t, int64_t>" (#67456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67456

There are some compatibility issues, we need to back-out before it gets to prod feed models

Test Plan: CI

Reviewed By: pgarbacki

Differential Revision: D31997684

fbshipit-source-id: 8b2584cb5d43e487719c6530d4178988fd03c455
2021-10-28 10:44:41 -07:00
a8f85300da [quant][graphmode][fx][test] Refactor test code for quant-fx2trt unit tests (#67070)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67070

Test Plan:
python test/test_quantization.py TestQuantizeFxTRTOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31850124

fbshipit-source-id: a314b8869c091743dad7e5a1468985cf8aff6091
2021-10-28 10:39:58 -07:00
325b15039c Add FSDP tests to verify forward overlap and memory usage (#67117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67117

Add FSDP tests to verify forward overlap and memory usage
ghstack-source-id: 141783871

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D31845629

fbshipit-source-id: b8b747e036925a9bb9164f0a5546000eece8442a
2021-10-28 10:29:27 -07:00
938afa37a3 Remove process group barrier and all_reduce function calls from tensorpipe agent (#65946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65946

Add new function in agent_utils to perform a synchronization of active call counts using store. This is intended to replace the barrier and all_reduce used by the process group in RPC shutdown.

`test_ddp_comparison` and `test_ddp_comparison_uneven_inputs` fail with these changes. It seems like the RPC agents are not accessing the same store, so the total count of processes never reaches the world size to exit the barrier; we still need to investigate why this happens only for these test cases. Setting clean_shutdown to false ignores this code path, which allows the tests to pass.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31762736

Pulled By: H-Huang

fbshipit-source-id: cb5d0efe196f72726c63393c4293e97ec4f18548
2021-10-28 10:15:56 -07:00
0c93c8e39a Disable linux-xenial-cuda10.2 config (#67344)
Summary:
linux-xenial-cuda10.2 and linux-bionic-cuda10.2 are very similar, no
need to run both configs

Moved all auxiliary builds from xenial to bionic

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67344

Reviewed By: seemethere, janeyx99

Differential Revision: D31964850

Pulled By: malfet

fbshipit-source-id: d07ce266c843c7fd69b281e678c4247b0bf6da20
2021-10-28 10:10:13 -07:00
6ed68f3f84 Document torch.jit.is_tracing() (#67326)
Summary:
This PR adds `torch.jit.is_tracing()` to the JIT API reference.
This function is widely used but left undocumented: https://github.com/search?q=torch.jit.is_tracing&type=code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67326

Reviewed By: tugsbayasgalan

Differential Revision: D31985251

Pulled By: Krovatkin

fbshipit-source-id: 852b432b08d63df8bd7a7a02c9555e61f5f37978
2021-10-28 09:56:09 -07:00
b27b1ff809 Fix deadlock when forward and backward AD are used at the same time (#67360)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67360

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31973040

Pulled By: albanD

fbshipit-source-id: f9c75c6497b622c86e8653027bce45461304eff5
2021-10-28 09:11:36 -07:00
d3f03af496 Fix indentation in forward_grad.h (#67359)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67359

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31973039

Pulled By: albanD

fbshipit-source-id: 80ca7870ea35977560334aa65aa344da6847c039
2021-10-28 09:10:18 -07:00
6900aacf54 [fbcode] Fix operator_benchmark with jit mode (#67382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67382

two simple updates:

* fix running the benchmark with --use_jit. Previously it would fail with this error:

  torch.jit.frontend.UnsupportedNodeError: import statements aren't supported:
  File "/proc/self/fd/3/bmm_test.py", line 9
  def __invoke_main():
    import ctypes
    ~~~~~~ <--- HERE
    import ctypes.util
    import errno

* add matmul to bmm benchmark as D31837588

Test Plan:
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:bmm_test --  --forward_only=True --mkl_num_threads=1 --omp_num_threads=1
 --use_jit=True

Reviewed By: ShijunK

Differential Revision: D31960528

fbshipit-source-id: 84b892934149784d1b8a0f90b0233cc2f1cf1f5f
2021-10-28 08:48:10 -07:00
eb8b80b76f Add test owners for elastic tests (#67293)
Summary:
Action following discussion with the distributed and r2p teams: the tests under elastic in distributed should be owned by oncall: r2p, not distributed.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67293

Reviewed By: jbschlosser

Differential Revision: D31973779

Pulled By: janeyx99

fbshipit-source-id: 05875a7600c6eb1da1310a48e1e32a1a69461c55
2021-10-28 08:32:50 -07:00
2366948085 [LT] Add ir_util for ComputePostOrder (#67282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67282

Test Plan: `build/bin/test_lazy`

Reviewed By: wconstab, ngimel

Differential Revision: D31961754

Pulled By: desertfire

fbshipit-source-id: 28466588ece8057640a7202b8c79cc1a4357d373
2021-10-28 08:17:52 -07:00
6293e0ad61 update coverage ignore to not skip whole modules (#67395)
Summary:
This reduces the chance of newly added functions being ignored by mistake.

The only test that this impacts is the coverage test that runs as part of the python doc build. So if that one works, it means that the update to the list here is correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67395

Reviewed By: jbschlosser

Differential Revision: D31991936

Pulled By: albanD

fbshipit-source-id: 5b4ce7764336720827501641311cc36f52d2e516
2021-10-28 08:07:24 -07:00
961fd76a9a [ONNX] Relax check on Prim::PythonOp nodes for ONNX_FALLTHROUGH (#66172) (#67273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67273

* Relax check on Prim::PythonOp nodes for Onnx_fallthrough

* Add tests

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962521

Pulled By: malfet

fbshipit-source-id: 878920196d66c4f1dadaf3ebb9a7bf69b88849b4
2021-10-28 08:02:49 -07:00
02a78bdba7 [ONNX] Support conv-bn fusion in blocks (#66152) (#67272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67272

* Support conv-bn fusion in nested blocks

* avoid running script tests twice

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962513

Pulled By: malfet

fbshipit-source-id: 3ee79426542f9049cf62ac7b0c1be9d60ae6d014
2021-10-28 08:02:46 -07:00
9deb602726 [ONNX] Use Reciprocal operator instead of Div(1, x). (#65382) (#67271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67271

* [ONNX] Use Reciprocal operator instead of Div(1, x).

This is a more readable and perhaps more performant way to export
torch.reciprocal.

* Use Reciprocal in caffe to operator to import onnx

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962519

Pulled By: malfet

fbshipit-source-id: d926e75b1c8312b9a980c9a1207a1a93ba0c71e0

Co-authored-by: take-cheeze <takechi101010@gmail.com>
2021-10-28 08:01:21 -07:00
eea20bfa15 fixed type checking errors in fuse.py (#66799)
Summary:
Fixes [Issue#70](https://github.com/MLH-Fellowship/pyre-check/issues/70)
This PR fixes the type checking error that was found in fuse.py as follows:

torch/quantization/fx/fuse.py:34:13 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.

Signed-off-by: Onyemowo Agbo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66799

Reviewed By: 0xedward

Differential Revision: D31961462

Pulled By: onionymous

fbshipit-source-id: 7481afc07152ba13f3224e4ad198fd8e2c34c880
2021-10-28 07:45:28 -07:00
7da9c4ed2e [SR] NNC out variant for aten::where (#67255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67255

Add an out variant for `aten::where`.

Since this op can be implemented quite trivially in NNC with `ifThenElse`, I added an NNC kernel as well.
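
Roughly, the NNC lowering has this shape (a loose sketch; the buffer setup and exact `Compute` overload are assumptions about the tensorexpr API, not code from this diff):

```
#include <torch/csrc/jit/tensorexpr/tensor.h>

using namespace torch::jit::tensorexpr;

// Sketch: elementwise aten::where as a pointwise ifThenElse over 1-D buffers.
Tensor lowerWhere(const BufHandle& cond, const BufHandle& x,
                  const BufHandle& y, const ExprHandle& n) {
  return Compute("aten_where", {n}, [&](const VarHandle& i) {
    return ifThenElse(cond.load(i), x.load(i), y.load(i));
  });
}
```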

Test Plan: Unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: navahgar

Differential Revision: D31923886

fbshipit-source-id: b4379ee3aaf31a000e626b4caeafd3e3f3d60837
2021-10-28 06:48:22 -07:00
3aadff651c [quant][embedding qat][bugfix] Fix and test QAT EmbeddingBag from_float error message (#66989)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66989

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31961773

Pulled By: b-koopman

fbshipit-source-id: 0d28728c87751ffc696ac221c3e8e75ac923cc57
2021-10-28 06:29:20 -07:00
62feadd76f Replace issue templates with new issue forms (#65917)
Summary:
This PR introduces the new issue forms that replace issue templates.

This is similar to what was done in torchvision https://github.com/pytorch/vision/pull/4299 and torchaudio, you can see the end result here: https://github.com/pytorch/vision/issues/new/choose (click e.g. on the [bug report](https://github.com/pytorch/vision/issues/new?assignees=&labels=&template=bug-report.yml))

The main new thing is that we can enforce some of the fields to be filled, especially for bug reports. It's also a much cleaner GUI for users IMHO, and we can provide better examples and instructions.

There is still a "blank" template available.

I removed the "Questions" form: we say we close these issues anyway. I replaced it with a direct link to https://discuss.pytorch.org. Since we still have a "blank" template, I think this covers all previous use-cases properly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65917

Reviewed By: VitalyFedyunin

Differential Revision: D31894777

Pulled By: NicolasHug

fbshipit-source-id: fbd39f7ed4cadab732d106d3166c04c451c31f94
2021-10-28 04:49:47 -07:00
6827d36c1a [Static Runtime][DI] Fuse list unpack and variadic_grouped_accessor_op (#66585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66585

Add a new op `static_runtime::fused_variadic_grouped_accessor_op` that outputs many tensors rather than a single tensor list. Incorporated this new op into `FuseListUnpack`. This eliminates `ListUnpack` overhead and tensor refcount bumps.

Test Plan:
**Accuracy Test**

Model 294738512_40 (manually confirmed that fusion happens)
```
get 2861 prediction values
get 2861 prediction values
max_error:  0  total:  0
```

Accuracy test with model 296213501_65 (has V2 op): passes with 0 errors.

**Performance**

TW replayer test w/ 800 QPS (stacked with D31482816 (72e25c9f4e)) shows 5% CPU decrease for storage tier.
Results:

{F673610679}

Reviewed By: hlu1

Differential Revision: D31620408

fbshipit-source-id: f05c298bcbce61a491b63d575af4aca746881696
2021-10-28 04:34:57 -07:00
90b722c544 specializeGradSumToSize patch - propagate profile_none through profile_ivalue (#63941)
Summary:
Simply propagate the profile_none_ value through profile_ivalue nodes inserted by nvfuser.

Without the propagation, profile_ivalue nodes inserted by other passes would block the optimization of no-op sum_to_size.

cc gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63941

Reviewed By: shunting314, cpuhrsch

Differential Revision: D31972765

Pulled By: Krovatkin

fbshipit-source-id: 4fa571a758e269b486c584f47c2a933de82d463b
2021-10-27 22:54:09 -07:00
fc664ac272 [sharded_tensor] easier initialization for Shard (#66351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66351

This adds the ability for users to just provide shard_offsets and, optionally, rank to construct a local shard, instead of needing a ShardedMetadata. Under the hood, we construct the ShardedMetadata by inferring shard_lengths and device from the local tensor.
ghstack-source-id: 141742410

Test Plan: test_local_shards

Reviewed By: pritamdamania87

Differential Revision: D31519919

fbshipit-source-id: 8f3b4682ffc74b79b41076f3f4b832f4cacda49d
2021-10-27 22:20:37 -07:00
71a67d0ce9 [sharded_tensor] simplify init_from_local_shards API (#64481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64481

This simplifies the `init_from_local_shards` API in sharded tensor, to only require the user to pass in a list of `Shard`s and `overall_size`, instead of a ShardedTensorMetadata. We do an all_gather inside to form a valid ShardedTensorMetadata instead.

TODO: add more test cases to improve coverage.
ghstack-source-id: 141742350

Test Plan: TestShardedTensorFromLocalShards

Reviewed By: pritamdamania87

Differential Revision: D30748504

fbshipit-source-id: 6e97d95ffafde6b5f3970e2c2ba33b76cabd8d8a
2021-10-27 22:19:20 -07:00
0117ada47c [quant][graphmode][fx] Add input_idx_to_dtype and ouptut_idx_to_dtype to backend_config_dict (#67067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67067

We plan to gradually add features to backend_config_dict; this PR adds support
for specifying the dtype for the input and output of a given pattern

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849074

fbshipit-source-id: ca2fbb873176fe72e08ea79ed1bc659bf27cbd8a
2021-10-27 22:10:12 -07:00
e332d80299 [iOS][CoreML] Remove shape information from TensorSpec (#67412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67412

For inputs, we'll be using the shape from PyTorch tensors. For outputs, we'll be using the shape from MLMultiArray. Thus, we can decouple from the symbolic shapes defined in the compile spec.
ghstack-source-id: 141746346

Test Plan:
- Sandcastle
- buck test pp-ios

Reviewed By: hanton

Differential Revision: D31299408

fbshipit-source-id: 337d5bb9efc2ff51409586c288d607399b937212
2021-10-27 21:55:29 -07:00
04aba42ed7 [Core ML] Assign Core ML computationUnit to executor (#67411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67411

This was overlooked before.
ghstack-source-id: 141746345

Test Plan: buck test pp-ios

Reviewed By: hanton

Differential Revision: D31977097

fbshipit-source-id: f5ce9f7d58c3f35097caaa75f75310a89c918387
2021-10-27 21:55:27 -07:00
7e1a53cd5c [Core ML] Fix error messages (#67410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67410

As title
ghstack-source-id: 141537215

Test Plan: buck-test pp-ios

Reviewed By: hanton

Differential Revision: D31901372

fbshipit-source-id: 80ae1cf8cb67c0e2ca276e21cc80b1ff799437a4
2021-10-27 21:54:14 -07:00
fae1c0a434 [PyTorch] Reduce refcount bumps in ClassType (#66724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66724

Forwarding fix from previous diff through the ClassType getters & moving Types in where possible.

ghstack-source-id: 141594741

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697995

fbshipit-source-id: 05d6af7c23e3b7a94db75b20d06338bc9ade3e20
2021-10-27 19:32:33 -07:00
c8dd90c858 [PyTorch] Fix extra refcount bumps in ClassAttribute (#66723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66723

Missing move in constructor and forced copy in getter.
ghstack-source-id: 141594742

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697702

fbshipit-source-id: c2018531e7ec4a4853cd003ea3753273a5fae7fb
2021-10-27 19:31:22 -07:00
1cfdb6f4c6 [quant][fx] add pass to duplicate dequant nodes with multi use (#67118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67118

Fixes a bug in the reference pattern support for nn.Linear when the same quantized input is shared across multiple Linear nodes.

This PR adds a pass to duplicate the dequant nodes for each use so that for a case like
```
x -> quant -> dequant -> linear1 - quant1
                     |
                   linear2 - quant2
```
We duplicate the dequant nodes
```
x -> quant -> dequant1 -> linear1 - quant1
            |
          dequant2-> linear2 - quant2
```
So that we can match each pattern in the lowering step

We also add a pass to remove the extra/duplicate dequant nodes that may be left over from the above pass if we don't lower them based on a pattern match

Test Plan:
python test/test_quantization.py test_ref_pattern_multi_use

Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31873511

fbshipit-source-id: aea0819222f084635157426743a50e065e6503c3
2021-10-27 18:25:35 -07:00
9e175400ac Moving python binding to _C and its decl to the right pyi file (#67365)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67365

Reviewed By: malfet, albanD

Differential Revision: D31972163

Pulled By: Krovatkin

fbshipit-source-id: e5313c2c8cb810b57b7fe16af8ba26edbe486488
2021-10-27 17:33:45 -07:00
882446c1d2 add frozen_numpy to :builtin_registry_cuda target (#67396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67396

frozen_numpy did not work on GPU since we didn't add register_frozennumpy to the :builtin_registry_cuda target.

This was not found earlier since the unit test we added to test_deploy.cpp is only run on CPU. On GPU, we run test_deploy_gpu.cpp, which does not contain the added unit tests for numpy.
In this diff, I just duplicate the unit tests for numpy (and pyyaml) across test_deploy.cpp and test_deploy_gpu.cpp.
I think ideally we should consolidate these 2 files into a single one, so we can add unit tests in a single place while running them on both hardware platforms.
Tracking task: T104399180
ghstack-source-id: 141750276

Test Plan: buck test mode/opt :test_deploy_gpu

Reviewed By: suo

Differential Revision: D31978156

fbshipit-source-id: 2f5cd55ca33107cc7d230b72f1353df81f0a3bda
2021-10-27 17:29:25 -07:00
9ebc6357b3 [SR] Vectorize int version of fmod (#67313)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67313

Reviewed By: swolchok

Differential Revision: D31889868

fbshipit-source-id: a0af399431a0d672fa56cf2f2ba6d548c47bcedd
2021-10-27 17:02:53 -07:00
dea8b27433 [Pytorch Edge] Make some torchbind classes selective (#67340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67340

Currently Torchbind classes aren't selective. This adds a rough-granularity pass that will remove entire classes if they aren't selected. If we need finer granularity in the future we can make individual methods within classes selective, though instrumenting that will be significantly more involved, I think. On a linux build only __torch__.torch.classes._nnapi.Compilation remains unselective. I can't find where it's registered :P (there are a couple of Android-only ones and presumably some Metal-only ones as well)

Many of the classes registered in functions returned a reference to the class that was created. I talked with dreiss about it and we decided that this seemingly didn't serve any purpose, and leaving it like that would make the return value difficult (but possible) to create with selectivity. Since it seems useless anyway, I just changed them to return an int so that they can still be called from a global scope, but not have any issues with the return type (see the sketch below).
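
The pattern described above looks roughly like this (illustrative class and names; a sketch, not the actual diff):

```
#include <torch/custom_class.h>

struct MyThing : torch::CustomClassHolder {
  int64_t value = 0;
};

// Before: registration functions returned a reference to the registered class,
// which is awkward to produce under selective build. After: return an int (the
// value was unused anyway), so calls from global scope still work.
int register_my_thing() {
  static auto cls = torch::class_<MyThing>("my_namespace", "MyThing")
      .def(torch::init<>());
  return 0;
}

static const int kRegistered = register_my_thing();
```
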
ghstack-source-id: 141690776

Test Plan: CI, model unit tests, test models in prod apps

Reviewed By: dhruvbird

Differential Revision: D31092564

fbshipit-source-id: 657f7eb83490292436c15cf134ceca9b72fb9e1a
2021-10-27 16:58:27 -07:00
f20614af21 [jit] Allow custom class functions to be traced in invokeScriptMethodFromPython(). (#67380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67380

Test Plan: eyes

Reviewed By: tugsbayasgalan

Differential Revision: D31975656

fbshipit-source-id: 47c8c9854899e9fed5a635f88470711dc4c95970
2021-10-27 16:38:50 -07:00
2267a984eb [ROCm] Add sparse mappings for CUDA->HIP translation (#67323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67323

Applied patch proposed by Jeff https://github.com/pytorch/pytorch/pull/63948#issuecomment-952166982.
In PyTorch, we map cuBLAS->rocBLAS and cuSPARSE->hipSPARSE. Note the prefix, roc versus hip.
The 'hip' APIs offer a more direct CUDA-friendly mapping, but calling rocBLAS directly has better performance.
Unfortunately, the `roc*` types and `hip*` types differ, i.e., `rocblas_float_complex` versus `hipComplex`.
In the case of SPARSE, we must use the hip types for complex instead of the roc types,
but the pytorch mappings assume roc. Therefore, we create a new SPARSE mapping that has a higher priority.
Its mappings will trigger first, and only when a miss occurs will the lower-priority pytorch mapping take place.
When a file contains "sparse" in the filename, a mapping marked with API_SPARSE is preferred over other choices.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31969246

Pulled By: cpuhrsch

fbshipit-source-id: 4ce1b35eaf9ef0d146a0955ce70c354ddd8f4669
2021-10-27 16:28:37 -07:00
708f7b1209 Update extending doc to cover forward mode AD (#66962)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66962

Reviewed By: VitalyFedyunin

Differential Revision: D31897782

Pulled By: albanD

fbshipit-source-id: 64164783a14a7ed4cedc17da28f1181d9807a499
2021-10-27 14:18:38 -07:00
d9a5668983 [ONNX] Add dim argument to all symbolic (#66093) (#67270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67270

* Add dim argument to all symbolic

* All symbolic depends on any symbolic

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962518

Pulled By: malfet

fbshipit-source-id: f7ee05cf4eff5880fc508154267e060952b5b42d
2021-10-27 13:46:31 -07:00
cb15df76ad [ONNX] Update onnxruntime to 1.9 for CI (#65029) (#67269)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67269

Test Plan: Imported from OSS

Reviewed By: ngimel, msaroufim

Differential Revision: D31962516

Pulled By: malfet

fbshipit-source-id: 39b3c6a4a05d7b769f0ef5ce7ea597209516cde2
2021-10-27 13:45:07 -07:00
9900310133 Fix sign warnings in CUDA kernels (#66753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66753

Fixes these Wextra compilation errors:
```
stderr: caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu:49:72: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   49 |   AT_DISPATCH_ALL_TYPES_AND2 (44fd312604)(kBFloat16, ScalarType::Half, iter.input_dtype(), "signbit_cuda", [&]() {
      |                                                                      ~~^~~
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:97: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                                 ^
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
```
And also these warnings:
```
caffe2/c10/util/Half.h(461): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
caffe2/c10/util/Half.h(459): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
```
I thought I'd fixed this previously using `std::is_unsigned` in D25256251 (cff1ff7fb6), but apparently that was insufficient.
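
One standard shape for such a fix (a sketch of the general technique, not the code in this diff) is to short-circuit the signed-only comparison at compile time:

```
#include <type_traits>

// Avoid "comparison is always false" warnings by never emitting `x < 0`
// for unsigned types.
template <typename T>
typename std::enable_if<std::is_unsigned<T>::value, bool>::type
is_negative(T /*x*/) {
  return false;  // unsigned values are never negative
}

template <typename T>
typename std::enable_if<std::is_signed<T>::value, bool>::type
is_negative(T x) {
  return x < 0;
}
```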

Test Plan: Sandcastle

Reviewed By: malfet, ngimel

Differential Revision: D31708173

fbshipit-source-id: 7714f6bbf109d2f2164630d3fc46bad18046c06c
2021-10-27 13:39:27 -07:00
3a1aa31a2f Add dummy bfloat16 VSX implementation (#67331)
Summary:
Just a copy of DEFAULT bfloat16 implementation and revert restriction
introduced by https://github.com/pytorch/pytorch/pull/61630

Fixes https://github.com/pytorch/pytorch/issues/66867 and https://github.com/pytorch/pytorch/issues/62016

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67331

Reviewed By: ngimel

Differential Revision: D31959916

Pulled By: malfet

fbshipit-source-id: 8ca5e65ca041fef67ee18ddbb215cff01fd1e004
2021-10-27 13:35:38 -07:00
7484941eaa Wrap TRTInterpreter result with wrapper (#67307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67307

Wrap the TRTInterpreter result so that any future change to output params is less likely to break existing use cases.

Test Plan: Run test with all touched file

Reviewed By: 842974287

Differential Revision: D31945634

fbshipit-source-id: 7cf73a1ef0098bff2013815f2f1692233ef7ec14
2021-10-27 13:24:50 -07:00
fa70d72e95 Set kernel func name from AOT Compiler (#67229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67229

Right now, the assembly code generated for a given method from the model is named wrapper or func by default. The function name is then replaced with a proper kernel_func_name after target-specific assembly is generated.
This PR propagates a desired kernel_func_name right from the aotCompiler API so that the generated function has the needed name that doesn't need to be replaced later.

Note: Most of this change was landed in https://github.com/pytorch/pytorch/pull/66337 which had to be reverted as it was breaking `test_profiler` in `test_jit_fuser_te` as it replaced the name generated for graph with the default kernel_func_name value. This PR fixes that as well.

```
(pytorch)  ~/local/pytorch kname
└─ $ python3 test/test_jit_fuser_te.py
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........................................<string>:3: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
L = torch.cholesky(A)
should be replaced with
L = torch.linalg.cholesky(A)
and
.
.
.
......................<string>:3: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release.
The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion.
L, _ = torch.symeig(A, upper=upper)
should be replaced with
L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L')
and
L, V = torch.symeig(A, eigenvectors=True)
should be replaced with
L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at  ../aten/src/ATen/native/BatchLinearAlgebra.cpp:2492.)
......[W pybind_utils.cpp:35] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
/data/users/priyaramani/pytorch/torch/testing/_internal/common_utils.py:403: UserWarning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (Triggered internally at  ../torch/csrc/jit/python/pybind_utils.h:691.)
  return callable(*args, **kwargs)
.....................................................................[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1], which does not match the required output shape [].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1, 5], which does not match the required output shape [5].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
........................................................................s.......s...s.s....s......s..sss............................
----------------------------------------------------------------------
Ran 503 tests in 37.536s

OK (skipped=10)
```

Test Plan: Imported from OSS

Reviewed By: navahgar, pbelevich

Differential Revision: D31945713

Pulled By: priyaramani

fbshipit-source-id: f2246946f0fd51afba5cb6186d9743051e3b096b
2021-10-27 13:10:49 -07:00
5347dab851 Set test owners for onnx tests (#66860)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66860

Reviewed By: malfet

Differential Revision: D31964696

Pulled By: janeyx99

fbshipit-source-id: 4e77d1bda92d9107ca0b90a06d24fa4477ceaffa
2021-10-27 12:50:45 -07:00
72e25c9f4e [Static Runtime][DI] Add variadic grouped_accessor_op (#66289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66289

Add a variadic version of `grouped_accessor_op` to eliminate list construction overhead and associated refcount bumps in static runtime.

Test Plan:
Accuracy test with model 294738512_40: passes with 0 errors.
Accuracy test with model 296213501_65 (has V2 op): passes with 0 errors.

**Perf impact**

TW replayer test w/ 800 QPS (stacked with D31620408) shows ~5% CPU decrease for storage tier.
Results:

{F673610665}

Reviewed By: hlu1

Differential Revision: D31482816

fbshipit-source-id: 14393da122cefd094c3e4f423beb897c1d17b32c
2021-10-27 12:29:33 -07:00
1ec732bc46 Add fp16/fp32 autocasting to JIT/TorchScript (#63939)
Summary:
Adds mixed precision autocasting support between fp32/fp16 to torchscript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)

This PR implements an autocast optimization pass that inserts casting ops per the AMP rules (torch/csrc/jit/passes/autocast.cpp), mimicking the behavior of eager autocast. The pass also takes into consideration the context of `torch.cuda.amp.autocast` and only inserts casting ops within the enabled context manager, giving feature parity with eager amp autocast.

We currently provide JIT AMP autocast as a prototyping feature, so it is default off and could be turned on via `torch._C._jit_set_autocast_mode(True)`

The JIT support for autocast is subject to different constraints compared to the eager mode implementation (mostly related to the fact that TorchScript is statically typed), restriction on the user facing python code is described in doc torch/csrc/jit/JIT-AUTOCAST.md

This is a prototype; there are also implementation limitations that were necessary to keep this PR small and get something functioning quickly upstream, so we can iterate on designs.

A few limitations/challenges that are not properly resolved in this PR:
1. Autocast inserts cast operations, which have an impact on the scalar type of output tensors feeding downstream operations. We are not currently propagating the updated scalar types, which would give issues/wrong results for operations subject to promotion rules.

2. Backward for autodiff in JIT misses the casting of dgrad to the input scalar type, which autograd does in eager mode. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might feed dgrad with a mismatched scalar type to the input. This could potentially break gradient functions consuming dgrad (e.g. gemm backwards, which assumes grad_output is of the same scalar type as the input).

3. The `torch.autocast` API has an optional argument `dtype` which is not currently supported in JIT autocast; we require a static value.

Credit goes mostly to:
tlemo
kevinstephano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939

Reviewed By: navahgar

Differential Revision: D31093381

Pulled By: eellison

fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
2021-10-27 12:11:36 -07:00
0101b1ea2b [skip-ci] .github: Set linux gpu instances to be non-ephemeral (#67345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67345

We were hitting capacity issues; setting these to non-ephemeral means
keeping the current capacity at the expense of "unclean" nodes

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31965477

Pulled By: seemethere

fbshipit-source-id: 6d45fb34d07d55c5112db065af2aa0a8b1fd8d1f
2021-10-27 11:55:45 -07:00
b55a2500d2 [jit] Remove graph() call from abstract Function interface. (#65967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967

Graph is an implementation detail. If a user wants access to the
underlying graph, they should be able to explicitly dynamic cast instead.
ghstack-source-id: 141659819

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D31326153

fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
2021-10-27 11:54:26 -07:00
7c48b9ee25 Sparse CSR CUDA: add triangular_solve_out (#61858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61858

This PR adds `triangular_solve_out_sparse_csr_cuda`. The operation is
used to compute the solution to a linear system where the coefficient
matrix is triangular.
Structured kernels are used, and the meta function needed some changes to
support the sparse CSR layout. With a sparse matrix input, the `cloned_coefficient`
tensor is a 0-sized tensor.
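
A hedged usage sketch (the sparse CSR factory and solver calls below are assumptions about how the new kernel is exercised, not taken from the diff):

```
#include <ATen/ATen.h>

void solve_lower_triangular_csr() {
  // 2x2 lower-triangular matrix A in CSR form: [[2, 0], [1, 3]]
  auto crow = at::tensor({0, 1, 3}, at::kLong);
  auto col  = at::tensor({0, 0, 1}, at::kLong);
  auto vals = at::tensor({2.0f, 1.0f, 3.0f});
  auto A = at::sparse_csr_tensor(crow, col, vals, {2, 2},
                                 at::device(at::kCUDA).dtype(at::kFloat));
  auto b = at::rand({2, 1}, at::device(at::kCUDA));
  // Solve A x = b; returns (solution, cloned_coefficient).
  auto result = at::triangular_solve(b, A, /*upper=*/false);
}
```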

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31948435

Pulled By: cpuhrsch

fbshipit-source-id: 7775fece83ca705a26d75f82aead10b956b14bfd
2021-10-27 11:12:20 -07:00
4b9464f4b9 [fx]Early return if a node tries prepend self (#67068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67068

Prepending a node to itself results in the node getting removed from the graph.

Usually people won't prepend a node to itself. But people could accidentally try to append a node that's already next to the `self` node, which amounts to prepending `self` to `self`.

Test Plan: Added a unit test

Reviewed By: jamesr66a

Differential Revision: D31849030

fbshipit-source-id: b0fdfbb893f785f268595acd823b426d57c15e61
2021-10-27 10:49:45 -07:00
2669e4ed4e Revert D31945507: .github: Switch 8xlarge to 4xlarge instance_type
Test Plan: revert-hammer

Differential Revision:
D31945507 (1541bb823a)

Original commit changeset: fb8587de7f31

fbshipit-source-id: 3760f5610f0c9cd5298a35490c549e56f7396aaf
2021-10-27 10:02:51 -07:00
7d1c0992e1 GHA: add back runner type for distributed tests (#67336)
Summary:
Addresses https://github.com/pytorch/pytorch/pull/67264#issuecomment-953031927

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67336

Test Plan:
the 8x is used for the distributed config
![image](https://user-images.githubusercontent.com/31798555/139103861-38d7dc37-ca8b-4448-b3ec-facc24aee342.png)

Reviewed By: malfet

Differential Revision: D31961179

Pulled By: janeyx99

fbshipit-source-id: cd21e2bf2a7c6602c9a42a53759b720959e43b8d
2021-10-27 09:34:18 -07:00
f2f7b02b4c Add support for vmap+fwdAD for basic out-of-place op (#66291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66291

In this PR:
 - Trivial batching rules for `make_dual` and `is_same_size` that enable forward ad + vmap functionality
 - Adds a check in gradcheck that is performed when both `check_batched_grad` and `check_forward_ad` are `True` (an OpInfo using this is added later in the stack).
 - Tests for the gradcheck functionality
 - Tests that basic out-of-place op works

Test Plan: Imported from OSS

Reviewed By: albanD, saketh-are

Differential Revision: D31842018

Pulled By: soulitzer

fbshipit-source-id: 84b18d9a77eeb19897757e37555581f2a9dc43d8
2021-10-27 08:55:06 -07:00
a3aa9df59f Add barrier to ProcessGroup trampoline (#67236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67236

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31916706

Pulled By: mrshenli

fbshipit-source-id: f3d2bcd938a384ec297f4094831c69d4059316bb
2021-10-27 08:18:07 -07:00
e52d0e773b [tensorexpr][ir][quant] Adding qscale and qzero to tensorexpr IR Buf (#66675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66675

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31676328

Pulled By: IvanKobzarev

fbshipit-source-id: c6479415fa7d809e02dd3789ee0bfd6dfe50dc92
2021-10-27 01:32:16 -07:00
632719c214 Enable c10d trampoline tests on MacOS (#67205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67205

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31916705

Pulled By: mrshenli

fbshipit-source-id: 440d319959796d01c637c277706eeab127d9bde7
2021-10-26 20:40:12 -07:00
c88da701e2 [hpc][inference] enable cuda graph in engine holder (#66738)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66738

added a field `max_batch_size` to TRTModule, which will later be used to determine how much the engine holder needs to pad the input

Reviewed By: 842974287

Differential Revision: D31286509

fbshipit-source-id: be5c6d4ad9c87ca0842679dc507b187275d4e8dc
2021-10-26 18:48:05 -07:00
28570664d5 [Vulkan] Add vulkan_perf_test with google benchmark (#67230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67230

Added a new test `vulkan_perf_test` for measuring performance with google benchmark.
**Summary:**
* `vulkan_perf_test` can be used to perform a quick benchmark test for Vulkan features, to compare before-and-after performance when applying a new method and/or optimizing an existing implementation on your local machine (see the sketch after this list).
* The **google benchmark** 3rd party library (https://github.com/google/benchmark) is already in the repo (`fbsource/third-party/benchmark`).
* The number of threads is set to 1 since the Vulkan backend is not thread-safe.
* Added a new API `Context::wait()` to allow benchmark tests to wait for all GPU operations to be done before calling `Context::flush()`
* Call `Context::wait()` for each output Vulkan tensor and then `Context::flush()` to avoid out-of-memory issues while running a number of iterations in the benchmark test code
* Use `Time` column (wall clock) as a total execution time for each iteration (instead of `CPU` column = CPU execution time only) from the benchmark result table
* The more iterations, the more reliable the data, but it will take much longer. 100-1,000 iterations for bigger tensors and 5,000-10,000 iterations for smaller ones would be sufficient.
* The benchmark data on MacOS is not reliable since there is an extra layer, [MoltenVK](https://github.com/KhronosGroup/MoltenVK), running on top of `Metal`. Also, running Vulkan models on MacOS instead of Metal ones is generally not a good idea.
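
As a shape for such a benchmark, a registration might look like this (a sketch with assumed names and a CPU `at::cat` body as a stand-in; the real tests construct Vulkan tensors and call `Context::wait()`/`flush()` as described above):

```
#include <benchmark/benchmark.h>
#include <ATen/ATen.h>

// Sketch: a parameterized benchmark in the style of the result tables below.
static void cat_op_channel_perf(benchmark::State& state) {
  const auto N = state.range(0), C = state.range(1);
  const auto H = state.range(2), W = state.range(3);
  auto a = at::rand({N, C, H, W});
  auto b = at::rand({N, C, H, W});
  for (auto _ : state) {
    auto out = at::cat({a, b}, /*dim=*/1);
    benchmark::DoNotOptimize(out);
  }
}
BENCHMARK(cat_op_channel_perf)
    ->ArgNames({"N", "C", "H", "W"})
    ->Args({3, 40, 221, 193})
    ->Iterations(1000)
    ->Threads(1);
BENCHMARK_MAIN();
```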

**Next steps:**
* Add more benchmark tests as we optimize more Vulkan operators
* Consider using Vulkan own performance counter such as [uVkCompute](https://github.com/google/uVkCompute) in the near future. Each iteration time can be manually set by `benchmark::State::SetIterationTime()` and `Benchmark::UseManualTime()` APIs (see [UseManualTime API](365670e432/include/benchmark/benchmark.h (L1013)))
* Consider this `vulkan_perf_test` as a performance BAT (Build Acceptance Test) on the CI pipeline. `gtest` and `google benchmark` can be written in the same place ([see](https://stackoverflow.com/questions/8565666/benchmarking-with-googletest)). And [swiftshader](https://github.com/google/swiftshader) can be used for Sandcastle devservers that don't support Vulkan. We may come up with a reasonable performance criterion for each test, and fail the test if there is any significant performance degradation.

Test Plan:
**Test build on Android**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
**Test build on MacOS**
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```

**Test result on Google Pixel 5**
```
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark (Without optimization for 4x channels)                            Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       60.4 ms         14.1 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       24.1 ms        0.947 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       59.6 ms         14.0 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        5.98 ms        0.844 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        6.02 ms        0.845 ms         5000
-------------------------------------------------------------------------------------------------------------
Benchmark (With optimization for 4x channels)                               Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       39.3 ms         13.3 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       16.4 ms         3.49 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       59.7 ms         14.1 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        3.93 ms        0.855 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        6.14 ms        0.852 ms         5000
```
Note that the smaller tensors receive a significant improvement on the Android builds (`3.93 ms` vs. `6.14 ms` when comparing `{3,4,221,193}` with `{3,3,221,193}`), because the `vkCmdCopyImage` API is used for the bigger tensor `{3,4,221,193}` instead of shader operations.
* `{3,40,221,193}`: 60.4 ms -> 39.3 ms (34.93% faster)
* `{3,20,221,193}`: 24.1 ms -> 16.4 ms (31.95% faster)
* `{3,4,221,193}`: 5.98 ms -> 3.93 ms (34.28% faster)
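
For reference, the percentages above follow the usual relative-reduction formula; a quick sketch:

```
# Relative time reduction, matching the speedup bullets above.
for shape, before, after in [("{3,40,221,193}", 60.4, 39.3),
                             ("{3,20,221,193}", 24.1, 16.4),
                             ("{3,4,221,193}", 5.98, 3.93)]:
    print(f"{shape}: {(before - after) / before:.2%} faster")
```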

{F674052834}

**Test result on MacOS**
```
Running ./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac#macosx-x86_64
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 5.95, 5.02, 5.15
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------------------------------------------
Benchmark (Without optimization for 4x channels)                            Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       51.2 ms         35.5 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       11.4 ms         4.76 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       51.9 ms         35.0 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        2.84 ms         1.36 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        2.30 ms         1.13 ms         5000
-------------------------------------------------------------------------------------------------------------
Benchmark (With optimization for 4x channels)                               Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       70.1 ms         36.9 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       11.8 ms         5.00 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       69.3 ms         36.8 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        4.60 ms         1.48 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        3.65 ms         1.41 ms         5000
```
Note that `{3,40,221,193}` input tensors don't receive any performance improvement on MacOS when we use the `vkCmdCopyImage` API to directly copy textures for channel counts that are multiples of 4. This may be due to the extra [MoltenVK](https://github.com/KhronosGroup/MoltenVK) layer running on top of `Metal`.

Reviewed By: SS-JIA

Differential Revision: D31906379

fbshipit-source-id: 0addc766502dba1a915b08840b3a4dc786a9cd9d
2021-10-26 17:55:42 -07:00
cdc9b26281 [Vulkan] Optimize cat operator for channel dimension (#67207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67207

Improved performance of the `cat` operator along the channel dimension:
* Improved when the input tensor's channel size is a multiple of 4.
* Added new test cases to cover this scenario.
* Limitation: we can't mix shader operations and `vkCmdCopyImage` in the same call. The way we create the output texture differs between the two, so we can't combine them unless we recreate the output texture every time. We therefore use `vkCmdCopyImage` only if every input tensor's channel count is a multiple of 4.
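
For illustration, a minimal Python sketch of the benchmarked workload (shapes taken from the test plan below; assumes a Vulkan-enabled build of PyTorch):

```
import torch

# Concatenate three NCHW tensors along the channel dimension on the Vulkan
# backend. With C a multiple of 4 (here 40), the vkCmdCopyImage fast path
# described above applies.
a, b, c = (torch.randn(3, 40, 221, 193) for _ in range(3))
out = torch.cat([a.vulkan(), b.vulkan(), c.vulkan()], dim=1)
print(out.cpu().shape)  # torch.Size([3, 120, 221, 193])
```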

{F673815905}

Test Plan:
**Test Conditions**
* 3 input tensors with size `{3, 40, 221, 193}`
* Number of iteration: `1,000`
* Compare the `Time` column (the `CPU` column measures CPU execution time only)
* Flush resources every iteration since the input tensors are big
* Running `vulkan_perf_test` requires a separate diff (D31906379)

**Test build on Android**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
**Test build on Mac**
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```

**Test result on Google Pixel 5**
a) Without using `vkCmdCopyImage` for multiples of 4 in channel dimension
```
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark (Without optimization for 4x channels)                            Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       60.4 ms         14.1 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       24.1 ms        0.947 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       59.6 ms         14.0 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        5.98 ms        0.844 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        6.02 ms        0.845 ms         5000
```
b) With using `vkCmdCopyImage` for multiples of 4 in channel dimension
```
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark (With optimization for 4x channels)                               Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       39.3 ms         13.3 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       16.4 ms         3.49 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       59.7 ms         14.1 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        3.93 ms        0.855 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        6.14 ms        0.852 ms         5000
```
* `{3,40,221,193}`: 60.4 ms -> 39.3 ms (34.93% faster)
* `{3,20,221,193}`: 24.1 ms -> 16.4 ms (31.95% faster)
* `{3,4,221,193}`: 5.98 ms -> 3.93 ms (34.28% faster)

{F674052795}

Reviewed By: SS-JIA

Differential Revision: D31781390

fbshipit-source-id: 42179d28ae461a9e247053bc9718f6b8c6c819e5
2021-10-26 17:54:19 -07:00
d691bc1207 Revert D31937065: [pytorch][PR] fix binding to the wrong python module
Test Plan: revert-hammer

Differential Revision:
D31937065 (7ac8ed741d)

Original commit changeset: 5c10b2870bcc

fbshipit-source-id: 9b21ffea8054b8a3a0b96e1b78e933f8654e7f2f
2021-10-26 17:40:59 -07:00
dfa7225a38 [Pytorch][Bootcamp] Add fix and testing for non-vectorized Adadelta optimizer to handle complex numbers (#66587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66587

Made some changes in the step function of the non-vectorized Adadelta optimizer to handle complex numbers as two real numbers, as per issue #65711 on GitHub.
ghstack-source-id: 141484731
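
A minimal sketch of the newly supported usage (the parameter shape and loss are illustrative, not from the diff):

```
import torch

# A complex parameter optimized with Adadelta; the step function now treats
# each complex value as two real numbers.
param = torch.randn(4, dtype=torch.complex64, requires_grad=True)
opt = torch.optim.Adadelta([param])
loss = (param.conj() * param).real.sum()  # real-valued loss of a complex param
loss.backward()
opt.step()
```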

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adadelta_complex'

https://pxl.cl/1R7kJ

Reviewed By: albanD

Differential Revision: D31630069

fbshipit-source-id: 2741177b837960538ce39772897af36bbce7b7d8
2021-10-26 17:35:01 -07:00
fcefed9517 Revert D31935958: Add register_frozenpython.cpp to the torch::deploy interpreter library in the OSS build
Test Plan: revert-hammer

Differential Revision:
D31935958 (00b0d4eeed)

Original commit changeset: 3e2cc5c8bc18

fbshipit-source-id: 3f22bf88d902891b83d836e3c53be9a214a58f1f
2021-10-26 17:30:22 -07:00
1541bb823a .github: Switch 8xlarge to 4xlarge instance_type (#67299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67299

Switches the linux.8xlarge.nvidia.gpu to the 4xlarge instance type to
help with queueing / capacity issues. This change is only meant to be a
bridge until everyone updates their PRs to use the new
linux.4xlarge.nvidia.gpu node type

NOTE: This node type will be removed so do not depend on it for any new
workflows.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31945507

Pulled By: seemethere

fbshipit-source-id: fb8587de7f31da72e968d46eeecc573d3f5b440f
2021-10-26 17:23:46 -07:00
7ac8ed741d fix binding to the wrong python module (#67246)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67246

Reviewed By: zhxchen17

Differential Revision: D31937065

Pulled By: Krovatkin

fbshipit-source-id: 5c10b2870bccece50ba52dde26127da79bccbba6
2021-10-26 17:19:02 -07:00
0e8bd0c8d6 [Pytorch Delegated Backend] Add macro to define sentinel value of debug handle. (#66584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66584

This will help avoid "-1"s in various places in our codebase and in backend codebases when
the debug handle is not known.

Test Plan: CI

Reviewed By: sxu

Differential Revision: D31614478

fbshipit-source-id: 97fceb04e3e78f52feda7b1ba1da08fa4480dd77
2021-10-26 17:13:44 -07:00
00b0d4eeed Add register_frozenpython.cpp to the torch::deploy interpreter library in the OSS build (#67280)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67280

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D31935958

Pulled By: shunting314

fbshipit-source-id: 3e2cc5c8bc18b5e19bd3804ad542a9ed69e04291
2021-10-26 16:39:40 -07:00
f510193e22 [jit][edge] Export maybe-used interface methods from modules. (#65966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65966

ghstack-source-id: 141594521

Support exporting "interface methods" from submodules to a mobile module. "Interface methods" are methods that might be called dynamically in a module and therefore need to be exported regardless, like virtual functions in C++.

Before this change, the export algorithm was a simple iteration through all top-level methods. Now that we have indirect calls, we need to recursively walk the call graph to find all potentially used methods, which means the order in which we export methods might break on old runtimes. To guarantee forward compatibility, we export top-level methods first and then the extra methods, so that top-level methods are always found first.

NOTE that exporting interface methods is disabled by default in this diff. We need to call `torch._C._enable_mobile_interface_call_export` to actually enable it.
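
A hedged sketch of opting in (the flag name comes from the note above; the module and file name are illustrative):

```
import torch

torch._C._enable_mobile_interface_call_export()  # opt-in flag from the note above

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

m = torch.jit.script(M())
m._save_for_mobile("m.ptl")  # lite-interpreter export path
```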

Test Plan: buck test mode/dev //caffe2/test:jit -- --exact 'caffe2/test:jit - test_export_opnames_interface (jit.test_misc.TestMisc)'

Reviewed By: qihqi, iseeyuan

Differential Revision: D31326155

fbshipit-source-id: 5be7234cca07691f62648a85133b6db65e427b53
2021-10-26 16:35:15 -07:00
a72a6365c9 disallow requires_grad=True in make_tensor for integral inputs (#67149)
Summary:
As per the title: `requires_grad=True` is no longer accepted for integral dtypes.
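
A minimal sketch of the behavior change (assuming `make_tensor` is used via `torch.testing`):

```
import torch
from torch.testing import make_tensor

make_tensor((2, 3), device="cpu", dtype=torch.float32, requires_grad=True)  # still fine
# Integral dtypes cannot require grad, so this now raises instead of
# silently producing an unusable tensor:
make_tensor((2, 3), device="cpu", dtype=torch.int64, requires_grad=True)
```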

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67149

Reviewed By: albanD

Differential Revision: D31928613

Pulled By: ngimel

fbshipit-source-id: 4491954c4fcd4a4e3121155d4451cc7370c27a0b
2021-10-26 16:19:28 -07:00
81d188101f .github: Use 4xlarge instances for linux gpu (#67264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67264

Downgrades linux gpu instances from 8xlarge -> 4xlarge

We were seeing capacity issues when scaling 8xlarge instances, so we are
downgrading to 4xlarge (which has only a single GPU) to see if that helps
resolve some of the capacity issues we were seeing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D31933488

Pulled By: seemethere

fbshipit-source-id: b41922ebb675e663cb035cd3795bc9bae94dcac7
2021-10-26 16:17:33 -07:00
fdc74e2373 Port triangular_solve to structured kernel (#61857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61857

A few updates to internal code that allow marking triangular_solve as structured.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31928687

Pulled By: cpuhrsch

fbshipit-source-id: 80a2783c469d5a6194c466ccfa8808fa41c0bdb7
2021-10-26 14:50:00 -07:00
6ce14e7b51 [PyTorch][Static Runtime] Cleanup: add valueVecFromFastSet (#66996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66996

We do this conversion a few times, and further diffs (which I'm trying to keep as small as possible) will do it more.
ghstack-source-id: 141496817

Test Plan: CI

Reviewed By: mikeiovine

Differential Revision: D31821037

fbshipit-source-id: 1d3b54cadaedd53189aec6a35ed1a126c6fe4824
2021-10-26 14:47:15 -07:00
066a980e7b [PyTorch][Static Runtime][easy] Fix ValueGroup comment (#66965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66965

external aliases aren't defined to be outputs (though output aliases may end up in there as the following sentence clarifies).
ghstack-source-id: 141473794

Test Plan: review

Reviewed By: mikeiovine

Differential Revision: D31809715

fbshipit-source-id: 82d1391b04e22559932f82270669a7ff94a1c90f
2021-10-26 14:45:36 -07:00
1926156752 Prevent TCPServer get deleted too early (#67204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67204

Fixes #66422
Fixes #66423

In the original test, all collectives are dummy local ones. As a
result, rank 0 could exit earlier than other ranks. However, the
`TCPStore` lives on rank 0, and other ranks might need to talk to
that store after rank 0 exits. This commit explicitly makes rank 0
wait for all other ranks to finish.
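
A hedged sketch of the pattern (the test body and init arguments are illustrative):

```
import torch.distributed as dist

def run_test(rank: int, world_size: int) -> None:
    dist.init_process_group(
        "gloo", init_method="tcp://127.0.0.1:29500",  # TCPStore lives on rank 0
        rank=rank, world_size=world_size,
    )
    # ... dummy local collectives that don't synchronize the ranks ...
    # Without a final barrier, rank 0 (which hosts the store) could exit while
    # other ranks still need to talk to it.
    dist.barrier()
    dist.destroy_process_group()
```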

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31906802

Pulled By: mrshenli

fbshipit-source-id: 82745f5497d784ea3cea9df6bda537ec71380867
2021-10-26 14:38:11 -07:00
273ab55fc4 Revert D31914868: Strided masked reduction: mean (2nd try)
Test Plan: revert-hammer

Differential Revision:
D31914868 (a33d3d84df)

Original commit changeset: beda9d32ea65

fbshipit-source-id: dc3fa66d7e3c8a211fedac6ae191b11a4a9ab232
2021-10-26 14:18:22 -07:00
2ca552160b [DDP] logging improvements (#67059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67059

While debugging some workflows, the training sometimes does not finish, and I
want to know whether the graph was not static. Also, log 0 for unused
parameter size if no unused params were found.
ghstack-source-id: 141428950

Test Plan: Ci

Reviewed By: mrshenli

Differential Revision: D31846669

fbshipit-source-id: 21763fcdc1b244ba829117da1f15b2271d966983
2021-10-26 13:18:00 -07:00
197dec14b3 .github: Change periodic docker jobs to always_rebuild (#67267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67267

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: xuzhao9

Differential Revision: D31934251

Pulled By: seemethere

fbshipit-source-id: a323d2c754ff6324c69f81bf0e820ae9adbe7853
2021-10-26 13:06:16 -07:00
99b34b320b Make fb::sigrid_hash_compute_multipler_shift return a std::tuple<int64_t, int64_t> (#67123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67123

Makes `sigrid_hash_compute_multipler_shift` return a tuple instead of a tensor and modifies the functions that depend on it.

Test Plan:
```
buck test //caffe2/benchmarks/static_runtime/fb:test_fb_operators
```

Benchmarks:
`local`:
```
I1022 13:56:34.529495 2866038 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 5.67114, standard deviation: 0.336918

I1022 15:29:45.248790 3292725 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 5.66678, standard deviation: 0.403032
```

`local_ro`:
```
I1022 13:59:24.262511 2882599 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 1.56012, standard deviation: 0.0537101

I1022 15:34:53.941890 3328358 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 1.5525, standard deviation: 0.0280267
```

FB: local - P463676888, local_ro - P463676984, master local - P463686094, master local_ro - P463686470

Reviewed By: mikeiovine

Differential Revision: D31867186

fbshipit-source-id: 0f640487b74d1cd0d5f714f2258e056a2f0c2c07
2021-10-26 12:51:10 -07:00
1ce500f56f [easy][PyTorch] Use at::native::is_nonzero (#67195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67195

Now that `is_nonzero` is part of `at::native` (see https://github.com/pytorch/pytorch/pull/66663), replace `TensorCompare::is_nonzero` with `at::native::is_nonzero`.

ghstack-source-id: 141514416

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D31704041

fbshipit-source-id: 36813e5411d0aa2eb2d0442e2a195bbed417b33d
2021-10-26 12:40:32 -07:00
a33d3d84df Strided masked reduction: mean (2nd try) (#67088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67088

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31914868

Pulled By: cpuhrsch

fbshipit-source-id: beda9d32ea65bcae31c2c0181f95ad23c6631075
2021-10-26 11:54:39 -07:00
6c22b96082 [Pytorch Edge] Extend Tracer to Custom Classes (#67004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67004

New version because the other one was impossible to rebase

Trace custom classes

Test Plan: CI.

Reviewed By: dhruvbird

Differential Revision: D31818978

fbshipit-source-id: daa22ccb153e32685bcca43a303ba9e21042d052
2021-10-26 11:38:06 -07:00
34ee5b11ff .github: Add 4xlarge nvidia gpu to scale-config (#67262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67262

Adds a 4xlarge nvidia gpu variant to our scale-config.yml

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31931941

Pulled By: seemethere

fbshipit-source-id: 120c73ad2c973a416a8426ad6f67457f87302db5
2021-10-26 11:19:16 -07:00
7052c41899 .github: Add workflow to build all docker images (#67215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67215

We were regularly seeing gaps in our docker image builds because specific
workflows were not being run when docker builds occurred on PRs. This should
remove that ambiguity and ensure that all docker images are rebuilt if a
rebuild is deemed necessary

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31910422

Pulled By: seemethere

fbshipit-source-id: f346e64f1857e35a995c49bf30521a3acd8af0b1
2021-10-26 11:14:04 -07:00
d7ac6e977a Fix test_create_store_multi flaky test (#66953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66953

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: kiukchung

Differential Revision: D31802767

Pulled By: H-Huang

fbshipit-source-id: a430e242788aac164496d4e65b85bf326537d019
2021-10-26 11:08:51 -07:00
49bf24fc83 Updated error message for nn.functional.interpolate (#66417)
Summary:
Description:
- Updated error message for nn.functional.interpolate

Fixes https://github.com/pytorch/pytorch/issues/63845

cc vadimkantorov

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66417

Reviewed By: albanD

Differential Revision: D31924761

Pulled By: jbschlosser

fbshipit-source-id: ca74c77ac34b4f644aa10440b77b3fcbe4e770ac
2021-10-26 10:33:24 -07:00
d47a9004c8 [skip ci] Set test owner for mobile tests (#66829)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66829

Reviewed By: albanD

Differential Revision: D31928812

Pulled By: janeyx99

fbshipit-source-id: 8116b7f3728df8632278b013007c06ecce583862
2021-10-26 10:20:01 -07:00
204ffd33ee [CUDA][Linalg] Add gesvd as SVD fallback; optimize SVD gesvdj performance (#64533)
Summary:
Fix https://github.com/pytorch/pytorch/issues/64237
Fix https://github.com/pytorch/pytorch/issues/28293
Fix https://github.com/pytorch/pytorch/issues/4689

See also https://github.com/pytorch/pytorch/issues/47953

cc ngimel jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64533

Reviewed By: albanD

Differential Revision: D31915794

Pulled By: ngimel

fbshipit-source-id: 29ea48696531ced8a48474e891a9e2d5f11e9d7a
2021-10-26 10:13:52 -07:00
828a9dcc04 [nn] MarginRankingLoss : no batch dim (#64975)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64975

Reviewed By: albanD

Differential Revision: D31906528

Pulled By: jbschlosser

fbshipit-source-id: 1127242a859085b1e06a4b71be19ad55049b38ba
2021-10-26 09:03:31 -07:00
129e99fbce __getitem__: Ensure Tensor subclasses are not treated as tuples (#67202)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67027

`torch.Tensor` is considered a Mapping, but not a Sequence in Python
because it uses `tp_as_mapping` instead of defining `__getitem__` in
Python. However, if you try to overwrite `__getitem__` from Python,
it is considered a `Sequence` and so the tensor is treated like a
tuple for indexing purposes.
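
A minimal sketch of the kind of subclass that triggered the bug (class and variable names are illustrative):

```
import torch

class MyTensor(torch.Tensor):
    # Defining __getitem__ in Python made the subclass look like a Sequence.
    def __getitem__(self, item):
        return super().__getitem__(item)

base = torch.arange(6)
idx = torch.tensor([1, 3]).as_subclass(MyTensor)
print(base[idx])  # should index like a tensor -> tensor([1, 3]),
                  # not be unpacked like a tuple
```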

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67202

Reviewed By: VitalyFedyunin

Differential Revision: D31908515

Pulled By: albanD

fbshipit-source-id: 0ca55a36be3421f96428a8eacf5d195646252b38
2021-10-26 08:56:59 -07:00
3c61700cf7 torch.linalg.householder_product: forward AD support (#67043)
Summary:
As per title.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67043

Reviewed By: VitalyFedyunin

Differential Revision: D31897617

Pulled By: albanD

fbshipit-source-id: ef135fe3d9e5b9b2a541c355017f07cdb1309979
2021-10-26 08:34:00 -07:00
5b345e767e QNNPACK: Update to use pytorch/cpuinfo.git repo as a third party dependency (#67106)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67106

Test Plan: Recloned cpuinfo, rebuilt, and ran all the tests locally

Reviewed By: kimishpatel

Differential Revision: D31782317

fbshipit-source-id: 4a71be91f02bb6278db7e0124366d8009e7c7a60
2021-10-26 07:59:19 -07:00
2abffaf050 Consolidate c10d and dist imports in test_c10d_common.py (#67203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67203

This commit uses `dist` for `torch.distributed` and `c10d` for
`torch.distributed.distributed_c10d`. The former is for public APIs
and the latter is for private ones.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31906801

Pulled By: mrshenli

fbshipit-source-id: c3a01f33962b01a03dbd565ed119dcdac594bcf2
2021-10-26 07:50:48 -07:00
71b7182ee2 [skip ci] Set test owner for deploy/package tests (#66830)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66830

Reviewed By: albanD

Differential Revision: D31905820

Pulled By: janeyx99

fbshipit-source-id: 9496acc98339d689fa62e18a8781d7344903a64c
2021-10-26 07:49:33 -07:00
49251d05ec [skip ci] Set test owners for NNC tests (#66833)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66833

Reviewed By: albanD

Differential Revision: D31907812

Pulled By: janeyx99

fbshipit-source-id: 5e5013b4276fd208ac68d61cf787679799695602
2021-10-26 07:46:18 -07:00
a6d702a3ee add support for ubuntu 20.04 to CI docker images (#66942)
Summary:
Some minor changes are needed to the .circleci docker scripts to support ubuntu 20.04.  One edit updates the packages needed for all images (.circleci/docker/common/install_base.sh), while the other edit is specific to ROCm support.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH seemethere malfet pytorch/pytorch-dev-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66942

Reviewed By: albanD

Differential Revision: D31899271

Pulled By: janeyx99

fbshipit-source-id: f7677ddc063a4504da9f39a756dc181ac55f200a
2021-10-26 07:41:46 -07:00
83355f9537 [SR][easy] Alias for c10::Symbol::fromQualString (#67162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67162

It's a bit annoying/ugly to type `c10::Symbol::fromQualString` everywhere, and we can't do `using c10::Symbol::fromQualString` since it's a static class function.

Test Plan: CI

Reviewed By: d1jang

Differential Revision: D31887042

fbshipit-source-id: 073a56c72281c20284a9feef741aed96b58a921d
2021-10-26 06:09:17 -07:00
38cbaeb8a4 Update deprecated import paths. (#67250)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67250

Test Plan: Run tests manually

Reviewed By: NicolasHug

Differential Revision: D31921656

fbshipit-source-id: e2cba7bc7d4a8c7f836bc32f1b8b11a37494a4e2
2021-10-26 04:51:07 -07:00
0c1b7545b6 [Static Runtime] Add more debug info to verify_no_memory_overlap() (#67206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67206

The memory overlap check still checks the memory overlap for alias ops. It only skips the check for inplace ops. This needs to be fixed if we want to use the memory overlap check in prod.

This diff only adds more debug info. It doesn't fix the aforementioned problem.

Reviewed By: d1jang

Differential Revision: D31889866

fbshipit-source-id: 05a80ace3d404f66f21a8bbdc9678485ff76c8d3
2021-10-26 01:48:41 -07:00
31bcfa3760 [sharded_tensor] refactor sharded_tensor file structure (#67199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67199

This PR refactors the _sharded_tensor package so that it is split out of api.py, adding separate components to make it more modular. This will also help us resolve circular dependencies caused by the growing code size and better organize the package:

* api.py: sharded tensor APIs
* metadata.py: Metadata definition for ShardedTensors
* shard.py: Shard definition for ShardedTensor
* utils.py: utility functions for validation, etc.
ghstack-source-id: 141533618

Test Plan: test_sharded_tensor.py

Reviewed By: pritamdamania87

Differential Revision: D31904249

fbshipit-source-id: c747d96e131a1d4731991ec4ac090f639dcb369b
2021-10-26 00:36:23 -07:00
b96337cf47 add frozen_pyyaml as a builtin library to torch::deploy (#67127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67127

add frozen_pyyaml as a builtin library to torch::deploy

Test Plan:
unittests pass

> buck test mode/dev-nosan caffe2/torch/csrc/deploy/... -- --regex ".*TestPyYAML.*"

Reviewed By: shunting314

Differential Revision: D31852201

fbshipit-source-id: 889c4493faf09ddd3ec2b9487da9acfea3ab6bcd
2021-10-25 23:16:41 -07:00
0e371e413d [fx-acc] add automated graph opt testing using AccOpProperty (#67228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67228

We added `AccOpProperty` for easy enablement of graph opts for new acc ops based on general properties. This diff adds
1. `AccOpProperty.unary`
2. Automated testing for acc ops with both `AccOpProperty.unary` and `AccOpProperty.pointwise` with `sink_reshape_ops` graph opt. [Adds coverage for 30 more acc_ops]
3. Refactors `graph_opts/TARGETS` to collect all graph optimizations into a common library
4. replaces `def foo(*, input, acc_out_ty=None): assert acc_out_ty is not None` with just `def foo(*, input, acc_out_ty)`. Let me know if there is some hidden purpose to the other implementation.
5. adds `AccOpProperty.*` flags to appropriate ops.

Test Plan:
`buck test mode/dev glow/fb/fx/graph_opts:test_fx_sink`

```
...
Summary
  Pass: 31
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124724581304
```

Also ran
```
`buck test mode/dev glow/fb/fx/acc_tracer:`
```
```
...
Summary
  Pass: 136
  ListingSuccess: 4
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5910974582823618
```

Reviewed By: jfix71

Differential Revision: D31671833

fbshipit-source-id: aa16d1008f18f7c8626058361efff33843de3505
2021-10-25 19:53:05 -07:00
3596e13d45 Add torch.nn.init.normal_ and torch.nn.init.kaiming_uniform_ ops to ShardedTensor (#67057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67057

Extend ShardedTensor with torch.nn.init.[normal_, and kaiming_uniform_] ops
Follow up from https://github.com/pytorch/pytorch/pull/63997

Test Plan:
a) Unit Test
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_init.py TestShardedTensorNNInit --v

or b) Manual run: Instruction here: https://docs.google.com/document/d/1_m1Hdo5w51-hhPlZ_F8Y6PIWrN7UgJZqiSpARYvhsaE/edit#
s/uniform_/normal_ or kaiming_uniform_

Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D31845654

fbshipit-source-id: e7aedc0972539da59f7b84bbbf617caf6b206d52
2021-10-25 19:14:30 -07:00
bfcde08612 [trt] Algorithm recorder/replayer (#4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch-canary/pull/4

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67211

Record the algorithm selection, dump it in JSON format, and replay it. This has the potential to:
1. consistently repro the issue (algo selection could be sensitive to local benchmark timing)
2. allow manually editing the dumped JSON file to control algorithm selection.

Reviewed By: wushirong, 842974287

Differential Revision: D31888836

fbshipit-source-id: 4611fda548f7391776f1ad61572b1f59fa4665b6
2021-10-25 18:50:55 -07:00
ecf7e96969 [Light] Remove ambiguity from compile_spec names, use actual output type (#67209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67209

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67198

Fixes a couple of instances where parameters were named method_compile_spec when they were actually compile_specs that could contain multiple method_compile_specs.
Also use the output dtype from the buffer.

Test Plan:
Mobilenetv3 compiles and runs fine
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ PYTORCH_JIT_LOG_LEVEL="aot_compiler" buck run //caffe2/binaries:aot_model_compiler -- --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="1,3,224,224
"
Downloaded 4501/6195 artifacts, 433.89 Mbytes, 14.3% cache miss (for updated rules)
Building: finished in 06:34.6 min (100%) 20233/20233 jobs, 5467/20233 updated
  Total time: 06:35.0 min
BUILD SUCCEEDED
The compiled llvm assembly code was saved to mobilenetv3.compiled.ll
The compiled model was saved to mobilenetv3.compiled.pt

└─ $ ./compile_model.sh -m pytorch_dev_mobilenetv3 -p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/mobilenetv3.pt -v v1 -i "1,3,224,224"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=pytorch_dev_mobilenetv3
.
.
Columns 961 to 970
1e-11 *
-4.2304 -3.9674  2.4473 -0.8664 -0.7513  1.2140  0.0010  3.8675  1.2714  2.2989

Columns 971 to 980
1e-11 *
-2.7203  1.6772 -0.7460 -0.6936  4.4421 -0.9865 -0.5186 -1.4441  1.3047 -1.6112

Columns 981 to 990
1e-11 *
 0.1275 -1.8815  2.5105 -0.4871 -2.2342  0.8520  0.8658  1.6180  3.8901 -0.2454

Columns 991 to 1000
1e-11 *
-1.4896  4.1337 -2.6640  0.8226  0.2441 -1.4830 -1.7430  1.8758  0.5481  0.5093
[ CPUFloatType{1,1000} ]
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 276.255. Iters per second: 3.61984
Memory usage before main runs: 104366080 bytes
Memory usage after main runs: 343441408 bytes
Average memory increase per iter: 2.39075e+07 bytes
0 value means "not available" in above
```

Reviewed By: ljk53

Differential Revision: D31698338

fbshipit-source-id: da6c74c1321ec02e0652f3afe6f97bf789d3361b
2021-10-25 17:44:05 -07:00
ad5731cacc [PyTorch] Add flop count for bmm and baddbmm (#66636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66636

Add FLOP count for bmm and baddbmm, which is `2*b*m*n*k`.
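
A quick sanity check of the formula (shapes are illustrative):

```
import torch

# FLOPs for bmm of (b, m, k) @ (b, k, n): one multiply and one add per
# output element per reduction step, i.e. 2*b*m*n*k.
b, m, k, n = 8, 64, 128, 32
x, y = torch.randn(b, m, k), torch.randn(b, k, n)
z = torch.bmm(x, y)       # the operation being counted
print(2 * b * m * n * k)  # 4194304
```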

Reviewed By: ngimel

Differential Revision: D31622061

fbshipit-source-id: f3e1e1e34c45228693117b81647fb4a623c4085b
2021-10-25 17:31:12 -07:00
7acf0c6d4b [PyTorch Edge][type] Add type support for NamedTuple custom class (export) (#62612)
Summary:
Add type support for the namedtuple custom class. The namedtuple type will deserialize to the following string format:
```
"qualified_named[
    NamedTuple, [
        [filed_name_1, field_type_1],
        [filed_name_2, field_type_2]
    ]
]"
```

If it's nested, it will be
```
"__torch__.A[
    NamedTuple, [
        [field_name_a, __torch__.B [
            NamedTuple, [
                [field_name_b, __torch__.C [
                    NamedTuple, [
                      [field_name_c_1, Tensor],
                      [field_name_c_2, Tuple[Tensor, Tensor]],
                    ]
                ]
                ]
            ]
        ]
        ]
    ]
]
"
```
The namedtuple type covers both `collections` and `typing`:
```

from typing import NamedTuple
from collections import namedtuple
```

This will be a forward-incompatible change. However, this type was never supported or exported before, and we don't have a proper way to backport it. The optimal way to ship this change is probably:
1. Land the import change without the export change, so the runtime can read the new format but no new format will be exported.
2. Land the export change, so the runtime can export the new format.

For the following example:
```
class Foo(NamedTuple):
    id: torch.Tensor

class Bar(torch.nn.Module):
    def __init__(self):
        super(Bar, self).__init__()
        self.foo = Foo(torch.tensor(1))

    def forward(self, a: torch.Tensor):
        self.foo = Foo(a)
        return self.foo
```
The new bytecode.pkl will be
```
(6,
 ('__torch__.mobile.test_lite_script_type.MyTestModule.forward',
  (('instructions',
    (('STOREN', 1, 2),
     ('DROPR', 1, 0),
     ('MOVE', 2, 0),
     ('LIST_CONSTRUCT', 0, 1),
     ('NAMED_TUPLE_CONSTRUCT', 1, 1),
     ('RET', 0, 0))),
   ('operators', ()),
   ('constants', ()),
   ('types',
    ('List[Tensor]',
     '__torch__.mobile.test_lite_script_type.myNamedTuple[NamedTuple, [[a, '
     'List[Tensor]]]]')),
   ('register_size', 2)),
  (('arguments',
    ((('name', 'self'),
      ('type', '__torch__.mobile.test_lite_script_type.MyTestModule'),
      ('default_value', None)),
     (('name', 'a'), ('type', 'Tensor'), ('default_value', None)))),
   ('returns',
    ((('name', ''),
      ('type', '__torch__.mobile.test_lite_script_type.myNamedTuple'),
      ('default_value', None)),)))))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62612

ghstack-source-id: 141485500

Test Plan:
fb:
1. Add a simple unittest to test NamedTuple custom class
2. Use following cpp code (D30271153)
```
TEST(LiteTrainerTest, CustomOp) {

  std::string jit_model =
  "/home/chenlai/local/notebooks/ads_dper_fl_model_282250609.pt";

  Module jit_m = load(jit_model);

  jit_m.eval();
  torch::jit::Module module_freeze = freeze(jit_m);
  IValue tuple =
      c10::ivalue::Tuple::create({1 * torch::ones({10, 1034}), 3 * torch::ones({10, 1034})});
  std::vector<IValue> inputs_1{tuple};
  auto jit_output = jit_m.forward(inputs_1);
  jit_output.dump();

  std::stringstream ss;
  jit_m._save_for_mobile(ss);
  jit_m._save_for_mobile("/home/chenlai/local/notebooks/tmp/tmp.ptl");

  torch::jit::mobile::Module mobile_m = _load_for_mobile(ss);
  auto mobile_output = mobile_m.forward(inputs_1);
  std::cout << "mobile output: " << std::endl;
  mobile_output.dump();
  }
```
And output from both mobile and jit are
```
{prediction: ([ CPUFloatType{0} ], [ CPUFloatType{0} ])}
```

3. N1033894 with model inspection, also compare the result between jit and mobile with the dper model.

Reviewed By: iseeyuan

Differential Revision: D30004716

fbshipit-source-id: cfd30955e66a604af8f9633b1b608feddc13d7d7
2021-10-25 17:15:50 -07:00
0d7d446154 Disallow annotations on instance attributes outside __init__ (#67051)
Summary:
**Summary**: This commit solves the first part of https://github.com/pytorch/pytorch/issues/52306, which disallows type annotations on instance attributes inside any method other than the constructor.
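
A minimal sketch of what is now rejected (the module is illustrative):

```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.x: int = 0  # annotating an instance attribute here is allowed

    def forward(self, inp):
        self.x: int = 1  # annotation outside __init__: now rejected
        return inp

torch.jit.script(M())  # raises a compilation error after this change
```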

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67051

Test Plan:
Added test to test_types.py.

**Reviewers**: Zhengxu Chen

**Subscribers**: Zhengxu Chen, Yanan Cao, Peng Wu, Yining Lu

**Tasks**: T103941984

**Tags**: pytorch

**Fixes** https://github.com/pytorch/pytorch/issues/52306

Reviewed By: zhxchen17

Differential Revision: D31843527

Pulled By: andrewor14

fbshipit-source-id: 624879ae801621e367c59228be8b0581ecd30ef4
2021-10-25 16:20:47 -07:00
1f55dd83ac [WIP] wrap XLATensors into Python XLA wrapper class (#65841)
Summary:
**Improbably** fixes https://github.com/pytorch/pytorch/issues/65130

ezyang, I'm a super n00b at Python extensions; is this what we want to do?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65841

Reviewed By: navahgar

Differential Revision: D31889790

Pulled By: Krovatkin

fbshipit-source-id: c7f077b89f6f02df1962ab83d9e13fcc348a227d
2021-10-25 16:11:03 -07:00
fa7fb7b4d9 [skip ci] Set test owner for test_profiler.py (#66831)
Summary:
Followup action to https://github.com/pytorch/pytorch/issues/66232

cc ilia-cher robieta chaekit gdankel bitfort ngimel orionr nbcsm guotuofeng guyang3532 gaoteng-git

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66831

Reviewed By: gdankel

Differential Revision: D31909245

Pulled By: janeyx99

fbshipit-source-id: 4156a5cffa215c29022fc4dab6ee5b442a509db4
2021-10-25 15:59:52 -07:00
0acc21b412 [vulkan] Add 2D transposed convolutions (#67104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67104

Add 2D transposed convolutions to Vulkan. Currently, only `dilation={1,1}` is supported. We plan to support dilation at a later time.

Test Plan:
Build and run `vulkan_api_test`:

```
cd ~/pytorch
BUILD_CUSTOM_PROTOBUF=OFF \
  BUILD_TEST=ON \
  USE_EIGEN_FOR_BLAS=OFF \
  USE_FBGEMM=OFF \
  USE_MKLDNN=OFF \
  USE_NNPACK=OFF \
  USE_NUMPY=OFF \
  USE_OBSERVERS=OFF \
  USE_PYTORCH_QNNPACK=OFF \
  USE_QNNPACK=OFF \
  USE_VULKAN=ON \
  USE_VULKAN_API=ON \
  USE_VULKAN_SHADERC_RUNTIME=ON \
  USE_VULKAN_WRAPPER=OFF \
  MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python3 setup.py develop --cmake && ./build/bin/vulkan_api_test
```

Reviewed By: beback4u

Differential Revision: D31731742

fbshipit-source-id: b79c946c8d988bb4d83e9fd3381992a4f2f4be80
2021-10-25 15:55:20 -07:00
059ae96007 [jit] Factor findAllNodes into one place. (#65965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65965

ghstack-source-id: 141504185

Test Plan: no behavior change

Reviewed By: qihqi, ejguan

Differential Revision: D31326152

fbshipit-source-id: 2e0261a96853bfb67a96dd68972c905b6b26d562
2021-10-25 15:42:52 -07:00
239b38268b [fx2trt] Better trt layer name (#67200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67200

We want to put more information in the TensorRT layer name. Mainly, we want to be able to tell which original op a TensorRT layer is mapped from.

The layer format is `[TensorRT Layer Type]-[Original Op Code]-[FX Node Name]`
```
Reformatting CopyNode for Input Tensor 0 to [FULLY_CONNECTED]-[acc_ops.linear]-[linear_1]: 0.0328ms
[FULLY_CONNECTED]-[acc_ops.linear]-[linear_1]: 0.027712ms
PWN([RELU]-[acc_ops.relu]-[relu_1]): 0.008672ms
```

Test Plan:
CI

```
buck run mode/dev-nosan -c python.package_style=inplace caffe2:fx2trt_example
```

Reviewed By: wushirong

Differential Revision: D31627274

fbshipit-source-id: 3dbb576caa63b922274541d2a306b4bd37e707c5
2021-10-25 15:41:38 -07:00
4ac8d06911 [quant] Remove unused print in quantization_patterns.py (#67191)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67191

Test Plan:
sandcastle and ossci

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31899784

fbshipit-source-id: 31ad63c0b2a5328fff80c38dc4e527e0399e802e
2021-10-25 15:07:18 -07:00
12daa4f663 [jit][edge] Enable CALL instruction in lite interpreter. (#65964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65964

ghstack-source-id: 141425519

Test Plan: buck run xplat/caffe2:test_lite_interpreter

Reviewed By: cccclai

Differential Revision: D31326149

fbshipit-source-id: 8a599d92f3fa4e6c125100adb36d89592e71e547
2021-10-25 14:44:33 -07:00
b8dfb45ac2 Refactor cub namespace handling (#66219)
Summary:
This PR is to update PyTorch with the following cub changes:
- Starting cub 1.13.1, cub requires users to define `CUB_NS_QUALIFIER` if `CUB_NS_PREFIX` is also defined. Besides that, a new mechanism `CUB_WRAPPED_NAMESPACE` is added.

And I do the following change to PyTorch:
- Starting CUDA 11.5, define `CUB_WRAPPED_NAMESPACE` globally as an nvcc flag.
- Fix caffe2 failures caused by the above change.
- Add a `aten/src/ATen/cuda/cub_definitions.cuh` that defines helper macros about feature availability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66219

Reviewed By: bdhirsh

Differential Revision: D31626931

Pulled By: ngimel

fbshipit-source-id: 97ebf5ef671ade8bf46d0860edc317f22660f26d
2021-10-25 14:37:09 -07:00
700b39a3df Sparse CSR CUDA: add torch.addmm with all inputs sparse (#63511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63511

This PR adds a `torch.addmm(c, a, b)` variant with `c, a, b` all being CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation the result tensor has 32-bit indices. Input
tensors can have either 64-bit or 32-bit index tensors.
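
A hedged sketch of the new variant (assumes a CUDA build with cuSPARSE and that `torch.sparse_csr_tensor` is available; the helper is illustrative):

```
import torch

def csr_eye(n, device="cuda"):
    # Identity matrix in CSR layout (illustrative helper for this sketch).
    crow = torch.arange(n + 1, dtype=torch.int32, device=device)
    col = torch.arange(n, dtype=torch.int32, device=device)
    val = torch.ones(n, device=device)
    return torch.sparse_csr_tensor(crow, col, val, size=(n, n))

a, b, c = csr_eye(4), csr_eye(4), csr_eye(4)
out = torch.addmm(c, a, b)  # c + a @ b with all inputs CSR; result has 32-bit indices
```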

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31809838

Pulled By: cpuhrsch

fbshipit-source-id: 97005dba27d8adcae445eb756bcbd7271061e9b5
2021-10-25 14:32:30 -07:00
333717eaf0 Improve assert failure message in test_get_torch_func_signature_exhaustive (#67039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67039

cc mruberry

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31899719

Pulled By: cpuhrsch

fbshipit-source-id: 819d07da5b18b31d462010b9f9382e0b8cd10f9f
2021-10-25 14:20:38 -07:00
a6d0339492 [Pytorch Edge] Extend runtime compatibility to custom classes (#66972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66972

Add api to view how many custom classes we have and what their names are

Test Plan: unit test

Reviewed By: cccclai

Differential Revision: D31811337

fbshipit-source-id: 9f8ca1fc578a0a5360c9cd8f95475acc33f250e4
2021-10-25 13:42:26 -07:00
f4dd88489a Better and more consistent error messages in torch.linalg (#62734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62734

Following https://github.com/pytorch/pytorch/pull/62715#discussion_r682610788
- squareCheckInputs takes a string with the name of the function
- We reuse more functions when checking the inputs

The state of the errors in torch.linalg is far from great though. We
leave a more comprehensive clean-up for the future.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31823230

Pulled By: mruberry

fbshipit-source-id: eccd531f10d590eb5f9d04a957b7cdcb31c72ea4
2021-10-25 13:24:28 -07:00
4dce051cb0 [jit][edge] Add control stack frame to lite interpreter (#65963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65963

ghstack-source-id: 141425517

Test Plan: In next diff.

Reviewed By: qihqi, cccclai

Differential Revision: D31326150

fbshipit-source-id: dbbf65f2bf14846c45d0add71edc7d4dbfc6b92c
2021-10-25 12:15:16 -07:00
ac948f4f35 .github: Migrate linux-xenial-py3.6-gcc7 to GHA (#67072)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66888

cc seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67072

Reviewed By: seemethere

Differential Revision: D31900833

Pulled By: zhaoalex

fbshipit-source-id: 93f8995611169d991f90e07e8c13e08182969577
2021-10-25 11:40:12 -07:00
9de0888891 Move the registration of CPython builtin modules to BuiltinRegistry (#67085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67085

Leverages BuiltinRegistry to register the CPython standard C modules. The standard C modules that were moved are listed in the FOR_EACH macro.

Test Plan:
buck test mode/opt //caffe2/torch/csrc/deploy/interpreter:test_builtin_registry

buck test mode/opt //caffe2/torch/csrc/deploy:test_deploy

Reviewed By: shunting314

Differential Revision: D31848547

fbshipit-source-id: 7eb49d222eaaccb2b8ca5c984b05bf54cc233f25
2021-10-25 11:12:07 -07:00
d68bb50ef3 Disable SVE when cross-compiling for M1 (#67114)
Summary:
Followup after https://github.com/pytorch/pytorch/issues/58653
It does not matter whether one compiles locally or cross-compiles -
attempts to use SVE on M1 result in a compiler crash, as the SVE ABI is not
defined on MacOS

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67114

Reviewed By: VitalyFedyunin

Differential Revision: D31869356

Pulled By: malfet

fbshipit-source-id: 184e26ae40edc7ef7b703200b53ea7a15da74818
2021-10-25 11:03:00 -07:00
5d9ff8f30e [Static Runtime] Add static_runtime::fused_sigrid_transforms (#66659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66659

Original message: We added and registered a new operator, static_runtime::fused_sigrid_transforms, and modified the original sigrid_transforms to handle the non-fused case only.

Note: this diff was commandeered from a bootcamper. Some final touches were needed.

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: swolchok

Differential Revision: D31550307

fbshipit-source-id: 287380be0cca20ee6e145bcc7217547bd58cf6d0
2021-10-25 10:44:46 -07:00
8d164a36fb Use at::native::is_nonzero in promoted ops to improve portability (#67097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67097

All delegated models have `is_nonzero` ops by default; making the op native and consumable without dispatch eases the portability of such models.
ghstack-source-id: 141375082

Test Plan:
`buck test caffe2/test/cpp/jit:jit -- BackendTest.TestComposite`

```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test caffe2/test:jit -- test_trace_arange
Parsing buck files: finished in 0.5 sec
Building: finished in 9.4 sec (100%) 16035/16035 jobs, 0/16035 updated
  Total time: 10.0 sec
More details at https://www.internalfb.com/intern/buck/build/1e55eea5-2adb-41d1-96ae-cbf4b446d6c6
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 46eedba2-ae17-4e88-b205-93bd1332665d
Trace available for this run at /tmp/tpx-20211015-113905.235421/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1970324912349177
    ✓ ListingSuccess: caffe2/test:jit - main (12.372)
    ✓ Pass: caffe2/test:jit - test_trace_arange (jit.test_tracer.TestTracer) (13.748)
    ✓ Pass: caffe2/test:jit - test_trace_arange_with_grad (jit.test_tracer.TestTracer) (13.892)
Summary
  Pass: 2
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1970324912349177
```

Reviewed By: iseeyuan

Differential Revision: D31656842

fbshipit-source-id: c0e6c798478a2783c0e17e6e9100ba5ce044da78
2021-10-25 10:18:31 -07:00
acb340de75 [Pytorch][Bootcamp] Add fixes and vanilla testing for Adagrad non-vectorized and vectorized optimizers to handle complex numbers (#66671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671

Made changes in the step function of the vectorized and non-vectorized Adagrad optimizers to handle complex numbers as two real numbers, as per issue #65711 on GitHub.
ghstack-source-id: 141442350

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1Rd44

Reviewed By: albanD

Differential Revision: D31673503

fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464
2021-10-25 10:13:21 -07:00
a0495b3cdb [SR] Remove unused operator() overload (#67001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67001

The overload of `operator()` taking `std::vector<at::Tensor>` was only used for testing. In a diff following this one, I will add a new overload that takes `std::vector<c10::IValue> args` and no `kwargs` so we can avoid default-constructing `kwargs` everywhere.

This new overload will probably take a forwarding reference, so to avoid problems with overloading on a forwarding reference and to simplify the interface, it's best to remove this unused one.

Test Plan:
`buck test caffe2/benchmarks/static_runtime/...`

`buck test caffe2/test:static_runtime`

Reviewed By: hlu1

Differential Revision: D31821990

fbshipit-source-id: 6d2e4a75ca4abe6e262651532eb96c3b274c6f4a
2021-10-25 08:18:58 -07:00
364645cd9d [SR] Factor operator() implementation into separate function (#67125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67125

Using explicit template instantiations in D31659973 (f2582a59d0) was a bad idea. The problem is that the lvalue instantiation was for a `const` vector of `IValue`, meaning that if you tried to pass SR a non-const vector of arguments, the linker would fail to find the symbol.

The reason we didn't catch this in D31659973 (f2582a59d0) was because predictor always passes a `const` reference anyways. But we should fix this to prevent unexpected problems in the future.

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D31873406

fbshipit-source-id: 5ab5a03334bed925cec11facadcedf9bec9b90ad
2021-10-25 08:17:40 -07:00
edd4d246c3 Accept 0-dim channel inputs in convolution layer (#66256)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56998 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66256

Reviewed By: mrshenli

Differential Revision: D31859428

Pulled By: jbschlosser

fbshipit-source-id: 034b6c1ce35aac50eabfa09bbcd8b1e3c8b171bd
2021-10-25 08:12:29 -07:00
6c985b57ff OpInfo : nn.functional.embedding (#66997)
Summary:
Adds OpInfo for `nn.functional.embedding`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66997

Reviewed By: mrshenli

Differential Revision: D31859799

Pulled By: zou3519

fbshipit-source-id: bbca860df4fbc243751f5fa81658231866c31d2e
2021-10-25 08:06:32 -07:00
adc21f1966 [quant] Fix docs build (#67169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67169

Looks like the doc error only appears after it's landed

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31890431

fbshipit-source-id: d40cba082712c4b35704ea15d82fbc4749f85aec
2021-10-25 08:02:26 -07:00
dd81fa9027 [JIT] Freeze allows preservation of submodule attributes (#66102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66102

This change allows the `preserved_attributes` parameter of `torch.jit.freeze` to accept attributes of submodules. Previously, only root-level attributes could be preserved. Example:

```
class SubModule(nn.Module):
    def __init__(self):
        super(SubModule, self).__init__()
        self.a = 1
        self.b = 2

    def forward(self):
        return self.a + self.b

class Module(nn.Module):
    def __init__(self):
        super(Module, self).__init__()
        self.sub = SubModule()

    def forward(self):
        return self.sub()

mod = torch.jit.script(Module())
mod.eval()
frozen_mod = torch.jit.freeze(mod, preserved_attrs = ['sub.a'])

frozen_mod.sub   # OK
frozen_mod.sub.a # OK
frozen_mod.sub.b # Error, not preserved
frozen_mod()     # = 3
frozen_mod.sub.a = 0
frozen_mod()     # = 2
```

Test Plan: `buck test caffe2/test:jit -- TestFreezing`

Reviewed By: eellison

Differential Revision: D31383868

fbshipit-source-id: 34a05ca9528d4e5f04f71ac2a339fd584a8fa305
2021-10-25 07:56:20 -07:00
09c7771e9c Set test owners for jit tests (#66808)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66808

Reviewed By: mrshenli

Differential Revision: D31761414

Pulled By: janeyx99

fbshipit-source-id: baf8c49ff9c4bcda7b0ea0f6aafd26380586e72d
2021-10-25 07:51:10 -07:00
364c4959c3 [quant] Fix docs error in convert_fx (#67152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67152

Test Plan:
```
cd docs
make html
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31884570

fbshipit-source-id: 2b521f617c93f6fa08da3387df2d25497293eee6
2021-10-24 19:26:45 -07:00
a7ebf76a15 jit trace (#59949)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59949

Reviewed By: ZolotukhinM

Differential Revision: D31366787

Pulled By: Krovatkin

fbshipit-source-id: 798cbcd97e8ecfba984f98cd70214954be9309af
2021-10-24 18:04:22 -07:00
f1b5f1898b Automated submodule update: kineto (#67133)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto).

New submodule commit: 879a203d9b

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67133

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: mrshenli

Differential Revision: D31877172

fbshipit-source-id: 224a499607d1f3bf7c00d8d8dd1fdac47cd33a3b
2021-10-24 13:06:19 -07:00
b51731527d [ez] [Docs] Missing import in example for post_local_sgd (#67047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67047

Fix missing import
ghstack-source-id: 141258423

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31841837

fbshipit-source-id: 139e614517dcac7a53259ff7a0360bb5275bb53b
2021-10-24 01:44:06 -07:00
0000c88e10 [FSDP] No need for list() in _get_shard (#66957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66957

chunk appears to return a tuple, which is enough given that we just
index into the right chunk and discard the rest.
ghstack-source-id: 141391149

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31780799

fbshipit-source-id: fdb1b77fffa916328e14a4cd692b5241ae46a514
2021-10-24 01:29:19 -07:00
580efb35a5 [FSDP] Add some comments after reading the code. (#66956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66956

Adds some comments I found helpful while ramping up on FSDP code.
ghstack-source-id: 141391150

Test Plan: n/a

Reviewed By: mrshenli

Differential Revision: D31780798

fbshipit-source-id: e2d38a9801b4548b202a73615774d5f0f7f5e3ed
2021-10-24 01:28:19 -07:00
b6fa998892 Revert D31514095: Use kernel_func_name from aotCompiler
Test Plan: revert-hammer

Differential Revision:
D31514095 (7b55dc8340)

Original commit changeset: b70c8e2c7336

fbshipit-source-id: ad4d828f33506e612b51c276149fa0e12b0565d5
2021-10-23 17:17:53 -07:00
313939c9c6 [quant] Fix lint errors (#67138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67138

Test Plan:
ossci

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31879558

fbshipit-source-id: 271905d3d254c906aa78bae9f2bd411f9d57e1e8
2021-10-23 11:26:25 -07:00
7b55dc8340 Use kernel_func_name from aotCompiler (#66337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66337

Right now, the assembly code generated for a given method from the model is named wrapper or func by default. The function name is then replaced with a proper kernel_func_name after target-specific assembly is generated.
This PR propagates the desired kernel_func_name from the aotCompiler API so that the generated function has the needed name from the start and doesn't need to be renamed later.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31514095

Pulled By: priyaramani

fbshipit-source-id: b70c8e2c733600a435cd4e8b32092d37b7bf7de5
2021-10-23 02:20:45 -07:00
64c68edaf3 [pt] Add Half precision support for bucketize and searchsorted op (#67077)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67077

Test Plan: CI

Reviewed By: yinghai

Differential Revision: D31852556

fbshipit-source-id: 1e4212146ee67edc6b6568d25db79de525782788
2021-10-22 23:37:37 -07:00
2d81d5ab0a [quant][graphmode][fx] Remove fbgemm_backend_config_dict for now (#67066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67066

We'll add it later when the api is ready

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849079

fbshipit-source-id: 0c00d08510166b2d897cf1562c7276527319b05c
2021-10-22 21:57:56 -07:00
8460fa5707 [quant][fx] Add an option in convert_fx to accept qconfig_dict to skip quantization (#66878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66878

Currently convert_fx quantizes all layers that have been prepared, depending on the prepare qconfig_dict.
This PR adds support for accepting a variation of qconfig_dict in convert_fx that can be used to skip quantizing certain layers.

This helps with workflows that prepare/observe all operators but quantize only a subset of them (based on quantization error), avoiding the need to prepare multiple times.

The qconfig_dict passed to convert_fx can only have its values set to `None`, with the keys being the same as what is allowed in the prepare qconfig_dict.
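
A minimal sketch of the enabled workflow, assuming the `torch.quantization.quantize_fx` module paths of this era; the convert-time `qconfig_dict` keyword is the argument this PR adds:

```
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.sub = torch.nn.Linear(4, 4)
        self.fc = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.fc(self.sub(x))

m = M().eval()
# Prepare/observe every layer once.
prepared = prepare_fx(m, {"": get_default_qconfig("fbgemm")})
prepared(torch.randn(1, 4))  # calibration pass
# Quantize only a subset at convert time: values must be None, meaning
# "skip quantizing this entry" (key formats match the prepare qconfig_dict).
quantized = convert_fx(prepared, qconfig_dict={"module_name": [("sub", None)]})
```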

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_convert_qconfig_dict

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31808247

fbshipit-source-id: a4f5dca1090f0083fc3fea14aff56924033eb24f
2021-10-22 21:18:15 -07:00
d13829e6be [quant][[fx] update observer_fqn to not depend on node.name (#66767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66767

Make observer fqn in prepare step independent of input_node/observed_node name.
This change names the observers as `{input/output}_activation_post_process_{idx}` where idx will be incremented for each new observer instance and is guaranteed to be unique.

Test Plan:
python test/test_quantization.py test_observer_fqn

Imported from OSS

Reviewed By: anjali411

Differential Revision: D31752052

fbshipit-source-id: e0995b1ef33a99d5b012133fe92d303d55a73b7d
2021-10-22 21:16:24 -07:00
83f70db95c Fix common device computation for comparison ops. (#66245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66245

Fixes #66053

This PR splits `declare_static_dtype_and_device` into two new methods for
`TensorIteratorBase`: `declare_static_dtype` and `declare_static_device`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503849

Pulled By: ngimel

fbshipit-source-id: 4b131b691d29ceb5f3709f5d6503997ea0875c54
2021-10-22 18:43:17 -07:00
3f5adf4f9c [quant][graphmode][fx] Use the new convert function instead of the old one in quant-fx2trt tests (#67065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67065

Switching to use _convert_fx_do_not_use in the tests

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849077

fbshipit-source-id: 3688fc09ac538b6abc16ce87c600b8ee04acfcd1
2021-10-22 18:29:58 -07:00
af1a2df825 enable better depthwise conv perf on cudnn 8.2+ (#58749)
Summary:
There have been multiple improvements to depthwise convolution speed in cudnn between 7.6 and 8.2, since https://github.com/pytorch/pytorch/pull/22302.
This PR aims to harvest all of the new improvements by enabling more cudnn kernels. The workload checking logic can also be simplified now.
To keep the change simple, I kept behavior before cudnn 8.2 unchanged.

Similar to https://github.com/pytorch/pytorch/pull/22302, I used a script [here](https://gist.github.com/FDecaYed/e8ba98a95cd33697df2ace86fdb44897) to benchmark. Both runs use cudnn 8.2.
One enhancement I made to the script is switching to event-based timing. With warmup kernels to fill the launch queue ahead, this should give us accurate kernel timing even in CPU launch-bound cases; a sketch of the approach follows below.
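
A minimal sketch of event-based timing, assuming an illustrative depthwise layer and shapes (not the benchmarked configurations):

```
import torch

conv = torch.nn.Conv2d(128, 128, kernel_size=5, padding=2, groups=128).cuda()
x = torch.randn(32, 128, 56, 56, device="cuda")

for _ in range(10):  # warmup kernels fill the launch queue ahead of timing
    conv(x)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    conv(x)
end.record()
torch.cuda.synchronize()  # wait for the queued kernels before reading the events
print(start.elapsed_time(end) / 100, "ms per iteration")
```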

Here is A100 and V100 result sorted by speedup.
[Book1.xlsx](https://github.com/pytorch/pytorch/files/6530371/Book1.xlsx)

Result highlights:
Newly enabled 5x5 cudnn kernels show up to 6x speedup.
Close to half of the tested sizes show >10% speedup.
Fixed some corner cases that previously caused 15-20x slowdowns.
Only a handful of cases (~10 out of >1000) slow down.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58749

Reviewed By: bdhirsh

Differential Revision: D31613199

Pulled By: ngimel

fbshipit-source-id: 883b58facad67ccd51dc9ab539368b4738d40398
2021-10-22 17:47:07 -07:00
cf3a5160f8 [BE] move init_multigpu_helper to common_distributed (#67050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67050

This PR moves init_multigpu_helper to common_distributed so that it can be shared by different distributed tests.
ghstack-source-id: 141370119

Test Plan: wait for ci.

Reviewed By: mrshenli

Differential Revision: D31842644

fbshipit-source-id: c7bad25d6cef9bdce7ad1fb6c60c1cad4b765702
2021-10-22 17:16:11 -07:00
df3f82a1ef Add more FSDP unit tests to cover core logic, freezing weights and flatten parameter wrapper (#66904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66904

Add more FSDP unit tests to cover core logic, freezing weights, and the flatten parameter wrapper. These unit tests are refactored to align with PyTorch's commonly used test classes.
ghstack-source-id: 141335614

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D31779565

fbshipit-source-id: c727110d1d7570c0ec49e42cadfc9e9a5e440073
2021-10-22 16:50:52 -07:00
f6c88fa99d Revert D31627107: [BE] delete frontend.cpp
Test Plan: revert-hammer

Differential Revision:
D31627107

Original commit changeset: 07d30d280c25

fbshipit-source-id: 5e82f2158f5007c67adb8f947f8cc4d995a9a3bc
2021-10-22 16:39:02 -07:00
f50bf16c04 Revert D31663043: [BE] minor improvement to dist quantization
Test Plan: revert-hammer

Differential Revision:
D31663043

Original commit changeset: 2f96b7346e9c

fbshipit-source-id: d38684dfe79ca335fbbe624496ad4c86c29d1570
2021-10-22 16:37:41 -07:00
7b0408684b Fix linter (#67122)
Summary:
Fixes regression introduced by 7e5aa0d35a

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67122

Reviewed By: seemethere

Differential Revision: D31872569

Pulled By: malfet

fbshipit-source-id: ada0137db9a46cbec573489c9c37a94f3a7576ae
2021-10-22 16:02:36 -07:00
018e06edca [torchelastic] Skip tests in tsan mode (#67103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67103

Skip tests in tsan mode for now. More info: T104010063

Test Plan: sandcastle + running tests in mode/dev-tsan

Reviewed By: d4l3k

Differential Revision: D31861426

fbshipit-source-id: d50e5d06afbc82ccce6d102e52f72b5b01f6f41a
2021-10-22 15:55:18 -07:00
7e5aa0d35a fixed unique arguments documentation (#66132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66132

Differential Revision: [D31397746](https://our.intern.facebook.com/intern/diff/D31397746/)

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31734476

Pulled By: samdow

fbshipit-source-id: 8999443c7f9b24394d7543652b8350261c1f8b3a
2021-10-22 14:50:02 -07:00
a7bbf8814c [quant][graphmode][fx] Move quant-fx2trt unittests to test_quantize_fx.py (#67064)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67064

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849075

fbshipit-source-id: 9c5e8aad7c88070830d853faf3106491726e77ff
2021-10-22 14:36:36 -07:00
7379d4db20 [BE] minor improvement to dist quantization (#66649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66649

Some minor changes to dist quantization: mainly change the namespace and add some notes for future code dedup.
ghstack-source-id: 141336191

Test Plan: wait for ci

Reviewed By: cbalioglu

Differential Revision: D31663043

fbshipit-source-id: 2f96b7346e9c90df5ab2536767f8301eb86a9c79
2021-10-22 13:46:28 -07:00
1da628bdb7 [ONNX] Update slice process shape to support rank only inference (#65782) (#66149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66149

The updated logic is able to infer the rank of the slice output when only the rank is known for the slice input. This enables cases where `ConstantValueMap::HasRank(input)` is `True` while `ConstantValueMap::HasShape(input)` is `False`.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31423840

Pulled By: malfet

fbshipit-source-id: 17b2b24aa63435d5212ebe6bdf66ae3c348c4e3b

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-22 13:46:26 -07:00
0bc9928f31 [ONNX] Symbolic: dynamic input for OneHot, bool for Einsum (#65940) (#66147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66147

Symbolic: dynamic input for OneHot, bool for Einsum

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424094

fbshipit-source-id: 76bea22b29c93d1621c597fe7ab59deb3685087f

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-10-22 13:46:24 -07:00
2c0fe338da [ONNX] Modify softplus symbolic to support beta!=1 (#65001) (#66146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66146

* Modify softplus symbolic to support beta!=1

* Remove parse args

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424096

fbshipit-source-id: 971af54a28141737ccb17510ada03b0651be2a63
2021-10-22 13:46:22 -07:00
6f3f302d9f [ONNX] Deprecate fold_if pass (#65697) (#66145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66145

Deprecate fold_if pass

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424097

fbshipit-source-id: 25b89679c756393a1065ca6aaa24d29db960cbd4

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-10-22 13:46:20 -07:00
a0fc14c20f [ONNX] Add diagonal symbolic (#64454) (#66144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66144

* Add logic and tests

* minor edits

* Eliminate expand ops

* Fix flake and editing

* Modified error message

* Add overrun check

* Add overrun descriptions

* Remove emptyline

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424095

fbshipit-source-id: 5b8ef6ac21c32d43c3dbc8e51e1ef30bffb19c25
2021-10-22 13:46:18 -07:00
b18c298f24 ONNX: Delete or document skipped ORT tests (#64470) (#66143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66143

Delete test_list_remove. There's no point in testing conversion of
this model since TorchScript doesn't support it.

Add a link to an issue tracking test_embedding_bag_dynamic_input.

[ONNX] fix docs (#65379)

Mainly fix the sphinx build by inserting empty lines before
bulleted lists.

Also some minor improvements:
Remove superfluous descriptions of deprecated and ignored args.
The user doesn't need to know anything other than that they are
deprecated and ignored.

Fix custom_opsets description.

Make indentation of Raises section consistent with Args section.

[ONNX] publicize func for discovering unconvertible ops (#65285)

* [ONNX] Provide public function to discover all unconvertible ATen ops

This can be more productive than finding and fixing a single issue at a
time.

* [ONNX] Reorganize test_utility_funs

Move common functionality into a base class that doesn't define any
tests.

Add a new test for opset-independent tests. This lets us avoid running
the tests repeatedly for each opset.

Use simple inheritance rather than the `type()` built-in. It's more
readable.

* [ONNX] Use TestCase assertions rather than `assert`

This provides better error messages.

* [ONNX] Use double quotes consistently.

[ONNX] Fix code block formatting in doc (#65421)

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424093

fbshipit-source-id: 4ced841cc546db8548dede60b54b07df9bb4e36e
2021-10-22 13:46:16 -07:00
7a78f715a6 [ONNX] Add warning for inplace updates on tensor.shape in tracing mode (#63170) (#66142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66142

* Add warning

* Lint and clang fixes

* Remove duplicate comments

* Added pitfalls section

* Modify sections

* Minor modifications

* Add underline to avoid doc build failures

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424092

fbshipit-source-id: c83195f3c66885ad1aecde13b3029c45dd171dbd
2021-10-22 13:46:14 -07:00
136abf5aff [ONNX] Update sum symbolic to handle dtypes (#64289) (#66141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66141

* Update aten::sum symbolic for dtype

* Remove nesting and modify opeartor tests

* Fix expect files

[ONNX] Fix expect files added in #64289 (#65356)

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424091

fbshipit-source-id: d4af21e9f0d7e1c68bf6ef2f3e385db84b4c53f3
2021-10-22 13:46:12 -07:00
53a163a015 [ONNX] Export nn.Module call as ONNX local function (#63589) (#66140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140

* Add new argument to export api to enable users specifying `nn.Module` classes that they wish to be exported as local function in ONNX model.
* Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class.
* ~~Contains changes from #63268~~
* Depends on #63716 to update onnx submodule.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424098

fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-22 13:44:56 -07:00
d1986a1cf5 [BE] delete frontend.cpp (#66581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66581

c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 141336190

Test Plan: wait for ci

Reviewed By: rohan-varma

Differential Revision: D31627107

fbshipit-source-id: 07d30d280c25502a222a74c2c65dfa4069ed8713
2021-10-22 13:33:24 -07:00
e8742f15cf [quant][graphmode][fx] Add observation_type.py (#67063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67063

Adding ObservationType Enum for `backend_config_dict`

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849078

fbshipit-source-id: e9e7225d564b51fa9454f7f087dd134152c069a0
2021-10-22 12:17:54 -07:00
f2582a59d0 [SR] Add rvalue overload for operator() (#66648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66648

Currently, SR shallow-copies its `IValue` inputs when running inferences. We can avoid refcount bumps by `std::move`-ing the inputs into their slots. To achieve this, I've made the following changes:

1. Add an overload for `set_inputs` that takes a `std::vector<IValue>&&`.
2. Change the signatures of `StaticModule::operator()` and `StaticRuntime::operator()`.
Old:
```
operator()(const std::vector<IValue>& args, const std::unordered_map<std::string, IValue>& kwargs)
```
New:
```
template <class IValueList>
operator()(IValueList&& args, const std::unordered_map<std::string, IValue>& kwargs)
```

The implementations use perfect forwarding to invoke the correct overload of `set_inputs`.

Test Plan: Added a short new unit test to exercise the new code path. All other unit tests still pass.

Reviewed By: hlu1

Differential Revision: D31659973

fbshipit-source-id: b8c194405b54a5af1b418f8edaa1dd29a061deed
2021-10-22 10:51:47 -07:00
40a8a50913 Add static_runtime::fused_equally_split (#2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch-canary/pull/2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66881

Adds the `static_runtime::fused_equally_split` operator and removes the `is_fused` logic from the original operator. Modifies `FuseListUnpackV2` to map `fb::equally_split` to this new operator.

Test Plan:
```
adityapillai@5960 /data/sandcastle/boxes/fbsource/fbcode 1m 13s
❯ buck test //caffe2/benchmarks/static_runtime/fb:test_fb_operators
```
and sandcastle
strange_what_could_go_wrong

Reviewed By: mikeiovine

Differential Revision: D31742293

fbshipit-source-id: 60b35589c8817719b005d49811f575b6590d1c39
2021-10-22 10:26:49 -07:00
391eb1dbe3 [JIT] UseVariadicOp handles multiple lists (#66288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66288

This change makes it so `UseVariadicOp` can transform ops with many Tensor list inputs.

Input pattern:
```
%output : Type = op(%list_1, %arg_1, %list_2, %list_3)
```
Output pattern:
```
%output : Type = variadic_op(%list_11, ..., %list_1N, %arg_1, %list_21, ..., %list_2M, %list_31, ..., %list_3K, N, M, K)
```
The length of each list is passed at the end of the variadic op so that the op implementation can process the inputs appropriately. This also frees us from needing to update `hasVarArgs` in static runtime each time we add a variadic op.

This diff also makes `UseVariadicOp` more robust. Before, `list_idx` was passed as an argument. Now, `VariadicUpdater` determines `list_idx` from the node's schema.

Test Plan:
Existing variadic ops do not break:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: d1jang

Differential Revision: D31450811

fbshipit-source-id: 808fcc3ae8940b9e602586f38f8cf9154c9a6462
2021-10-22 10:22:33 -07:00
c7121ae77f fix formatting CIRCLE_TAG when building docs (#67026)
Summary:
Similar to pytorch/text#1416
malfet, brianjo

The previous code failed when tags changed from `v0.9.0` to `v0.10.0`. I tested this offline; it would be nice to somehow actually tag the repo and see that this adds the correct documentation directory to the pytorch/pytorch.github.io repo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67026

Reviewed By: saketh-are

Differential Revision: D31843381

Pulled By: malfet

fbshipit-source-id: 21526ad9ed4c1751c2d7f6d621da305f166a7f55
2021-10-22 10:10:52 -07:00
d9c4b3feab Do rowwisemoments computation in float for half LayerNorm (#66920)
Summary:
https://github.com/pytorch/pytorch/issues/66707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66920

Reviewed By: mrshenli

Differential Revision: D31850612

Pulled By: ngimel

fbshipit-source-id: a95a33567285dcf9ee28d33f503cead3268960f9
2021-10-22 09:50:42 -07:00
6e6ede2e70 [JIT] Re-enable alias sensitive peepholes (#65860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65860

Re-enable peepholes like `x + 0 == x`. These were at one point enabled, then disabled because they did not properly account for aliasing, then re-enabled while reconstructing the alias db every time, which is slow: O(n^2). I've added correctness conditions, and I've also made it so that we avoid using stale aliasing properties for either the input or output of nodes we optimize; a sketch of inspecting the result follows below.
Some of the other code that we have written to avoid re-instantiating the alias db involves internally mutating it, but this is tricky to reason about and we would probably have to add some extra invariants...
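
A hedged sketch of checking whether the peephole fired; `graph_for` follows the test-suite idiom for fetching the optimized graph, and whether the rewrite actually triggers depends on executor settings and warm-up:

```
import torch

@torch.jit.script
def f(x):
    return x + 0  # candidate for the x + 0 == x peephole

x = torch.randn(3)
f(x); f(x)  # profiling runs so the optimized graph is materialized
# If the peephole fired, aten::add should be gone from the optimized graph.
print(f.graph_for(x))
```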

cc navahgar relevant to graph opts and d1jang alias analysis relevant here

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31352382

Pulled By: eellison

fbshipit-source-id: 441a27f17dc623d6c24538d1d43cba0412c3c482
2021-10-22 09:45:57 -07:00
051ea5ccbf [Static Runtime] Bundle function & function_kind to carry them together (#66974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66974

`D31591785 (67e003f09b)` started carrying a function object to be executed and `FunctionKind` for the type of the function *separately*, and this caused a bug fixed by D31783028 (79803b199f).

This change bundles them together, as was done before by swolchok, to reduce the chances of such a mistake in the future. They always need to be carried together, since `FunctionKind` identifies the type of the function object.

Note that `struct Function` is a POD type, so accessing its fields (first, second) shouldn't cause extra overhead in `ProcessedNode::run()`.

Test Plan:
Confirmed that the managed memory metrics remain the same before/after this diff on inline_cvr:

```
#AFTER
# inline_cvr/local
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)
# inline_cvr/local_ro
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2679
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1939 (99.4327%)
# inline_cvr/remote_ro
First iter time: 12.0344 ms
Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)
```

```
#BEFORE
#  inline_cvr/local
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)

#inline_cvr/local_ro
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2679
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1939 (99.4327%)

#inline_cvr_remote_ro
Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)
```

Reviewed By: mikeiovine

Differential Revision: D31798419

fbshipit-source-id: fd4301b6731e402be0820729654735c791511aba
2021-10-22 08:57:49 -07:00
3d7a344c5e Fix ArchiveReader to keep archive path (#67035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67035

Incorporate the same change from https://github.com/pytorch/data/pull/73

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D31837963

Pulled By: ejguan

fbshipit-source-id: 3b0171ba30f392c8773c497702bc60aa4fbe28c6
2021-10-22 06:34:39 -07:00
d1a5612a3e remove accscalar from i0 and i0e (#67048)
Summary:
Removes some of the half math ops to make https://github.com/pytorch/pytorch/issues/64023 possible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67048

Reviewed By: mruberry

Differential Revision: D31847249

Pulled By: ngimel

fbshipit-source-id: 8385aacd846bb990e368ff336eb346d847af70b9
2021-10-22 01:34:36 -07:00
5f58764d1d [PyTorch Edge][type] Add type support for NamedTuple custom class (import) (#63130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63130

Extend `type_parser` to handle the `NamedTuple` type. It can be extended to handle other types when needed. The custom type follows this format:
```
"qualified_named[
    NamedTuple, [
        [filed_name_1, field_type_1],
        [filed_name_2, field_type_2]
    ]
]"
```
For example:
```
"__torch__.base_models.sparse_nn.pytorch_preproc_types.PreprocOutputType[
    NamedTuple, [
        [float_features, Tensor],
        [id_list_features, List[Tensor]],
        [label,  Tensor],
        [weight, Tensor],
        ]
    ]"
```

For nested types, the order of type lists from type table should be:
```
std::string type_1 = “__torch__.C [
    NamedTuple, [
        [field_name_c_1, Tensor],
        [field_name_c_2, Tuple[Tensor, Tensor]],
    ]
]”

std::string type_2 = “__torch__.B [
   NamedTuple, [
       [field_name_b, __torch__.C ]
   ]
]”

std::string type_3 = “__torch__.A[
   NamedTuple, [
       [field_name_a, __torch__.B]
   ]
]”
std::vector<std::string> type_strs = {type_1, type_2, type_3};
std::vector<TypePtr> type_ptrs =  c10::parseType(type_strs);
```

namedtuple from both `collections` and `typing` is supported
```

from typing import NamedTuple
from collections import namedtuple
```

This change only adds the parser; the new runtime can now read the above format.
ghstack-source-id: 141293658

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.CompatiblePrimitiveType'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.CompatibleCustomType'

buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.InCompatiblePrimitiveType'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.InCompatibleCustomType'
```

Reviewed By: iseeyuan

Differential Revision: D30261547

fbshipit-source-id: 68a9974338464e320b39a5c613dc048f6c5adeb5
2021-10-22 00:40:57 -07:00
d3fc3c4ded Implement forward AD for linalg.matrix_exp (#62716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62716

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31823231

Pulled By: mruberry

fbshipit-source-id: 6d19b8988dce773b5716f0522d06febfe167fead
2021-10-21 23:55:36 -07:00
fe102b9888 diff tool (#66854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66854

diff tool and script to test correctness of flatbuffer format

Test Plan:
`./verify_flatbuffer.sh | pastry`
P463163180

Reviewed By: zhxchen17

Differential Revision: D31752696

fbshipit-source-id: bea00102b21e62c02367853c8bec2742b483fbda
2021-10-21 22:53:51 -07:00
8ea985f240 [quant][fx][graphmode] Rename files and functions for convert and add do_not_use suffix (#66955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66955

The new convert function is not meant to be used by users; it's a temporary function that we use to build up the new convert path. We will bring feature parity with the old path and deprecate the old path after that.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31810488

fbshipit-source-id: 2f65a110506683123350e619c48df090a15570fc
2021-10-21 22:17:28 -07:00
01ced45217 [iOS] Bump up iOS CocoaPods version to 1.10.0 (#67058)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67058

Test Plan: Imported from OSS

Reviewed By: xta0

Differential Revision: D31846445

Pulled By: hanton

fbshipit-source-id: 7510a6c15fdeecc996fcce5c48db32e148ba7def
2021-10-21 21:30:24 -07:00
77beccaedb Do not build PyTorch with caffe2 by default (#66658)
Summary:
CAFFE2 has been deprecated for a while, but is still included in every PyTorch build.
We should stop building it by default, although CI should still validate that caffe2 code is buildable.

Build even fewer dependencies when compiling mobile builds without Caffe2.
Introduce `TEST_CAFFE2` in torch.common.utils.
Skip `TestQuantizedEmbeddingOps` and `TestJit.test_old_models_bc` if code is compiled without Caffe2.
Should be landed after https://github.com/pytorch/builder/pull/864

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66658

Reviewed By: driazati, seemethere, janeyx99

Differential Revision: D31669156

Pulled By: malfet

fbshipit-source-id: 1cc45e2d402daf913a4685eb9f841cc3863e458d
2021-10-21 20:32:47 -07:00
4fe8055b9f made functorch not decompose by default (#66945)
Summary:
Basically reverting this: https://github.com/pytorch/pytorch/pull/63616

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66945

Reviewed By: zou3519

Differential Revision: D31802176

Pulled By: Chillee

fbshipit-source-id: b1cabd7af66aef26411801516c87336eaea4fccb
2021-10-21 19:18:00 -07:00
28fac23409 Fixes CUDA vs CPU consistency for index_put_ when accumulating (#66790)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39227
Fixes https://github.com/pytorch/pytorch/issues/66495 (duplicate of 39227)

Description:
- Expands values for the CUDA implementation
- Improved shape checking for CUDA
- Improved error message for CUDA
- Added tests (a small sketch of the now-consistent behavior follows below)
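
A small sketch of the accumulate path this makes consistent; duplicate indices accumulate, and the values tensor is broadcast against the indices (runnable on CPU; CUDA now matches):

```
import torch

x = torch.zeros(3)
idx = torch.tensor([0, 0, 1])
# A 0-dim values tensor is expanded to match the indices.
x.index_put_((idx,), torch.tensor(1.0), accumulate=True)
print(x)  # tensor([2., 1., 0.]), now the same on CPU and CUDA
```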

cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66790

Reviewed By: mruberry

Differential Revision: D31843566

Pulled By: ngimel

fbshipit-source-id: c9e5d12a33e1067619c210174ba6e3cd66d5718b
2021-10-21 19:09:57 -07:00
35965869cf Enroll bowangbj@ to PyTorch distributed package (#67062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67062

For cc and potential reviews

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31849050

fbshipit-source-id: d3899c2ca857b8f22bdc88b4e83cdd20bbf0b1d6
2021-10-21 18:45:21 -07:00
20f08d23a0 Revert D31838513: Strided masked reduction: mean.
Test Plan: revert-hammer

Differential Revision:
D31838513

Original commit changeset: 54b99ccf9821

fbshipit-source-id: 5480e8482c8770b41579ee085e158572b659c1f5
2021-10-21 18:32:42 -07:00
2578de4851 [skip ci] Set test owner for test_cuda* tests (#66838)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66838

Reviewed By: saketh-are

Differential Revision: D31841411

Pulled By: janeyx99

fbshipit-source-id: 5cdffdef4a92f9adcef1143ae4598b052c5acc6b
2021-10-21 17:36:25 -07:00
b40a940192 Strided masked reduction: mean. (#66784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66784

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D31838513

Pulled By: cpuhrsch

fbshipit-source-id: 54b99ccf9821832c31976406379939b3c95f41de
2021-10-21 16:32:45 -07:00
b696d64ef4 Binaries without AVX512 kernels shouldn't report CPU Capability as AVX512 on machines with AVX512 support (#66703)
Summary:
### BUG
If a PyTorch binary is built with a compiler that doesn't support all the AVX512 intrinsics in the codebase, then it won't have ATen AVX512 kernels, but at runtime, CPU capability would still be incorrectly returned as AVX512 on a machine that supports AVX512. It seems that PyTorch Linux releases are done on CentOS with `gcc 7.3`, so this bug would manifest in the 1.10 release, unless a fix such as this one is added. gcc versions below 9.0 don't support all the AVX512 intrinsics in the codebase, such as `_mm512_set_epi16`.

### FIX
CPU Capability would be returned as AVX512 at runtime only if the binary was built with a compiler that supports all the AVX512 intrinsics in the codebase, and if the hardware the binary is being run on supports all the required AVX512 instruction sets.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66703

Reviewed By: gchanan

Differential Revision: D31732625

Pulled By: malfet

fbshipit-source-id: e52d06b87fbe2af9b303a2e9c264189c8512d5ec
2021-10-21 16:17:28 -07:00
33790c4e06 Implement histogramdd on CPU (#65318)
Summary:
Implements `torch.histogramdd` analogous to `numpy.histogramdd`.

Builds on https://github.com/pytorch/pytorch/pull/58780, generalizing the existing `torch.histogram` kernel to handle D-dimensional inputs.
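
A minimal usage sketch mirroring `numpy.histogramdd`:

```
import torch

x = torch.rand(100, 2)  # 100 samples in 2-D
hist, bin_edges = torch.histogramdd(x, bins=[4, 4])
print(hist.shape)       # torch.Size([4, 4])
print(len(bin_edges))   # 2, one edges tensor per dimension
```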

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65318

Reviewed By: soulitzer

Differential Revision: D31654555

Pulled By: saketh-are

fbshipit-source-id: 14b781fac0fd3698b052dbd6f0fda46e50d4c5f1
2021-10-21 16:09:31 -07:00
6a224b3370 Set test owners for quantization tests (#66832)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66832

Reviewed By: saketh-are

Differential Revision: D31842880

Pulled By: janeyx99

fbshipit-source-id: 8aee760e4203045c12e7548a21ed5b71c557e3ee
2021-10-21 16:04:41 -07:00
f29e5220a6 Revert D31474901: [pytorch][PR] [numpy] add torch.argwhere
Test Plan: revert-hammer

Differential Revision:
D31474901

Original commit changeset: 335327a4986f

fbshipit-source-id: 534093e459762ff7a888c58d76e49e362015f2ba
2021-10-21 15:50:54 -07:00
fcfa06586d Wextra fix for NamedTensor.cpp (#66897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66897

Fixes:
```
stderr: caffe2/aten/src/ATen/native/NamedTensor.cpp:226:19: error: comparison of integers of different signs: 'const unsigned long' and 'int64_t' (aka 'long') [-Werror,-Wsign-compare]
    if (order_idx >= ellipsis_idx) {
        ~~~~~~~~~ ^  ~~~~~~~~~~~~
stderr: caffe2/aten/src/ATen/native/NamedTensor.cpp:226:19: error: comparison of integers of different signs: 'const unsigned long' and 'int64_t' (aka 'long') [-Werror,-Wsign-compare]
    if (order_idx >= ellipsis_idx) {
        ~~~~~~~~~ ^  ~~~~~~~~~~~~
```

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31774623

fbshipit-source-id: b6e5b76695e512084ac5c9cb4215de7e9b763cf8
2021-10-21 14:22:38 -07:00
462f333c01 [numpy] add torch.argwhere (#64257)
Summary:
Adds `torch.argwhere` as an alias to `torch.nonzero`

Currently, `torch.nonzero` actually provides functionality equivalent to `np.argwhere`.

From NumPy docs,
> np.argwhere(a) is almost the same as np.transpose(np.nonzero(a)), but produces a result of the correct shape for a 0D array.
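
A quick illustrative sketch of the alias:

```
import torch

t = torch.tensor([[0, 1], [2, 0]])
print(torch.argwhere(t))  # tensor([[0, 1], [1, 0]]), one row per nonzero element
print(torch.nonzero(t))   # identical output; argwhere simply aliases nonzero
```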

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64257

Reviewed By: dagitses

Differential Revision: D31474901

Pulled By: saketh-are

fbshipit-source-id: 335327a4986fa327da74e1fb8624cc1e56959c70
2021-10-21 14:02:11 -07:00
892ac08a02 Do not generate not_implemented error for forward AD when input with tangent passed to non-differentiable function (#66926)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61926

1. Update the `if` to just use requires_derivative, since that should reflect when the function is not differentiable.
2. If `requires_derivative=True` but no outputs have forward derivatives, we should error as usual.
3. ~In the future we may also want to handle the case~ When `len(fw_derivatives) > 0 and len(fw_derivatives) < num_diff_outputs`, we should add an assert in codegen that this does not happen. (A hedged sketch of the new behavior follows below.)
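
A hedged sketch of the new behavior using the forward-mode AD API; the choice of `torch.argmax` as the non-differentiable op is an illustrative assumption:

```
import torch
import torch.autograd.forward_ad as fwAD

with fwAD.dual_level():
    x = fwAD.make_dual(torch.randn(3), torch.randn(3))  # primal with a tangent
    # argmax is non-differentiable (integer output): rather than raising a
    # not_implemented error, it now returns an output with no tangent attached.
    idx = torch.argmax(x)
    print(fwAD.unpack_dual(idx).tangent)  # None
```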

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66926

Reviewed By: anjali411

Differential Revision: D31810736

Pulled By: soulitzer

fbshipit-source-id: 11a14477cc7554f576cff2ed1711a448a8c6a66a
2021-10-21 13:53:07 -07:00
062ae8df0e Automated submodule update: tensorpipe (#65353)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: 183172ba8c

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65353

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D31059779

fbshipit-source-id: 7bddff5139f8168750e22e1cc8c0d49931db542e
2021-10-21 13:30:45 -07:00
b07371f19c [skip ci] Set test owners for serialization tests (#66862)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66862

Reviewed By: saketh-are

Differential Revision: D31828615

Pulled By: janeyx99

fbshipit-source-id: 8d28970eead9d6f26e9ea64b823295d9c9e1469d
2021-10-21 13:22:18 -07:00
6f1ba16d6d [skip ci] Set test owners for cpp test (#66836)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc yf225 glaringlee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66836

Reviewed By: saketh-are

Differential Revision: D31828641

Pulled By: janeyx99

fbshipit-source-id: 076d41686746fecebc07452df8212eef15a7824c
2021-10-21 13:17:46 -07:00
00a871c5c9 [skip ci] Set test owner for multiprocessing tests (#66848)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc VitalyFedyunin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66848

Reviewed By: VitalyFedyunin

Differential Revision: D31828908

Pulled By: janeyx99

fbshipit-source-id: 45d6901648f5564c1bf07ad8d01d69ef486ae104
2021-10-21 13:13:53 -07:00
78f970568c Add dummy op to use instead of searchsorted (#66964)
Summary:
Would help unblock https://github.com/pytorch/pytorch/issues/66818 if this actually works

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66964

Reviewed By: mruberry

Differential Revision: D31817942

Pulled By: janeyx99

fbshipit-source-id: 9e9a2bcb0c0479ec7000ab8760a2e64bf0e85e95
2021-10-21 12:56:22 -07:00
94f4e9a995 Enable warning tests for nondeterministic backward functions (#66736)
Summary:
Followup from https://github.com/pytorch/pytorch/issues/66233

Since https://github.com/pytorch/pytorch/issues/50209 was fixed, we can enable these warning tests now

cc mruberry kurtamohler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66736

Reviewed By: zou3519

Differential Revision: D31723385

Pulled By: mruberry

fbshipit-source-id: dc1922a6d0c45cc80020db85710e755a89113861
2021-10-21 12:51:53 -07:00
ce6f4b3a02 Setup c10d extension Backend class attr the same way as builtin ones (#66991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66991

Currently, c10d extensions use Backend.NAME to store the creator
function. However, builtin backends use that same field to store the
name. This commit makes c10d extensions comply with the builtin ones,
and uses a dedicated `_plugins` field to store creator functions.

Thanks bryanmr for pointing this out.
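
A hedged sketch of registering a third-party backend; the creator-function signature below is an assumption modeled on how builtin process groups are constructed:

```
import torch.distributed as dist

def create_fake_pg(store, rank, world_size, timeout):
    # A real extension would construct and return a ProcessGroup here.
    raise NotImplementedError

dist.Backend.register_backend("fake", create_fake_pg)
# After this change, Backend.FAKE holds the name (matching builtin backends),
# while the creator function lives in the dedicated Backend._plugins dict.
print(dist.Backend.FAKE)  # "fake"
```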

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31820307

Pulled By: mrshenli

fbshipit-source-id: 259769ebfc80c0c9fc44d25498c8d19a3a09d1bc
2021-10-21 12:35:07 -07:00
40e5d31a52 Add OpInfo for torch.bincount (#65796)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65796

Reviewed By: bdhirsh

Differential Revision: D31386560

Pulled By: saketh-are

fbshipit-source-id: acb6ed3f743ddcccd0ff7ce1ab21f77c2078da37
2021-10-21 12:11:38 -07:00
9d4549295d ONNX export: propagate node metadata across passes (#45256)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45255

Mostly straightforward. The only downside in this PR is the lack of a more scalable way to check for all newly-created nodes in `callPySymbolicFunction`. The other options were:
* Create a scope within the node's scope and loop through all nodes that correspond to the scope. The code would still need to loop through all nodes.
* Add extra state to the graph (no good reason to do so).
* Add extra state to the ONNX exporter, since python calls go back to `g.op(...)` (no good reason to do so, also not very pythonic).

cc BowenBao neginraoof

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45256

Reviewed By: malfet, houseroad

Differential Revision: D31744281

Pulled By: msaroufim

fbshipit-source-id: 1b63f6e7f02ed61b3a9b7ac3d0be0a3a203c8ff6
2021-10-21 11:49:05 -07:00
a33f341cee [ci] try setting MAX_JOBS on windows builds to reduce OOMs (#66986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66986

See: https://github.com/pytorch/pytorch/issues/66674

Test Plan: Imported from OSS

Reviewed By: seemethere, anjali411

Differential Revision: D31822578

Pulled By: suo

fbshipit-source-id: e24bbe9a1ff21ad0653708217cef5d8b2f56c5a2
2021-10-21 11:41:05 -07:00
53cf7e844f [SR] Fix bug in FuseListUnpackV2 (#67021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67021

When applying the equally split optimization, we still need to delete the list unpack node.

I did an accuracy test yesterday but didn't catch this issue because my diffs were not properly synced between devservers (I use hlu1's devbig for testing and it had an old version of "Add FuseListUnpackV2"). But I did another test this morning and realized that there was an issue.

This is not affecting anything in prod right now since D31742293 has not landed.

Reviewed By: hlu1

Differential Revision: D31827278

fbshipit-source-id: c7b05e3d8ec942632adcff4bdfebb8c27c1a7a39
2021-10-21 11:08:04 -07:00
a7ec4b53d2 Splitter: Transformer_encoder (#66952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66952

Added splitter to lower parts of the transformer model
Program now supports arg input

Test Plan:
Performance on non-lowered model:
0.19662559509277344
Performance on semi-lowered model:
0.19131642150878905

Reviewed By: 842974287

Differential Revision: D31541325

fbshipit-source-id: 194aba97afc794dbeada4bbc4777d0a7b02e3635
2021-10-21 10:59:08 -07:00
d73b88b473 Unsqueeze bug fix (#66889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66889

Added support for negative dims and modified unit test.

Test Plan: buck test mode/dev-nosan caffe2/test/fx2trt/converters:test_unsqueeze

Reviewed By: 842974287

Differential Revision: D31769393

fbshipit-source-id: 854335ead2ffad5f466ad66b9be36ba20a0fea67
2021-10-21 10:57:58 -07:00
23321ba7a3 Fix bug [#66780]: wrong input to torch.is_floating_point (#66783)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66783

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31802971

Pulled By: cpuhrsch

fbshipit-source-id: 6a7d8b83dad219fd683504f9084b77358800507c
2021-10-21 09:50:58 -07:00
13b8599831 [skip ci] Set test owner for test_dispatch.py (#66840)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66840

Reviewed By: saketh-are

Differential Revision: D31829224

Pulled By: janeyx99

fbshipit-source-id: 66aceacd4f976c36ed48ca5be59616d245ba2a82
2021-10-21 08:48:37 -07:00
8cbdf49dce [qnnpack] Remove conv_utils.h (#66605)
Summary:
This completes the removal of conv_utils and redistributes its dependencies

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66605

ghstack-source-id: 140565820

Test Plan: ci tests

Reviewed By: kimishpatel

Differential Revision: D31637731

fbshipit-source-id: 48d3a423e4ff0eb6ab21bb13bda44da16996423b
2021-10-21 08:23:42 -07:00
960e3216a4 [skip ci] Set test owner for named tensor tests (#66849)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66849

Reviewed By: zou3519

Differential Revision: D31828903

Pulled By: janeyx99

fbshipit-source-id: 30810bcec750ba8e1d5a342c31a5996bf57acd69
2021-10-21 08:22:26 -07:00
f5c5ab2868 [skip ci] Set test owner for cpp-extensions tests (#66837)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc yf225 glaringlee zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66837

Reviewed By: anjali411

Differential Revision: D31828401

Pulled By: janeyx99

fbshipit-source-id: 35ac27f3e1c0eb70ccb38c07c42ba61bd0c848fe
2021-10-21 08:15:38 -07:00
32e790997b [Rocm]Reduce severity of detected possible memory leak from assertion to warning (#65973)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62533.
In very rare cases, the decorator for detecting memory leaks throws an assertion even when the test is passing and the memory is freed with only a tiny delay. The issue does not reproduce in internal testing, but sometimes shows up in the CI environment.

Reducing the severity of this detection to a warning, so as not to fail the CI tests: the actual test is not failing; only the check inside the decorator is.

Limiting the change to ROCM only for now.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65973

Reviewed By: anjali411

Differential Revision: D31776154

Pulled By: malfet

fbshipit-source-id: 432199fca17669648463c4177c62adb553cacefd
2021-10-21 07:10:54 -07:00
70a5113e03 [ROCm] update Magma for 4.3 release (#65203)
Summary:
Upstream magma fixes the cholesky issues.
Refer https://bitbucket.org/icl/magma/issues/48/parameter-4-was-incorrect-on-entry-to

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Fixes #{issue number}

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65203

Reviewed By: anjali411

Differential Revision: D31766608

Pulled By: malfet

fbshipit-source-id: 3829b89314d25d8aa14be57ead879a811ab3f098
2021-10-21 07:06:01 -07:00
b6df043f1f Add torch.nn.init.uniform_ operator to ShardedTensor. (#63997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63997

Use torch_function to extend torch.nn.init.uniform_.
The init is done in SPMD fashion. Note that ideally we want to aggregate sharded tensors into a global tensor, init it, and reshard. It's fine to run it SPMD since uniform init is i.i.d. (independent and identically distributed).
Also enable the unit test in test_linear.py for the OSS test.

Test Plan:
a) Unit Test
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_init.py TestShardedTensorNNInit --v
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_linear.py --v (before runs this command is no-op)

or b) Manual run: Instruction here: https://docs.google.com/document/d/1_m1Hdo5w51-hhPlZ_F8Y6PIWrN7UgJZqiSpARYvhsaE/edit#

Imported from OSS

Reviewed By: pritamdamania87, anjali411

Differential Revision: D30563017

fbshipit-source-id: d1859f7682235bcb44515efc69ca92bc5e34fce1
2021-10-21 00:17:13 -07:00
bdb889aca1 [nnc] Use a descriptive name for fused kernels when profiling (#66990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66990

NNC fusion groups currently show up as "TensorExpr" in the profiler,
which is true but not super useful since it obscures what's actually happening
in the fusion group.  This change will log them as `fused_XXX` where XXX is a
(length-limited) series of ops describing the subgraph, for instance
`fused_mul_add` to represent a group containing `aten::mul`, `aten::add`.
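
A hedged sketch of how the new names surface in the profiler; whether this particular function actually forms a fusion group depends on the device and fuser settings:

```
import torch

@torch.jit.script
def f(x, y):
    return x * y + y

x, y = torch.randn(100), torch.randn(100)
f(x, y); f(x, y)  # warm up so the profiling executor can form a fusion group

with torch.autograd.profiler.profile() as prof:
    f(x, y)
# The fusion group should show up as e.g. "fused_mul_add" instead of "TensorExpr".
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```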

Test Plan: New unit test to check the output of autograd profiler.

Reviewed By: dzhulgakov

Differential Revision: D31762087

fbshipit-source-id: 3fadbdc67b054faa01aa42e5b6ea2c4a6bc3481f
2021-10-21 00:06:23 -07:00
8beabffac3 [PyTorchEdge] Make aten function common to aten and torch_common (#66663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66663

fb: TensorCompare.cpp is in per-app, a target higher than torch_mobile.

Please read this doc to learn about [Per-app ATen/native and Template Selective Build](https://docs.google.com/document/d/1O5--mOAi_gGh2GkE-REo3qJRRQ_Lks69IfgszcB8ThI/edit)

Create a file called "prim_native_functions.cpp" in ATen, add it to aten_cpu, and cut-paste native::is_nonzero() into prim_native_functions.cpp.
By doing this we move the function to a lower layer, which is more visible to all targets depending on it.

Instruction count comparison new vs old
https://www.internalfb.com/phabricator/paste/view/P463272302?view=diff

Test Plan:
fb:
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] buck build //xplat/caffe2:aten_cpu
Building: finished in 0.4 sec (100%) 1/202 jobs, 0/202 updated
  Total time: 0.4 sec
More details at https://www.internalfb.com/intern/buck/build/ea35300b-55be-4b9f-bc74-80cdd869c16a
BUILD SUCCEEDED
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] buck build //xplat/caffe2:aten_native_cpu
Building: finished in 0.7 sec (100%) 1/1 jobs, 0/1 updated
  Total time: 0.8 sec
More details at https://www.internalfb.com/intern/buck/build/ccd97d43-c59d-4f29-9418-485cd24575e2
BUILD SUCCEEDED
```

Reviewed By: iseeyuan

Differential Revision: D31669536

fbshipit-source-id: d35f069f975db6dce0b678c5b5ddd74bd690f599
2021-10-20 20:41:41 -07:00
f8f04d5424 [quant][graphmode][fx] Add support for single linear and conv2d (#66950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66950

Just to show that it works for weighted operations as well; qat/fused ops are not supported yet.
We can start developing the backend_config_dict and work towards making the support more complete afterwards.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31801782

fbshipit-source-id: 8491bab7939a7a1c23ffa87c351844b82e390027
2021-10-20 19:13:27 -07:00
a89851a0d9 [quant][fx][graphmode] Adding a new convert function that produces reference pattern by default (#66925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66925

The current convert_fx implementation uses "The Interpreter Pattern" from https://pytorch.org/docs/stable/fx.html
Two things have changed that make the approach in this PR possible and needed:
1) the original convert implementation was developed during the initial prototype, when fx did not allow mutations; fx now supports mutations
2) the original convert needs to handle a lot of fbgemm/qnnpack-specific logic, which is not needed for reference patterns

Therefore it makes sense for us to write a new convert function just for reference patterns; the implementation is significantly easier to understand than the original convert implementation.

Current support:
* we should be able to support all non-weighted ops like relu, add etc.

Missing:
* linear and conv
* some advanced features like standalone modules, input_quantized_idxs etc.

will add linear and conv support and start defining the backend_config_dict based on this version of convert

Test Plan:
python test/test_quantization.py TestQuantizeFxOpsNew

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31786241

fbshipit-source-id: 2a32156eb6d3c5271cb44906cd863055785fb5d4
2021-10-20 18:54:30 -07:00
db4165892b [SmartCompose][OnDevice]fix function name bug in mobile export & Script to convert mobile model (#66915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66915

Pull Request resolved: https://github.com/pytorch/pytorch-canary/pull/3

fix function name bug in mobile export

Test Plan: buck run pytext/fb/assistant/smart_compose:mobile_converter -- --model_input=pytext_training/tree/teams/assistant/smart_compose/300555761/model.ts --model_output=pytext_training/tree/teams/assistant/smart_compose/300555761/mobile_model_test.ts

Reviewed By: JacobSzwejbka

Differential Revision: D31782983

fbshipit-source-id: 7288bb65adc7346d218980a535d68a12d8ef2033
2021-10-20 18:14:51 -07:00
ab1e4eac42 [Static Runtime] Add FuseListUnpackV2 (#66509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66509

Like `FuseListUnpack`, but instead of adding arguments to the fused node's outputs, inserts a new fused op.

By using a new fused op, we can avoid runtime `is_fused` checks. This will make the op implementations significantly cleaner. Eventually, we will migrate all ops to `V2` and delete the old pass.

`FuseListUnpackV2` also fixes the bug described in T103159043.

Test Plan: I've made some changes to D31550307 locally and verified that everything works.

Reviewed By: hlu1

Differential Revision: D31492017

fbshipit-source-id: 4f90fcbc17e4c70a3d65985bee836fabf868a22c
2021-10-20 16:39:32 -07:00
17889ad26e Add support for cat in output stitching (#66098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66098

`cat` is somewhat special-cased right now because currently we only have lists of Tensor inputs where the list is constructed in the JIT IR graph. While that is generally true for fusion (e.g., why we have ConstantChunk), it may not be true for shape analysis in general, so I'm waiting a bit before generalizing.

Test Plan: Imported from OSS

Reviewed By: navahgar, anjali411

Differential Revision: D31797467

Pulled By: eellison

fbshipit-source-id: ca761e214dfd7f3bba8d189f3b3f42ffec064f63
2021-10-20 16:13:09 -07:00
2dd23ebfdb Add support for multi output nodes in partial eval graph stitching (#66097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66097

Adding logic to generate runtime shapes for nodes with multiple outputs. It generalizes the existing flow (look at a node, get its shape graph, inline it, and add a mapping from the output to the new value in the stitched shape compute graph) to loop over multiple outputs.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797468

Pulled By: eellison

fbshipit-source-id: 2c182b71a46b36d33f23ad35b89790a4a5d4471c
2021-10-20 16:13:07 -07:00
0196b984f3 Add Handling of Cat in Shape Analysis (#65575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65575

This is needed for lowering an NNC model to mobile. It is also the last class of unhandled ops which NNC fuses, and we need to integrate this for computing output symbolic shapes.

The graph with two dynamic shape inputs produces:
```
graph(%x.1 : Tensor(SS(-2), 2, 3),
      %y.1 : Tensor(SS(-3), 2, 3)):
  %5 : int = prim::Constant[value=0]()
  %4 : Tensor[] = prim::ListConstruct(%x.1, %y.1)
  %6 : Tensor(SS(-4), 2, 3) = aten::cat(%4, %5) # /private/home/eellison/pytorch/test/jit/test_symbolic_shape_analysis.py:290:19
  return (%6)
```
With a partial eval graph of
```
Done with partial evaluation
graph(%129 : int[],
      %130 : int[],
      %dim.14 : int):
  %738 : int = prim::Constant[value=3]()
  %737 : int = prim::Constant[value=2]()
  %132 : int = prim::Constant[value=0]()
  %392 : int = aten::__getitem__(%129, %132) # <string>:339:44
  %417 : int = aten::__getitem__(%130, %132) # <string>:339:44
  %cat_dim_size.48 : int = aten::add(%392, %417) # <string>:339:29
  %result_size.5 : int[] = prim::ListConstruct(%cat_dim_size.48, %737, %738)
  return (%result_size.5)
```

To handle cat, I essentially make the cat shape op variadic,
replacing
```
torch.cat([x, y]
...
def cat_shape_op(tensors: List[List[int]], dim: int):
    ...
    op(tensors)
```
with
```
def cat_shape_op(x: List[int], y: List[int], dim: int):
    tensors = [x, y]
    op(tensors)
```
This reuses the existing input Tensor properties partial evaluation path and avoids having to add special handling to optimize out `len(tensors)` calls in the IR.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797471

Pulled By: eellison

fbshipit-source-id: 62c794533d5fabfd3fad056d7e5fe3e8781b22c5
2021-10-20 16:13:05 -07:00
eaba976d49 Add x + 0 optimization (#65574)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65574

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797470

Pulled By: eellison

fbshipit-source-id: bf9309fb43f164665335fed0d09697b0e2f67261
2021-10-20 16:13:03 -07:00
b059f035be Fix bug preventing optimization from firing (#65573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65573

When we remove mutation on
```
x = [0, 1, 3, 4]
x[-2] = 4
```
we have a safety check that the new index will be in bounds. In practice this should always be the case; otherwise you would have a runtime error. Within that check (not within the actual adjustment) we were using the wrong length of inputs, preventing the optimization from firing.
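
For illustration, a minimal sketch of the in-bounds check on a normalized negative index; the helper is hypothetical, not the actual pass code:
```python
# Hypothetical sketch of the safety check: a negative write index is
# normalized against the list length before the mutation is removed.
def normalized_index_in_bounds(idx: int, list_len: int) -> bool:
    norm = idx + list_len if idx < 0 else idx
    return 0 <= norm < list_len

x = [0, 1, 3, 4]
assert normalized_index_in_bounds(-2, len(x))       # x[-2] = 4 is rewritable
assert not normalized_index_in_bounds(-5, len(x))   # would be a runtime error
```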

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797469

Pulled By: eellison

fbshipit-source-id: 02a1686b9f6016eb5aeb87ed342c043c203dcd0e
2021-10-20 16:13:01 -07:00
63b41e1f4d [JIT] Add partial evaluation graph stitching logic (#65377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377

When we run symbolic shape analysis on
```
conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
mod = nn.Sequential(conv, max_pool)
...
graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential,
      %input.1 : Tensor):
  %18 : bool = prim::Constant[value=0]()
  %30 : int[] = prim::Constant[value=[1, 1]]()
  %29 : int[] = prim::Constant[value=[3, 3]]()
  %28 : int[] = prim::Constant[value=[2, 2]]()
  %6 : int = prim::Constant[value=1]()
  %self.0.bias : NoneType = prim::Constant()
  %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6)
  %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18)
  return (%input.9)
```
we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`.

The conv2d remaining partially eval'd graph is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the maxpool2d eval'd graph is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to
a) recover symbolic equivalences by CSE'ing & other optimizations
b) calculate shapes for a whole block of operators just from the input, such as for fusing the whole model to NNC with dynamic shapes and then passing along the computed symbolic shapes. The calculation also includes error handling.
c) (future-looking) generate inputs on demand for straight-line networks that are composed just of aten operators

The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`.
```
graph(%input.1 : int[]):
  %42 : bool = prim::Constant[value=0]() # <string>:152:17
  %15 : int = prim::Constant[value=3]()
  %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41
  %13 : int = prim::Constant[value=1]() # <string>:426:61
  %12 : int = prim::Constant[value=4]() # <string>:437:32
  %11 : str = prim::Constant[value="AssertionError: "]()
  %9 : int = prim::Constant[value=2]()
  %8 : int = prim::Constant[value=6]()
  %7 : int = prim::Constant[value=7]()
  %16 : int = aten::len(%input.1) # <string>:438:17
  %17 : bool = aten::eq(%16, %12) # <string>:438:17
   = prim::If(%17) # <string>:438:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:438:10
      -> ()
  %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17
  %19 : bool = aten::eq(%18, %15) # <string>:407:17
   = prim::If(%19) # <string>:407:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:407:10
      -> ()
  %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20
  %21 : int = aten::add(%20, %8) # <string>:411:20
  %22 : bool = aten::ge(%21, %7) # <string>:411:20
   = prim::If(%22) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20
  %24 : int = aten::add(%23, %8) # <string>:411:20
  %25 : bool = aten::ge(%24, %7) # <string>:411:20
   = prim::If(%25) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29
  %27 : int = aten::sub(%20, %13) # <string>:428:32
  %28 : int = aten::floordiv(%27, %9) # <string>:428:32
  %29 : int = aten::add(%28, %13) # <string>:428:32
  %30 : int = aten::sub(%23, %13) # <string>:428:32
  %31 : int = aten::floordiv(%30, %9) # <string>:428:32
  %32 : int = aten::add(%31, %13) # <string>:428:32
  %48 : int = aten::floordiv(%28, %9) # <string>:133:17
  %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23
  %51 : int = aten::floordiv(%31, %9) # <string>:133:17
  %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23
  %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41
  %54 : bool = prim::If(%53) # <string>:157:64
    block0():
      %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93
      -> (%55)
    block1():
      -> (%42)
   = prim::If(%54) # <string>:157:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:157:10
      -> ()
  %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17
  %57 : bool = prim::If(%56) # <string>:160:17
    block0():
      %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38
      -> (%58)
    block1():
      -> (%42)
   = prim::If(%57) # <string>:160:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:160:10
      -> ()
  return (%26, %29, %32, %outputSize.2, %outputSize.1)
  ```

This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of which inputs in the partial eval graph correspond to which inputs in the encompassing graph IR and which outputs correspond to which symbolic shape. Adding NNC people as reviewers because this is relevant to dynamic shape fusion.

Question for reviewers: should I make this a separate file?

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797472

Pulled By: eellison

fbshipit-source-id: a41ed31fad085d3563e71c815f49af0cd18aaeed
2021-10-20 16:12:58 -07:00
4ad6c144f6 [JIT][Easy] Shape cleanups (#65148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65148

No functional changes, factoring out optimizations and renaming the `graph` in symbolic shape analysis to `shape_compute_graph` as ZolotukhinM suggested

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797447

Pulled By: eellison

fbshipit-source-id: 60d322da040245dd7b47ee7c8996239572fd11c2
2021-10-20 16:11:24 -07:00
e046386be8 Avoid inlining error reporting in checked_convert (#66721)
Summary:
**Summary:** Move the error reporting part to the cpp file to avoid callers inlining it, which inflates the generated code size. See https://github.com/pytorch/pytorch/issues/65830.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66721

Test Plan:
Compiling the simple program below now generates ~150 lines of assembly, compared to 700+ lines before.

```
#include <c10/core/Scalar.h>

void g(float) {}

void f(const c10::Scalar& scalar) {
    auto x = scalar.to<float>();
    g(x);
}
```

**Reviewers:** Brian Hirsh

**Subscribers:** Brian Hirsh, Edward Yang, Yining Lu

**Tasks:** T103384490

**Tags:** pytorch

Fixes https://github.com/pytorch/pytorch/issues/65830

Reviewed By: zou3519, bdhirsh

Differential Revision: D31737607

Pulled By: andrewor14

fbshipit-source-id: 3d493c4d8e51d8f8a19d00f59b8ea28176c8a9e3
2021-10-20 16:04:09 -07:00
18bbc4c2b7 [Static Runtime] Fix a bug in aten::index (#66940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66940

`aten::index`'s schema is as follows:

```
"aten::index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
```

The current implementation assumes `indices`' elements are all tensors by doing `elem.toTensor`, which is incorrect. This change creates an empty optional value if an element of `indices` is not a tensor.
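
For context, a small usage sketch showing how `None` entries arise in the `indices` list:
```python
import torch

x = torch.arange(12).reshape(3, 4)
idx = torch.tensor([0, 2])

# x[:, idx] lowers to aten::index(x, indices) with indices = [None, idx]:
# a None entry means "take the whole dimension", so the Tensor?[] list can
# legitimately contain non-tensor elements.
print(x[:, idx])
```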

Test Plan: Fixed `StaticRuntime, IndividualOps_Index` to correctly test `aten::index` with `indices` that contains `None`.

Reviewed By: hlu1

Differential Revision: D31712145

fbshipit-source-id: be1c29674bcd55b67b0dcc2a988bc37fd43745f3
2021-10-20 15:51:21 -07:00
08cb31a03e [PyTorch][1/N] Basic implementation of ShardedEmbedding using ShardedTensor. (#66604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66604

This diff/PR implements ShardedEmbedding using ShardedTensor.

Several caveats:
1. We support limited input params for the op; support for more params is on the way.
2. We only support chunk sharding for now.
3. We only support a single local shard per rank for now.

ghstack-source-id: 141056130

Test Plan: Unit test and CI

Reviewed By: pritamdamania87

Differential Revision: D31544556

fbshipit-source-id: cc867dcba8c11e6f4c7c3722488908f5108cc67f
2021-10-20 15:16:49 -07:00
257239972c Fix attr_to_scope's key in torch/utils/tensorboard/_pytorch_graph.py (#65692)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65652

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65692

Reviewed By: Reubend

Differential Revision: D31678606

Pulled By: edward-io

fbshipit-source-id: 7c0bf740ee4f8c21bd01ced3ae70df23c9efadfb
2021-10-20 14:35:29 -07:00
450221c534 Sparse CSR: Add tensor.resize_ and tensor.copy_ (#63510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63510

Sparse CSR matrix resizing behavior:
- If we _increase the number of rows_, the number of specified elements in the matrix remains the same -> the sizes of col_indices and values don't change, and the size of crow_indices becomes `rows+1`.
- If we _decrease the number of rows_, the number of specified elements becomes `min(nnz, rows*cols)` -> resize `crow_indices` to `rows+1` and set its last element to `min(nnz, rows*cols)`; shrink col_indices and values to `min(nnz, rows*cols)`.
- If we _increase the number of columns_, the number of specified elements and the number of rows remain the same -> nothing needs resizing, just set the new sizes.
- We _cannot decrease the number of columns_ because that would require recomputing `crow_indices`.
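
A minimal sketch of the row-growing case, assuming the CSR constructor and in-place resize behave as described above:
```python
import torch

crow = torch.tensor([0, 2, 4])
col = torch.tensor([0, 1, 0, 1])
values = torch.tensor([1., 2., 3., 4.])
a = torch.sparse_csr_tensor(crow, col, values, size=(2, 2))

a.resize_(4, 2)  # growing rows: nnz is unchanged, crow_indices grows to rows+1
print(a.crow_indices().numel())  # 5
```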

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31796680

Pulled By: cpuhrsch

fbshipit-source-id: 7d8a9701ce06d30a1841f94bba0a057cacea9401
2021-10-20 14:19:04 -07:00
f56a1a59a3 Add simple backwards compatibility check for torch.package (#66739)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65154, tests for backwards compatibility of torch.package by checking if packages that were created before can still be loaded.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66739

Reviewed By: suo

Differential Revision: D31771526

Pulled By: PaliC

fbshipit-source-id: ba8c652c647b94114a058e4c7d7f1c7ce6033d84
2021-10-20 12:46:17 -07:00
6e67150f57 [skip ci] Set test owner for test_mkldnn.py (#66845)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc gujinghui PenghuiCheng XiaobingSuper jianyuh VitalyFedyunin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66845

Reviewed By: anjali411

Differential Revision: D31803377

Pulled By: janeyx99

fbshipit-source-id: 4fcf77d3e4bf976449a0b1ab4d750619db3493a1
2021-10-20 12:38:56 -07:00
5569d5824c Fix documentation of arguments for torch.nn.functional.Linear (#66884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66884

Addressing the docs fix mentioned in issue 64978 on GitHub
ghstack-source-id: 141093449

Test Plan: https://pxl.cl/1Rxkz

Reviewed By: anjali411

Differential Revision: D31767303

fbshipit-source-id: f1ca10fed5bb768749bce3ddc240bbce1dfb3f84
2021-10-20 12:02:58 -07:00
e86d8323cb [JIT] Add special cases for batch_norm, instance_norm in alias_analysis (#66554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554

In native_functions.yaml, the schemas for batch_norm and instance_norm
are incorrect: the inputs `running_mean` and `running_var` are mutated,
but are not marked as such in the function schema. Since `(a!)?`
annotations are currently not working (see #65760), this instead adds a
special case to `alias_anaysis.cpp`. If the value of `training` or
`use_input_stats` is known to be `false`, then `alias_analysis` will
mark the input as _not_ being written to.
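
A small example of the mutation that the schema fails to declare:
```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 3)
running_mean = torch.zeros(3)
running_var = torch.ones(3)

# With training=True the running stats are updated in place, even though the
# schema in native_functions.yaml does not mark them as mutated.
F.batch_norm(x, running_mean, running_var, training=True)
print(running_mean)  # no longer all zeros (with overwhelming probability)
```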

Test Plan:
Removed the `skip` annotation on the following test, and added a special
exception in `check_alias_annotations`:
```
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm
```

Also:
```
./build/bin/test_jit --gtest_filter="*BatchAndInstanceNormFixture*"
```

Imported from OSS

Reviewed By: eellison

Differential Revision: D31612339

fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb
2021-10-20 10:22:10 -07:00
cf77bd4cf4 Fix python version in test tools CI job (#66947)
Summary:
On the HUD, the test tools job is failing as the runners now install Python 3.10, which is not compatible with numpy 1.20

See https://github.com/pytorch/pytorch/runs/3952169950?check_suite_focus=true Install dependencies step:
```
 ERROR: Command errored out with exit status 1:
   command: /opt/hostedtoolcache/Python/3.10.0/x64/bin/python /opt/hostedtoolcache/Python/3.10.0/x64/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmptq8aay7m
       cwd: /tmp/pip-install-dk_6t98q/numpy_e9431bf106b746148c0e7c36e46551b4
  Complete output (1169 lines):
  setup.py:66: RuntimeWarning: NumPy 1.20.0 may not yet support Python 3.10.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66947

Reviewed By: suo, malfet

Differential Revision: D31799205

Pulled By: janeyx99

fbshipit-source-id: 64bf10c37c0aa4f5837c48e92d56e81d920722bd
2021-10-20 10:12:16 -07:00
793f366e34 [skip ci] Set test owners for sparse tests (#66863)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc nikitaved pearu cpuhrsch IvanYashchuk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66863

Reviewed By: anjali411

Differential Revision: D31771126

Pulled By: janeyx99

fbshipit-source-id: 6cb5ca0557e8555f6a09b3e607ff8888e505486e
2021-10-20 10:12:13 -07:00
a015964cf8 Strided masked reduction: prod. (#66386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66386

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31779598

Pulled By: cpuhrsch

fbshipit-source-id: 304a3d6abc794a49de5b044aade6cfd727758495
2021-10-20 10:10:54 -07:00
822277f302 [skip ci] Set test owners for test_type_promotion.py (#66866)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc nairbv mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66866

Reviewed By: anjali411

Differential Revision: D31771149

Pulled By: janeyx99

fbshipit-source-id: 87c04ed4a75ada06a553a11064d44ac65fc4c6ea
2021-10-20 09:42:37 -07:00
409364e597 [skip ci] Set test owners for test_typing.py (#66869)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc ezyang malfet rgommers xuzhao9 gramster

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66869

Reviewed By: anjali411

Differential Revision: D31766850

Pulled By: janeyx99

fbshipit-source-id: e9772f5378be07162d4f4d06925165e396d7d6c6
2021-10-20 09:41:13 -07:00
452b359c3f [skip ci] Set test owners for tensor creation tests (#66864)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc gchanan mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66864

Reviewed By: anjali411

Differential Revision: D31771139

Pulled By: janeyx99

fbshipit-source-id: 74adeae7de355fa6c63de22290fa324911230368
2021-10-20 09:38:21 -07:00
8a65047acc [skip ci] Set test owners for everything considered with module: tests (#66865)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66865

Reviewed By: anjali411

Differential Revision: D31771147

Pulled By: janeyx99

fbshipit-source-id: 8bebe5ac2098364ef1ee93b590abb5f4455b0f89
2021-10-20 09:37:03 -07:00
94f4b22df9 Revert D31761594: [pytorch][PR] opinfo : nn.functional.embedding
Test Plan: revert-hammer

Differential Revision:
D31761594 (ed5633d0c5)

Original commit changeset: d24f44728d04

fbshipit-source-id: 72574918300a7982430a0ceb772c9a24de525050
2021-10-20 09:17:16 -07:00
f95fef7897 Add prim::TensorExprDynamicGuard to bc allowlist (#66939)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66939

Reviewed By: ejguan

Differential Revision: D31797160

Pulled By: soulitzer

fbshipit-source-id: 630b7a0ab99671192397f927391361622f7e9c2e
2021-10-20 08:53:19 -07:00
3fe2ff800c Module docs update (#66909)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37824

{F671745341}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66909

Reviewed By: anjali411

Differential Revision: D31782046

Pulled By: mikaylagawarecki

fbshipit-source-id: 009d2ea3c8a51a89786ef55bb9e88dc53aa8360f
2021-10-20 08:14:36 -07:00
62ca5a81c0 Exposed recompute_scale_factor into nn.Upsample (#66419)
Summary:
Description:
- Exposed recompute_scale_factor in nn.Upsample so that the recompute_scale_factor=True option can be used (see the usage sketch below)

Context: https://github.com/pytorch/pytorch/pull/64501#discussion_r710205190
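
A minimal usage sketch of the newly exposed option:
```python
import torch
import torch.nn as nn

# recompute_scale_factor=True computes the output size from scale_factor and
# then recomputes the scale from that size for the interpolation itself.
up = nn.Upsample(scale_factor=2.5, mode="bilinear", align_corners=False,
                 recompute_scale_factor=True)
x = torch.randn(1, 3, 10, 10)
print(up(x).shape)  # torch.Size([1, 3, 25, 25])
```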

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66419

Reviewed By: gchanan

Differential Revision: D31731276

Pulled By: jbschlosser

fbshipit-source-id: 2118489e6f5bc1142f2a64323f4cfd095a9f3c42
2021-10-20 07:59:25 -07:00
867ccc9987 Strided masked reduction: amin. (#66385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66385

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31779530

Pulled By: cpuhrsch

fbshipit-source-id: de753c2d191f7980a48831b892d3a1e8a7a547cd
2021-10-20 07:45:40 -07:00
c69e33bb11 Fix doc string for torch.acosh (#66814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66814
Shift the equation above the note, as per issue 65905 on GitHub

Test Plan:
Imported from OSS

In preview docs built from PR

https://docs-preview.pytorch.org/66814/generated/torch.acosh.html#torch.acosh equation is now above note

{F671441651}

Reviewed By: gchanan

Differential Revision: D31742677

Pulled By: mikaylagawarecki

fbshipit-source-id: 9fa5390ad2a01ca001418c0bd624f2145f861bf4
2021-10-20 07:01:42 -07:00
ed5633d0c5 opinfo : nn.functional.embedding (#66622)
Summary:
Adds opinfo for `nn.functional.embedding`

A few cases where the `numerical` gradient doesn't match (gradcheck fails):

```python
import torch

try:
    t = torch.randn(2, 1, dtype=torch.float64, requires_grad=True)
    idx = torch.tensor([0, 1])
    torch.autograd.gradcheck(lambda idx, t : torch.nn.functional.embedding(idx, t, padding_idx=1), (idx, t, ))
except Exception as e:
    print("PADDING IDX:", e)

try:
    t = torch.ones(2, 1, dtype=torch.float64, requires_grad=True)
    idx = torch.tensor([0, 1])
    torch.autograd.gradcheck(lambda idx, t : torch.nn.functional.embedding(idx, t, max_norm=1.), (idx, t, ))
except Exception as e:
    print("MAX NORM:", e)

try:
    t = torch.randn(2, 1, dtype=torch.float64, requires_grad=True)
    idx = torch.tensor([0, 1, 1])
    torch.autograd.gradcheck(lambda idx, t : torch.nn.functional.embedding(idx, t, scale_grad_by_freq=True), (idx, t, ))
except Exception as e:
    print("SCALE GRAD BY FREQUENCY:", e)

try:
    t = torch.randn(2, 1, dtype=torch.float64, requires_grad=True)
    idx = torch.tensor([0, 1])
    torch.autograd.gradcheck(lambda idx, t : torch.nn.functional.embedding(idx, t, sparse=True), (idx, t, ))
except Exception as e:
    print("SPARSE", e)

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66622

Reviewed By: gchanan

Differential Revision: D31761594

Pulled By: zou3519

fbshipit-source-id: d24f44728d049e6276d6c3165aa1fba458214959
2021-10-20 06:33:55 -07:00
79803b199f [Static Runtime] Make sure ProcessedNode::function_kind_ is copied over (#66917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66917

The ratio of 'out' variant nodes to the total number of nodes now reports as 100% for all the models, which obviously isn't true.

Reviewed By: swolchok, mikeiovine

Differential Revision: D31783028

fbshipit-source-id: e0bc2c6614aa3c3a235283c9125de1b339f42585
2021-10-20 00:21:35 -07:00
14ee608791 [PyTorch] Make rearragement in sharded linear work as expected. (#66603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66603

Found the issue here: https://github.com/pytorch/pytorch/issues/66281 by making the test cases more complicated.

After closely reading the code again, it turns out my original understanding was also wrong. Let's use the example mentioned in the issue to explain:

If the placement is like:
```
"rank:3/cuda:3",
"rank:0/cuda:0",
"rank:1/cuda:1",
"rank:2/cuda:2",
```

First, we split the column or row by the order of [3, 0, 1, 2].

In the case of column-wise sharding:
We need to rearrange the results from ranks 0-3.
Step 1: we split the output based on the original sharding strategy, i.e., rank 3 gets the 1st shard, rank 0 gets the 2nd shard, etc.
Step 2: we rearrange the results from ranks 0-3 by ordering them following [3, 0, 1, 2], i.e., the result from rank 3 is put in front, and so forth.

In the case of row-wise sharding:
We need to rearrange the input being sent to ranks 0-3.
Step 1: we reorder the input following the map [3, 0, 1, 2]. For example, the first shard goes to rank 3, so we put it in the 3rd part; the second shard goes to rank 0, so we put it in the 2nd part; and so on.
Step 2: the size of the shard for each rank is decided by the original placement [3, 0, 1, 2], i.e., rank 3 gets the first shard and its size, etc.
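
To make the reordering concrete, a small Python sketch of the column-wise reassembly; the data is hypothetical, not the actual implementation:
```python
# Hypothetical sketch: reassemble column-wise results when shard i is placed
# on rank placement[i], i.e. placement order [3, 0, 1, 2].
placement = [3, 0, 1, 2]
results_by_rank = {r: f"result_from_rank_{r}" for r in range(4)}

# Step 2 above: order the gathered results by the placement, so rank 3's
# result comes first, then rank 0's, and so on.
reassembled = [results_by_rank[r] for r in placement]
print(reassembled)
# ['result_from_rank_3', 'result_from_rank_0', 'result_from_rank_1', 'result_from_rank_2']
```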

Update the unit test to reflect this change.

Also, correct some format and comments in the sharded linear.
ghstack-source-id: 141055689

Test Plan: unit test and wait for CI.

Reviewed By: pritamdamania87, bowangbj

Differential Revision: D31634590

fbshipit-source-id: 677a9c2b42da1e2c63220523ed2c004565bbecc7
2021-10-19 23:16:38 -07:00
ef15691a1e Revert D31732421: [JIT][Easy] Shape cleanups
Test Plan: revert-hammer

Differential Revision:
D31732421 (16d0896b69)

Original commit changeset: e934507d1795

fbshipit-source-id: 6b34815c556de64ee5c7ef8d41e4cb434ccd7098
2021-10-19 20:07:06 -07:00
70c9eb130d Revert D31732419: [JIT] Add partial evaluation graph stitching logic
Test Plan: revert-hammer

Differential Revision:
D31732419 (5db7db667f)

Original commit changeset: 883a55cbeef0

fbshipit-source-id: f5faba69dfb6b54aeb29d1beaeec8c5b0373830f
2021-10-19 20:07:04 -07:00
90b42452e2 Revert D31732417: Fix bug preventing optimization from firing
Test Plan: revert-hammer

Differential Revision:
D31732417 (853fc25fb0)

Original commit changeset: dd734254c021

fbshipit-source-id: 3da0663dac5b5d2117b3d7abdbcd45d96f98de33
2021-10-19 20:07:02 -07:00
b8d58129bb Revert D31732420: Add x + 0 optimization
Test Plan: revert-hammer

Differential Revision:
D31732420 (66543f88de)

Original commit changeset: 0271e0dc0dda

fbshipit-source-id: c2beea1661e10c2f1a982b5d4a34b1041dcb1204
2021-10-19 20:07:00 -07:00
e730752610 Revert D31732416: Add Handling of Cat in Shape Analysis
Test Plan: revert-hammer

Differential Revision:
D31732416 (cc7de1df3b)

Original commit changeset: 6d93ddf62c34

fbshipit-source-id: e2c9713177a7f783897e99dd71e631fb275c37da
2021-10-19 20:06:57 -07:00
57fcea9e88 Revert D31732418: Add support for multi output nodes in partial eval graph stitching
Test Plan: revert-hammer

Differential Revision:
D31732418 (0fdc9b77a3)

Original commit changeset: 767698d031b1

fbshipit-source-id: f899eb155dcec67d57f53a658a71169d37b63b42
2021-10-19 20:06:55 -07:00
4187d870df Revert D31732415: Add support for cat in output stitching
Test Plan: revert-hammer

Differential Revision:
D31732415 (b4db5174fe)

Original commit changeset: 7f513cea355f

fbshipit-source-id: a0d8f1512b13d51f6e50b5da58084effbaf0a0dc
2021-10-19 20:06:53 -07:00
1bf0e1acb4 Revert D31732414: Add Initial NNC Dynamic Shapes Flow
Test Plan: revert-hammer

Differential Revision:
D31732414 (de4fe7a38c)

Original commit changeset: 290a94a667c2

fbshipit-source-id: 3021a1d7a8661967e37d4f9cfc86ed47cc4a7f3d
2021-10-19 20:05:29 -07:00
9c4d7d96db Address feedback from #66673 (#66905)
Summary:
Specify both `build_generates_artifacts` and `exclude_tests` properties as suggested in https://github.com/pytorch/pytorch/pull/66673#pullrequestreview-783667960

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66905

Reviewed By: seemethere

Differential Revision: D31779742

Pulled By: malfet

fbshipit-source-id: 21f5543f3b767f38132be8c7e163455f39ff893f
2021-10-19 18:27:45 -07:00
deb6989880 [fx-acc] add optimize_quantization to FX graph opts (#65929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65929

This adds a set of quantize/dequantize graph optimizations.

Test Plan:
```
buck test mode/opt glow/fb/fx/graph_opts:test_fx_graph_opts
```
```
Parsing buck files: finished in 0.8 sec
Building: finished in 3.0 sec (100%) 8475/80926 jobs, 0/80926 updated
  Total time: 3.9 sec
More details at https://www.internalfb.com/intern/buck/build/9dd6193b-d99c-4d2a-8ef8-4d71380916e7
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: b5a83d2a-8870-400e-b21e-3286967d1f4a
Trace available for this run at /tmp/tpx-20211018-165956.836274/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124724048882
    ✓ ListingSuccess: glow/fb/fx/graph_opts:test_fx_graph_opts - main (3.152)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_transpose_to_reshape_1_optimizable (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestTransposeToReshape) (0.100)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_transpose_to_reshape_0_identity (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestTransposeToReshape) (0.017)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_0 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.154)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_1 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.140)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantization_2_QuantizePerChannel_Dequantize_X_RescaleQuantized_X_ (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantization) (0.422)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_3 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.296)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_dequantize_clamp_remove_one_3 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.288)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_dequantize_clamp_remove_one_1 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.433)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_clamp_tensor (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.346)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantization_1_Quantize_Dequantize_X_RescaleQuantized_X_ (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantization) (0.403)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_transpose_to_reshape_2_unoptimizable (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestTransposeToReshape) (0.117)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_remove_one_1 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.415)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_remove_one_3 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.280)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantization_3_Dequantize_Quantize_Dequantize_X_Dequantize_rescale_X_Dequantize_X_ (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantization) (0.150)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_6 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.133)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_dequantize_clamp_remove_one_2 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.523)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_dequantize_clamp_remove_one_0 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.569)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantization_4_Rescale_QuantizeNode_QuantizeNode_ (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantization) (0.815)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_5 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.295)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_4 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.308)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_2 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.213)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_remove_one_2 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.230)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantization_0_Dequantize_Quantize_X_X (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantization) (0.336)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_remove_one_0 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.486)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_7 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.306)
Summary
  Pass: 25
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124724048882
```

# Before
```
Model before opt.
graph():
    %x : [#users=1] = placeholder[target=x]
    %quantize_per_tensor_2 : [#users=1] = call_function[target=torch.fx.experimental.fx_acc.acc_ops.quantize_per_tensor](args = (), kwargs = {input: %x, acc_out_ty: ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {scale: 1.000001e-05, zero_point: 0, qscheme: torch.per_tensor_affine})})
    %dequantize_1 : [#users=1] = call_function[target=torch.fx.experimental.fx_acc.acc_ops.dequantize](args = (), kwargs = {input: %quantize_per_tensor_2})
    %quantize_per_tensor_3 : [#users=1] = call_function[target=torch.fx.experimental.fx_acc.acc_ops.quantize_per_tensor](args = (), kwargs = {input: %dequantize_1, acc_out_ty: ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {scale: 1e-05, zero_point: 0, qscheme: torch.per_tensor_affine})})
    return quantize_per_tensor_3
opcode         name                   target                                            args                      kwargs
-------------  ---------------------  ------------------------------------------------  ------------------------  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
placeholder    x                      x                                                 ()                        {}
call_function  quantize_per_tensor_2  <function quantize_per_tensor at 0x7f66030a34c0>  ()                        {'input': x, 'acc_out_ty': ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {'scale': 1.000001e-05, 'zero_point': 0, 'qscheme': torch.per_tensor_affine})}
call_function  dequantize_1           <function dequantize at 0x7f66030a35e0>           ()                        {'input': quantize_per_tensor_2}
call_function  quantize_per_tensor_3  <function quantize_per_tensor at 0x7f66030a34c0>  ()                        {'input': dequantize_1, 'acc_out_ty': ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {'scale': 1e-05, 'zero_point': 0, 'qscheme': torch.per_tensor_affine})}
output         output                 output                                            (quantize_per_tensor_3,)  {}
```

# After
```
Model after opt.
graph():
    %x : [#users=1] = placeholder[target=x]
    %quantize_per_tensor_2 : [#users=1] = call_function[target=torch.fx.experimental.fx_acc.acc_ops.quantize_per_tensor](args = (), kwargs = {input: %x, acc_out_ty: ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {scale: 1e-05, zero_point: 0, qscheme: torch.per_tensor_affine})})
    return quantize_per_tensor_2
opcode         name                   target                                            args                      kwargs
-------------  ---------------------  ------------------------------------------------  ------------------------  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
placeholder    x                      x                                                 ()                        {}
call_function  quantize_per_tensor_2  <function quantize_per_tensor at 0x7f66030a34c0>  ()                        {'input': x, 'acc_out_ty': ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {'scale': 1e-05, 'zero_point': 0, 'qscheme': torch.per_tensor_affine})}
output         output                 output                                            (quantize_per_tensor_2,)  {}
```

Reviewed By: jfix71

Differential Revision: D30945732

fbshipit-source-id: 427cd4215b546e1d6c5362734bb7de93d0c0b1b9
2021-10-19 17:06:32 -07:00
32e3003726 Have test classes extend from common_utils.TestCase, not unittest.TestCase (#66900)
Summary:
This causes some functionality not to work, such as the test-disabling-via-issues feature, e.g., https://github.com/pytorch/pytorch/issues/66641
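
A minimal sketch of the intended pattern:
```python
# Minimal sketch of the change: inherit from the PyTorch test base class so
# repo tooling (e.g. disabling flaky tests via issues) can hook into tests.
from torch.testing._internal.common_utils import TestCase, run_tests

class MyDistributedTest(TestCase):  # previously: unittest.TestCase
    def test_something(self):
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    run_tests()
```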

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66900

Reviewed By: seemethere

Differential Revision: D31778293

Pulled By: janeyx99

fbshipit-source-id: df3023ddaf7969ffb60117d1e1d7e36d87bc6139
2021-10-19 16:54:05 -07:00
de4fe7a38c Add Initial NNC Dynamic Shapes Flow (#66136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136

FOR REVIEWERS: this is ready to review; the test failures come from somewhere else in the stack.

Takes in a TensorExprGraph of static shapes and generalizes the input shapes
to symbolic dimensions. Dimensions of value 1 are preserved; otherwise,
dimensions with the same value are bucketed to the same symbolic shape.

E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)`
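
A minimal Python sketch of that bucketing rule; the helper is hypothetical, not the actual implementation:
```python
# Hypothetical sketch of the generalization rule: size-1 dims are preserved,
# and equal concrete sizes are bucketed to one symbolic dimension.
def generalize(shapes):
    symbols, next_id, out = {}, -1, []
    for shape in shapes:
        dims = []
        for d in shape:
            if d == 1:
                dims.append("1")
            else:
                if d not in symbols:
                    symbols[d] = next_id
                    next_id -= 1
                dims.append(f"SS({symbols[d]})")
        out.append("Tensor(" + ", ".join(dims) + ")")
    return out

print(generalize([(5, 3), (3, 1)]))
# ['Tensor(SS(-1), SS(-2))', 'Tensor(SS(-2), 1)']
```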

From there, it runs symbolic shape inference on the graph and creates a
versioning if in the graph, with prim::TensorExprDynamicGuard checking
whether the runtime inputs match the generalized symbolic shapes that are
inputs to the TE kernel. The computation for all symbolic dimensions is
inlined into the if block with the TE kernel. All symbolic-dimension Value*s
are appended to the end of the TE kernel Graph/Node inputs, and the Node is
augmented with an integer list attr `symbolic_shape_inputs` that gives the
mapping from Value* -> symbolic shape int64_t value. For lengthier IR
examples and a walkthrough, look at ShapeAnalysisTest.DynamicShapesFusion in
`test_shape_analysis`. Returns True on success and False on failure; it can
fail if shape propagation fails to propagate the number of dims or if
complete shapes on the inputs are not set.

Example transformation
```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor = aten::tanh(%x.1)
  %5 : Tensor = aten::erf(%4)
  %6 : Tensor = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor = aten::cat(%7, %3)
  %9 : Tensor = aten::hardswish(%8)
  %10 : Tensor = aten::mul(%9, %z)
  return (%9)
```
->

```
  graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp)
  %5 : Tensor = prim::If(%4)
    block0():
      %15 : int[] = aten::size(%x_inp)
      %16 : int[] = aten::size(%y_inp)
      %17 : int = prim::Constant[value=1]()
      %18 : int = prim::Constant[value=0]()
      %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10
      %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10
      %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10
      %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29
      %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3)
      -> (%3)
    block1():
      %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp)
      -> (%14)
  return ()
  with prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu),
        %SS_5 : int,
        %SS_4 : int,
        %SS_3 : int,
        %SS_2 : int):
    %3 : int = prim::Constant[value=0]()
    %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1)
    %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4)
    %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1)
    %7 : Tensor[] = prim::ListConstruct(%5, %6)
    %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3)
    %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8)
    %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z)
    return (%9)
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732414

Pulled By: eellison

fbshipit-source-id: 290a94a667c20467717202a43c60e4f9ca4c00e2
2021-10-19 16:41:49 -07:00
b4db5174fe Add support for cat in output stitching (#66098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66098

`cat` is somewhat special-cased right now because currently we only have lists of Tensor inputs where the list is constructed in the JIT IR graph. While that is generally true for fusion (e.g., it is why we have ConstantChunk), it may not be true for shape analysis generally, so I'm waiting a bit before generalizing.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732415

Pulled By: eellison

fbshipit-source-id: 7f513cea355f1e4c1d2ca7c32c06690a9bdcb050
2021-10-19 16:41:44 -07:00
0fdc9b77a3 Add support for multi output nodes in partial eval graph stitching (#66097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66097

Adding logic to generate runtime shapes for nodes with multiple outputs. It generalizes the existing flow (looking at a node, getting its shape graph, inlining it, and adding a mapping from the output to the new value in the stitched shape compute graph) to loop over multiple outputs.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732418

Pulled By: eellison

fbshipit-source-id: 767698d031b1daf002678a025b270e0ede429061
2021-10-19 16:41:39 -07:00
cc7de1df3b Add Handling of Cat in Shape Analysis (#65575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65575

This is needed for lowering an NNC model to mobile. It is also the last class of unhandled ops which NNC fuses, and we need to integrate this for computing output symbolic shapes.

The graph with two dynamic shape inputs produces:
```
graph(%x.1 : Tensor(SS(-2), 2, 3),
      %y.1 : Tensor(SS(-3), 2, 3)):
  %5 : int = prim::Constant[value=0]()
  %4 : Tensor[] = prim::ListConstruct(%x.1, %y.1)
  %6 : Tensor(SS(-4), 2, 3) = aten::cat(%4, %5) # /private/home/eellison/pytorch/test/jit/test_symbolic_shape_analysis.py:290:19
  return (%6)
```
With a partial eval graph of
```
Done with partial evaluation
graph(%129 : int[],
      %130 : int[],
      %dim.14 : int):
  %738 : int = prim::Constant[value=3]()
  %737 : int = prim::Constant[value=2]()
  %132 : int = prim::Constant[value=0]()
  %392 : int = aten::__getitem__(%129, %132) # <string>:339:44
  %417 : int = aten::__getitem__(%130, %132) # <string>:339:44
  %cat_dim_size.48 : int = aten::add(%392, %417) # <string>:339:29
  %result_size.5 : int[] = prim::ListConstruct(%cat_dim_size.48, %737, %738)
  return (%result_size.5)
```

To handle cat, I essentially make the cat shape op variadic,
replacing
```
torch.cat([x, y])
...
def cat_shape_op(tensors: List[List[int]], dim: int):
    ...
    op(tensors)
```
with
```
def cat_shape_op(x: List[int], y: List[int], dim: int):
    tensors = [x, y]
    op(tensors)
```
This reuses the existing input Tensor properties partial evaluation path and avoids having to add special handling to optimize out `len(tensors)` calls in the IR.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732416

Pulled By: eellison

fbshipit-source-id: 6d93ddf62c34846ec238159f75229632515530b7
2021-10-19 16:41:34 -07:00
66543f88de Add x + 0 optimization (#65574)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65574

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732420

Pulled By: eellison

fbshipit-source-id: 0271e0dc0ddab06220048ed5bf4236fc85f3318c
2021-10-19 16:41:29 -07:00
853fc25fb0 Fix bug preventing optimization from firing (#65573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65573

When we remove mutation on
```
x = [0, 1, 3, 4]
x[-2] = 4
```
we have a safety check that the new index will be in bounds. In practice this should always be the case; otherwise you would have a runtime error. Within that check (not within the actual adjustment) we were using the wrong length of inputs, preventing the optimization from firing.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732417

Pulled By: eellison

fbshipit-source-id: dd734254c0212ca459c1c135da262974de5299be
2021-10-19 16:41:24 -07:00
5db7db667f [JIT] Add partial evaluation graph stitching logic (#65377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377

When we run symbolic shape analysis on
```
conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
mod = nn.Sequential(conv, max_pool)
...
graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential,
      %input.1 : Tensor):
  %18 : bool = prim::Constant[value=0]()
  %30 : int[] = prim::Constant[value=[1, 1]]()
  %29 : int[] = prim::Constant[value=[3, 3]]()
  %28 : int[] = prim::Constant[value=[2, 2]]()
  %6 : int = prim::Constant[value=1]()
  %self.0.bias : NoneType = prim::Constant()
  %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6)
  %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18)
  return (%input.9)
```
we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`.

The conv2d remaining partially eval'd graph is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the maxpool2d eval'd graph is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to
a) recover symbolic equivalences by CSE'ing & other optimizations
b) calculate shapes for a whole block of operators just from the input, such as for fusing the whole model to NNC with dynamic shapes and then passing along the computed symbolic shapes. The calculation also includes error handling.
c) (future-looking) generate inputs on demand for straight-line networks that are composed just of aten operators

The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`.
```
graph(%input.1 : int[]):
  %42 : bool = prim::Constant[value=0]() # <string>:152:17
  %15 : int = prim::Constant[value=3]()
  %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41
  %13 : int = prim::Constant[value=1]() # <string>:426:61
  %12 : int = prim::Constant[value=4]() # <string>:437:32
  %11 : str = prim::Constant[value="AssertionError: "]()
  %9 : int = prim::Constant[value=2]()
  %8 : int = prim::Constant[value=6]()
  %7 : int = prim::Constant[value=7]()
  %16 : int = aten::len(%input.1) # <string>:438:17
  %17 : bool = aten::eq(%16, %12) # <string>:438:17
   = prim::If(%17) # <string>:438:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:438:10
      -> ()
  %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17
  %19 : bool = aten::eq(%18, %15) # <string>:407:17
   = prim::If(%19) # <string>:407:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:407:10
      -> ()
  %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20
  %21 : int = aten::add(%20, %8) # <string>:411:20
  %22 : bool = aten::ge(%21, %7) # <string>:411:20
   = prim::If(%22) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20
  %24 : int = aten::add(%23, %8) # <string>:411:20
  %25 : bool = aten::ge(%24, %7) # <string>:411:20
   = prim::If(%25) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29
  %27 : int = aten::sub(%20, %13) # <string>:428:32
  %28 : int = aten::floordiv(%27, %9) # <string>:428:32
  %29 : int = aten::add(%28, %13) # <string>:428:32
  %30 : int = aten::sub(%23, %13) # <string>:428:32
  %31 : int = aten::floordiv(%30, %9) # <string>:428:32
  %32 : int = aten::add(%31, %13) # <string>:428:32
  %48 : int = aten::floordiv(%28, %9) # <string>:133:17
  %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23
  %51 : int = aten::floordiv(%31, %9) # <string>:133:17
  %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23
  %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41
  %54 : bool = prim::If(%53) # <string>:157:64
    block0():
      %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93
      -> (%55)
    block1():
      -> (%42)
   = prim::If(%54) # <string>:157:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:157:10
      -> ()
  %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17
  %57 : bool = prim::If(%56) # <string>:160:17
    block0():
      %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38
      -> (%58)
    block1():
      -> (%42)
   = prim::If(%57) # <string>:160:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:160:10
      -> ()
  return (%26, %29, %32, %outputSize.2, %outputSize.1)
  ```

This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of which inputs in the partial eval graph correspond to which inputs in the encompassing graph IR and which outputs correspond to which symbolic shape. Adding NNC people as reviewers because this is relevant to dynamic shape fusion.

Question for reviewers: should I make this a separate file?

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732419

Pulled By: eellison

fbshipit-source-id: 883a55cbeef0fd5a6068a779ffa89b6f537245b3
2021-10-19 16:41:19 -07:00
16d0896b69 [JIT][Easy] Shape cleanups (#65148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65148

No functional changes, factoring out optimizations and renaming the `graph` in symbolic shape analysis to `shape_compute_graph` as ZolotukhinM suggested

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732421

Pulled By: eellison

fbshipit-source-id: e934507d1795e0bc4d98a3bfe6cb792e2f08b119
2021-10-19 16:39:32 -07:00
b3bb234e16 Remove THCGeneral.cpp (#66766)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66766

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31721647

Pulled By: ngimel

fbshipit-source-id: 5033a2800871c8745a1a92e379c9f97c98af212e
2021-10-19 16:09:19 -07:00
bd4d5cb14c Sparse CSR: Add torch.empty (#63509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63509

The primary use of `torch.empty` is to reserve memory for a tensor and set its type, device, and size information. The same is done here for SparseCSR.
`crow_indices` is initialized as an empty tensor of size `num_rows + 1`. `col_indices` and `values` are initialized as empty tensors of size 0.
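
A minimal sketch of the described initialization, assuming the empty-CSR path added here:
```python
import torch

# Sketch: an empty CSR tensor reserves crow_indices of size num_rows + 1,
# while col_indices and values start empty.
t = torch.empty(3, 4, layout=torch.sparse_csr)
print(t.crow_indices().numel())  # 4  (num_rows + 1)
print(t.col_indices().numel())   # 0
print(t.values().numel())        # 0
```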

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31770359

Pulled By: cpuhrsch

fbshipit-source-id: c83f2a2e0d7514ba24780add1086e1bccf541dd9
2021-10-19 15:59:07 -07:00
b1a6129e09 Add repr to StreamWrapper (#66880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66880

Helps to print out `fileobj`.

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D31764431

Pulled By: ejguan

fbshipit-source-id: 668a8fbe0078196d4d584be3dfb413c8ad5e72b1
2021-10-19 15:28:25 -07:00
e70b5d64f4 Change README getting started link to explicit instructions (#66828)
Summary:
This changes the link for installing binaries to point to the pytorch.org page that consists entirely of the download command selector (the selector is no longer visible on a normal-aspect-ratio screen when the main website page first loads).

This also includes some other random fixes:
* Update HUD link
* Clean ups

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66828

Reviewed By: malfet

Differential Revision: D31750654

Pulled By: driazati

fbshipit-source-id: aef9ceba71418f6f7648eab9a8c8a78d6c60518b
2021-10-19 14:59:48 -07:00
cbd7bac914 Migrate clang5-mobile build to GHA (#66673)
Summary:
`linux-xenial-py3-clang5-mobile-build`, `linux-xenial-py3-clang5-mobile-custom-build-dynamic` and `linux-xenial-py3-clang5-mobile-code-analysis` are just flavors of the regular Linux build job with no tests.
`linux-xenial-py3-clang5-mobile-code-analysis` is a master-only job.

The `code-analysis` job is dispatched to `.jenkins/pytorch/build-mobile-code-analysis.sh` in
583217fe37/.jenkins/pytorch/build.sh (L23-L25)
and all `mobile-build` jobs are dispatched to `.jenkins/pytorch/build-mobile.sh` in
583217fe37/.jenkins/pytorch/build.sh (L19-L21)

Rename the `is_libtorch` `CIWorkflow` property to `build_generates_artifacts` and change the default from False to True.
Neither libtorch nor mobile build jobs generate build artifacts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66673

Reviewed By: janeyx99

Differential Revision: D31674434

Pulled By: malfet

fbshipit-source-id: 24d05d55366202cd4d9c25ecab429cb8f670ded0
2021-10-19 14:13:29 -07:00
15f21eef5e [fx2trt]fix softmax test (#66885)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66885

Test Plan: CI

Reviewed By: hl475

Differential Revision: D31767433

fbshipit-source-id: 1ee79ac027c612b5397be9da9665fff21b2c321f
2021-10-19 13:55:49 -07:00
a1afb692f3 Fix metal issues with irange (#66877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66877

Fixes (hopefully):
```
program_source:516:27: error: use of undeclared identifier 'c10'
    for (const auto idx : c10::irange(4)) {
                          ^
program_source:590:27: error: use of undeclared identifier 'c10'
    for (const auto idx : c10::irange(4)) {
                          ^
program_source:810:26: error: use of undeclared identifier 'c10'
    for (const auto iy : c10::irange(roi_bin_grid_h)) {
                         ^
program_source:811:30: error: use of undeclared identifier 'c10'
        for (const auto ix : c10::irange(roi_bin_grid_w)) {
                             ^

DeviceName: AMD Radeon Pro 5500M, LanguageVersion: 131075
Exception raised from -[MetalContext available] at xplat/caffe2/aten/src/ATen/native/metal/MetalContext.mm:66 (most recent call first):
(no backtrace available)
```

Test Plan: Sandcastle

Reviewed By: benb, xta0

Differential Revision: D31763270

fbshipit-source-id: cfe4364b14c5fe6dbd39893788919769c9a9eb00
2021-10-19 13:49:24 -07:00
66f241230d [PyTorch] Take const Type& in {tryS,s}calarTypeFromJitType (#66717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66717

No need to require a refcount bump for this function.
ghstack-source-id: 140921170

Test Plan: CI

Reviewed By: suo

Differential Revision: D31696898

fbshipit-source-id: a3732a04ccbddc32207ce90836030f3020154a77
2021-10-19 13:08:42 -07:00
9a00910bf3 [skip ci] Set test owner for test_linalg.py (#66844)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66844

Reviewed By: gchanan

Differential Revision: D31761714

Pulled By: janeyx99

fbshipit-source-id: a4c7b239d855707ee6ec1194f57f8a66812b4e99
2021-10-19 13:01:05 -07:00
57c596eb9e add interactive_embedded_interpreter.cpp to the OSS build (#66352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66352

Add cmake rules for interactive_embedded_interpreter.cpp .

The builtin_registry.cpp has already been handled in https://github.com/pytorch/pytorch/pull/66347 . I'll remove the change in this PR once that one is merged.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D31521249

Pulled By: shunting314

fbshipit-source-id: bb9d340e5a6aad7d76078ca03a82b5ae7494a124
2021-10-19 12:32:49 -07:00
3488a85a76 Sparse CSR CUDA: fix input checks for addmm and mm (#66485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66485

The errors for incorrectly sized inputs should match those of the dense
variants of these functions.
Moved addmm_out_sparse_csr_dense_cuda from SparseCsrTensorMath.cu and
removed an unnecessary device check.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31764036

Pulled By: cpuhrsch

fbshipit-source-id: 76900fe9e4a49474695a01f34bad41cb3422321c
2021-10-19 12:01:11 -07:00
690c2a7076 masked_scatter: fuse mask count check into one kernel (#66871)
Summary:
This saves 1 kernel launch, 7 dispatcher calls, 3 `TensorImpl` allocations and 1 CUDA memory allocation.
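
A usage-level illustration of the op whose validation is being fused (a minimal sketch; the fused mask-count check itself is internal to the CUDA kernel and not visible from Python):

```python
import torch

x = torch.zeros(4)
mask = torch.tensor([True, False, True, False])
src = torch.tensor([1., 2.])
# Before scattering, the kernel validates that src has at least as many
# elements as mask has True entries; that check is now fused.
x.masked_scatter_(mask, src)  # x is now [1., 0., 2., 0.]
```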

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66871

Reviewed By: gchanan

Differential Revision: D31763713

Pulled By: ngimel

fbshipit-source-id: b0d2f9415b7fd013fb4e7d68ade6e38a58f5b153
2021-10-19 11:52:38 -07:00
552af8bdef [PyTorch] Fix missing move in OptionalType::createWithContained (#66697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66697

We own this vector, so we can move from it.
ghstack-source-id: 140742640

Test Plan: CI

Reviewed By: suo

Differential Revision: D31693230

fbshipit-source-id: 3f33ca6e47e29b0e3d6c8fad59c234c55e1e159f
2021-10-19 11:47:35 -07:00
7e81a89e13 [PyTorch] Fix performance-no-automatic-move clang tidy warnings in matchTypeVariables (#66720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66720

See the documentation for the warning. https://clang.llvm.org/extra/clang-tidy/checks/performance-no-automatic-move.html
ghstack-source-id: 140922952

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697506

fbshipit-source-id: 26ce6c47d0f3b0c4e48ecc882f6792f1b5a45bac
2021-10-19 11:30:46 -07:00
50f5689d60 Set test owner for distributions tests (#66842)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc fritzo neerajprad alicanb nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66842

Reviewed By: neerajprad

Differential Revision: D31761720

Pulled By: janeyx99

fbshipit-source-id: 9d9e88d93e2efb90c971f165b4040880e9d90c56
2021-10-19 11:00:29 -07:00
c37f413e75 [skip ci] Change pretrained to false for quantization tests (#66795)
Summary:
Helps resolve a bit of https://github.com/pytorch/pytorch/issues/65439

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66795

Reviewed By: suo, jerryzh168

Differential Revision: D31732043

Pulled By: janeyx99

fbshipit-source-id: 10b71865fc937f9d72f2b1c04cbf3ea9a68c8818
2021-10-19 10:56:29 -07:00
c9d9244166 [skip ci] Set test owner for test_spectral_ops.py (#66843)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66843

Reviewed By: gchanan

Differential Revision: D31761715

Pulled By: janeyx99

fbshipit-source-id: 1173a200478b87568768fafcfee117c09c1cffbd
2021-10-19 10:56:27 -07:00
34051d74da Add test owner to distributed files starting with test_ (#66797)
Summary:
Action based on https://github.com/pytorch/pytorch/issues/66232

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66797

Reviewed By: gchanan

Differential Revision: D31761389

Pulled By: janeyx99

fbshipit-source-id: c27c9ab4acec1eb71d5edd4538cd113b770dfc6c
2021-10-19 10:55:20 -07:00
94afbd158c [skip ci] Set test owner for test_numpy_interop.py (#66851)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66851

Reviewed By: gchanan

Differential Revision: D31761703

Pulled By: janeyx99

fbshipit-source-id: 4dec507dff0ce25d2780b6020f0d9790ab1cb499
2021-10-19 10:50:54 -07:00
17f07c310b Fix type checking errors in torch/ao/quantization/quantize_fx.py (#66804)
Summary:
- [x] Fix the Pyre type checking errors in `torch/ao/quantization/quantize_fx.py`
```
torch/quantization/quantize_fx.py:41:8 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:143:16 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:144:16 Incompatible variable type [9]: equalization_qconfig_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:206:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:230:12 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:268:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:269:8 Incompatible variable type [9]: equalization_qconfig_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:427:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:464:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:486:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:547:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
```
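
The standard fix for this class of Pyre error is to declare the parameter `Optional` and normalize `None` in the body; a minimal sketch (the function name and body are illustrative, not the actual quantize_fx code):

```python
from typing import Any, Dict, Optional

def prepare(prepare_custom_config_dict: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # Declaring Optional[...] (instead of Dict[...] = None) satisfies Pyre;
    # None is then normalized to an empty dict before use.
    if prepare_custom_config_dict is None:
        prepare_custom_config_dict = {}
    return prepare_custom_config_dict
```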
Fixes the issue: [MLH-Fellowship/pyre-check/issues/76](https://github.com/MLH-Fellowship/pyre-check/issues/76)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66804

Reviewed By: onionymous

Differential Revision: D31738171

Pulled By: 0xedward

fbshipit-source-id: 00d4c5749c469aff39a1531365461ced747e52fc
2021-10-19 09:45:18 -07:00
a2e94b80fa Create linalg.matrix_exp (#62715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62715

Fixes https://github.com/pytorch/pytorch/issues/61648

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31641698

Pulled By: mruberry

fbshipit-source-id: 2e2965d14807b6b4fada4b809d539066dd0ba277
2021-10-19 09:07:15 -07:00
fd608cd313 [skip ci] Set test owners for optim tests (#66861)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc vincentqb jbschlosser albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66861

Reviewed By: albanD

Differential Revision: D31761369

Pulled By: janeyx99

fbshipit-source-id: 57829e1f1509fc2af321530a4b55c9d33b7fb150
2021-10-19 08:39:35 -07:00
c806bb1022 [skip ci] Set test owner for test_complex.py (#66835)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66835

Reviewed By: anjali411

Differential Revision: D31761723

Pulled By: janeyx99

fbshipit-source-id: ca672f5a1be9dc27284fade725a8238cbfd877a3
2021-10-19 08:36:27 -07:00
299a6a65b2 [skip ci] Set test owners for autograd tests (#66834)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66834

Reviewed By: albanD

Differential Revision: D31761778

Pulled By: janeyx99

fbshipit-source-id: 355edfb1b940154e84fbba6f7b096605e75ae459
2021-10-19 08:35:02 -07:00
39215ddf84 [skip ci] Set test owners for dataloader tests (#66839)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc SsnL VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66839

Reviewed By: ejguan

Differential Revision: D31761722

Pulled By: janeyx99

fbshipit-source-id: 8315ac03352c11b3215d89856b3cfda6cd78fa0c
2021-10-19 08:31:16 -07:00
9eab6da887 [skip ci] Set test owner for nn tests (#66850)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66850

Reviewed By: albanD

Differential Revision: D31761712

Pulled By: janeyx99

fbshipit-source-id: 7272154cac77e2ce38370775a9e8d41252e13166
2021-10-19 08:26:50 -07:00
05b6dc9d75 Fix BatchMatMul test and shape inference (#66733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66733

Fix the test for BatchMatMul to compare glow/caffe2 outputs, and fix its shape inference function, which made simplifying assumptions for broadcasting and failed on some of the shapes in the test. The previous inference failed whenever the first n - 2 output dimensions of A x B were not simply those of whichever of A or B had higher rank (e.g., for A: [2, 2, 2, 3, 4] and B: [3, 1, 2, 2, 4, 5] we expect output dimensions [3, 2, 2, 2, 3, 5] rather than [3, 1, 2, 2, 3, 5]).
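
The batch dimensions follow standard NumPy-style broadcasting. A minimal sketch of the corrected output-shape rule (illustrative only, not the actual inference code):

```python
def batch_matmul_output_shape(a, b):
    # Matrix dims come from A's rows and B's cols; the leading batch dims
    # broadcast elementwise after right-aligning the two shapes.
    batch_a, batch_b = tuple(a[:-2]), tuple(b[:-2])
    n = max(len(batch_a), len(batch_b))
    batch_a = (1,) * (n - len(batch_a)) + batch_a
    batch_b = (1,) * (n - len(batch_b)) + batch_b
    batch = tuple(max(x, y) for x, y in zip(batch_a, batch_b))
    return batch + (a[-2], b[-1])

# The example from above:
assert batch_matmul_output_shape([2, 2, 2, 3, 4], [3, 1, 2, 2, 4, 5]) == (3, 2, 2, 2, 3, 5)
```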

Test Plan:
```
buck test glow/fb/test/numerics:test_operator_onnxifinnpi -- -r .*test_batch_matmul_manydims.* --env USE_INF_API=1
```

Reviewed By: khabinov

Differential Revision: D31701184

fbshipit-source-id: 31d0fb17409a399b90fb8042385e000ed81c3581
2021-10-19 07:53:13 -07:00
9f782f8b35 add OpInfo for torch.nn.pixel_unshuffle (#65468)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65468

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31111699

Pulled By: zou3519

fbshipit-source-id: a92c2f1f4986a54abab82360e97ea2ce22fb9397
2021-10-19 07:36:35 -07:00
1164118fc2 add OpInfo for torch.nn.pixel_shuffle (#65467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65467

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31111697

Pulled By: zou3519

fbshipit-source-id: 618e6b2cc927814f85500374a2838d98c9c45d6e
2021-10-19 07:36:33 -07:00
8f09292c5e add OpInfo for torch.nn.functional.pairwise_distance (#65460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65460

cc albanD mruberry jbschlosser walterddr

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31111701

Pulled By: zou3519

fbshipit-source-id: a4034418cf8d14f584134a16d822181703858f99
2021-10-19 07:35:10 -07:00
0036e41143 [quant][embedding qat] Add eager QAT test for EmbeddingBag+Linear model (#66334)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66334

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D31618283

Pulled By: b-koopman

fbshipit-source-id: bb824a341f1aa9d7e83f8e66d320a9dfd348a1d7
2021-10-19 07:03:36 -07:00
0a07488ed2 use irange for loops 1 (#66741)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66741

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705360

fbshipit-source-id: 7115f76e381ad2d98584eb534961c3cbb957ebaa
2021-10-19 03:28:51 -07:00
72803dbcfd [caffe2] Fix invalid vector accesses and polar() call (#66757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66757

`InterpreterStateImpl::run()` gets the number of outputs from the current frame, but by the time the continuation completes, the frame is gone, so we're calling `front()` on an empty vector. This works out in practice (data is still there) but it is technically undefined behavior and could break in the future.

Also, `std::polar()` expects its magnitude argument to be non-negative, but `c10::polar()` does not require this, so implement it explicitly (the implementation is the same as libstdc++'s).
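
In Python terms, the explicit formula is the standard polar-to-rectangular conversion, which is well defined for negative magnitudes too (a minimal sketch of the formula, not the actual c10 code):

```python
import math

def polar(abs_val, angle):
    # Unlike std::polar, this explicit form does not require abs_val >= 0;
    # a negative magnitude simply flips the resulting vector.
    return complex(abs_val * math.cos(angle), abs_val * math.sin(angle))

assert polar(-1.0, 0.0) == -1.0
```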

Test Plan: JIT tests pass.

Reviewed By: zhxchen17

Differential Revision: D31715587

fbshipit-source-id: 98abcc10c2742887af866d8e70169a0187c41d33
2021-10-19 00:29:54 -07:00
147f7559b1 Add SourceView which doesn't own source text as base class of Source (#65309)
Summary:
This saves the cost of copying text from the stack to the heap in some cases (such as
parsing function schemas during the loading phase of libtorch.so).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65309

Reviewed By: swolchok

Differential Revision: D31060315

Pulled By: gmagogsfm

fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
2021-10-18 23:17:22 -07:00
bff64e84cd [DDP] Track models with sync bn (#66680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66680

Closes https://github.com/pytorch/pytorch/issues/66215. Tracks models
with sync BN so we can find workflows that use them and target for perf
optimization.
ghstack-source-id: 140875182

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D31679477

fbshipit-source-id: 0e68cd1a7aabbc5b26227895c53d33b8e98bfb8e
2021-10-18 22:31:52 -07:00
e0643fa3fc use irange for loops 5 (#66744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705358

fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
2021-10-18 21:59:50 -07:00
bceb1db885 use irange for loops 3 (#66747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66747

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705365

fbshipit-source-id: 5c3af2184766b063eed2f4e8feb69f1fedd3503e
2021-10-18 21:50:32 -07:00
061baf02bf Skip failing tests when LAPACK and MAGMA are not available (#64930)
Summary:
Skip failing tests when LAPACK and MAGMA are not available for `test_linalg.py` and `test_ops.py`.
Note that there's no CI without LAPACK or MAGMA. I verified locally that this now works as expected, but we have no guard against these tests regressing in this situation in the future.
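
A minimal sketch of one way such skips are typically expressed with the existing device-type test decorators (illustrative; the actual tests may use different guards):

```python
from torch.testing._internal.common_utils import TestCase
from torch.testing._internal.common_device_type import (
    skipCPUIfNoLapack, skipCUDAIfNoMagma)

class TestLinalgExample(TestCase):
    @skipCPUIfNoLapack
    @skipCUDAIfNoMagma
    def test_svd_like_op(self, device):
        ...  # body elided; the decorators skip when LAPACK/MAGMA are absent
```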

<details>
  <summary> test_ops.py failures that are fixed</summary>

 ```
 FAILED test/test_ops.py::TestCommonCPU::test_out_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
 ```

</details>

<details>
  <summary> test_linalg.py failures that are fixed</summary>
```
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_dtype_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_nuclear_norm_axes_small_brute_force_old_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_complex128 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_float64 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_lowrank_cuda_float64 - RuntimeError: Calling torch.lu on a CUDA tensor requires compiling PyTorch with MAGMA. Please rebuild with MAGMA.
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
```
</details>

Fixes https://github.com/pytorch/pytorch/issues/59662

cc mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64930

Reviewed By: zou3519

Differential Revision: D31739416

Pulled By: mruberry

fbshipit-source-id: 153c40d8eeeb094b06816882a7cbb28c681509a9
2021-10-18 21:30:01 -07:00
08a464a9f3 [PyTorch] Pass c10::optional<bool> to Stride ctor by value (#66698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66698

This type should fit in a register, so there is no need to pass it by reference.
ghstack-source-id: 140742830

Test Plan: CI

Reviewed By: suo

Differential Revision: D31693291

fbshipit-source-id: 299fb3d1830a059b59268487c22e030446c3496e
2021-10-18 21:28:56 -07:00
c9c52b760b test addr type promotion in a single test (#66812)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66802
Test time goes from 150s to 15s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66812

Reviewed By: mruberry

Differential Revision: D31739299

Pulled By: ngimel

fbshipit-source-id: cb6d92ff335f46ee06b2480bdd9143f85865bccf
2021-10-18 21:21:11 -07:00
d05c1ec007 Add lazy Node base and associated infra (#66601)
Summary:
- Adds Node base class and unit tests
- Also adds metadata utils to enable source code annotation and scope tracking

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66601

Test Plan: Add new unit tests

Reviewed By: desertfire

Differential Revision: D31634044

fbshipit-source-id: a042d54f06fbc480acfc63c18d43cb6fceb6fea5
2021-10-18 19:09:42 -07:00
a17a4e93ce [PyTorch][easy] Fix missing move in UnionType::createWithContained (#66691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66691

Does what it says on the tin.
ghstack-source-id: 140736047

Test Plan: CI

Reviewed By: suo

Differential Revision: D31691627

fbshipit-source-id: 21a5d0248bf3412f5af36260597a5f663ab34361
2021-10-18 18:04:22 -07:00
c9c447f4be [PyTorch] Fix missing moves in ListType (#66701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66701

We own the argument vector.
ghstack-source-id: 140760983

Test Plan: CI

Reviewed By: suo

Differential Revision: D31693645

fbshipit-source-id: 02829bc3c728f6d1d07be08b0d977eee1efee38f
2021-10-18 18:00:18 -07:00
d0a63c978b [PyTorch][easy] Don't copy string in TensorType::repr_str unnecessarily (#66699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66699

std::string::operator+ will copy the string an extra time even if the argument is `""`. See https://godbolt.org/z/3sM5h1qTo
ghstack-source-id: 140743822

Test Plan: CI

Reviewed By: suo

Differential Revision: D31693522

fbshipit-source-id: 6a8033c90366904b9aff44214b600cfb255a0809
2021-10-18 17:55:21 -07:00
f65b4b7a4c [PyTorch] Avoid refcount bump in UnionType::canHoldType (#66693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66693

Passing a `TypePtr` by value causes an unnecessary refcount
bump. We don't need to take ownership, so `const Type&` is all we
need.

I considered providing a compatibility shim that takes `const
TypePtr&`, but doing so is dangerous because a
copy is required to convert from a more specific pointer like
`NoneTypePtr`.
ghstack-source-id: 140737081

Test Plan: CI

Reviewed By: suo

Differential Revision: D31691869

fbshipit-source-id: f766ce3234a28771c2a9ca4c284eb3f96993a3d0
2021-10-18 17:39:59 -07:00
1db50505d5 [nn] MultiLabelSoftMarginLoss : no batch dim support (#65690)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65690

Reviewed By: zou3519

Differential Revision: D31731162

Pulled By: jbschlosser

fbshipit-source-id: d26f27555f78afdadd49126e0548a8bfda50cc5a
2021-10-18 15:30:01 -07:00
8173d4df69 move get_cycles_per_ms() to common_utils (#66798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66798

get_cycles_per_ms is copied and used in a few places; move it to common_utils so that it can be used as a shared util function
ghstack-source-id: 140790599

Test Plan: unit tests

Reviewed By: pritamdamania87

Differential Revision: D31706870

fbshipit-source-id: e8dccecb13862646a19aaadd7bad7c8f414fd4ab
2021-10-18 14:04:09 -07:00
d024f1134d ci: Move bazel download from github -> s3 (#66815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66815

We were seeing 403s when attempting to wget from GitHub; re-hosting the
binary on S3 means we shouldn't see those issues anymore

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31740656

Pulled By: seemethere

fbshipit-source-id: 4462678d51a52b63020f8da18d7cdc80fb8dbc5d
2021-10-18 13:34:40 -07:00
06e49ea088 [not4land][quant][fx][graphmode] lower reference linear module example (#65723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65723

Example lowering reference linear module to fbgemm/qnnpack quantized linear module

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31567461

fbshipit-source-id: 0b8fffaf8e742ec15cb07bf6a4672cf3e856db2d
2021-10-18 13:14:39 -07:00
c994a7fc2d Update documentation of torch.nn.Upsample (#66756)
Summary:
The documentation of torch.nn.Upsample stated that `align_corners` only affects `linear`, `bilinear` and `trilinear`.

This PR updates the documentation for the Python `Upsample` module and the C++ `UpsampleOptions` struct to reflect that `bicubic` is also affected by `align_corners`.
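
For reference, a minimal example of the affected mode:

```python
import torch

x = torch.arange(16.).reshape(1, 1, 4, 4)
# bicubic, like linear/bilinear/trilinear, changes behavior with align_corners
up = torch.nn.Upsample(scale_factor=2, mode='bicubic', align_corners=True)
y = up(x)  # shape: (1, 1, 8, 8)
```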

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66756

Reviewed By: zou3519

Differential Revision: D31731148

Pulled By: jbschlosser

fbshipit-source-id: 3ec277fc3fbdf8414d0de327d8c57ba07342a5b9
2021-10-18 13:07:17 -07:00
0974215c4d Prefer mT and mH over transpose(-2, -1) and transpose(-2, -1).conj() (#64181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181

This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.

It also simplifies two pieces of code and fixes one bug where a pair
of parentheses was missing in the function `make_symmetric_matrices`.
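
A minimal sketch of the equivalences being substituted:

```python
import torch

A = torch.randn(2, 3, 4, dtype=torch.complex64)
assert torch.allclose(A.mT, A.transpose(-2, -1))         # matrix transpose
assert torch.allclose(A.mH, A.transpose(-2, -1).conj())  # conjugate (Hermitian) transpose
```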

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31692896

Pulled By: anjali411

fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
2021-10-18 13:02:25 -07:00
44fd312604 [PyTorch] Use intrusive_ptr to save space in KernelFunction (#65618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65618

This saves 8 bytes per KernelFunction, which should help in resource-constrained environments.
ghstack-source-id: 140731069

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25405736

fbshipit-source-id: 757c0f1387da9147e46ac69af2aa9fffd2998e35
2021-10-18 12:53:45 -07:00
622e19b859 [PyTorch] Take const Type& in TensorType::fromNumberType (#66716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66716

No need to require a refcount bump for this function.
ghstack-source-id: 140754065

Test Plan: CI

Reviewed By: suo

Differential Revision: D31696639

fbshipit-source-id: bf8aa3f542d52e82e0f6a444b8898330f3d16a31
2021-10-18 12:49:40 -07:00
6a7296be9c [PyTorch] Use castRaw in InterfaceType (#66728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66728

Two extra refcount bumps.
ghstack-source-id: 140760872

Test Plan: CI

Reviewed By: suo

Differential Revision: D31698577

fbshipit-source-id: 1f50195a99f98f857abc9b03b4254519c316fefe
2021-10-18 12:44:24 -07:00
9ea3424747 Set test owner for fx (#66807)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66807

Reviewed By: jamesr66a

Differential Revision: D31736722

Pulled By: janeyx99

fbshipit-source-id: 5ffcb02a858137211bff1eabf158001dcb0359a6
2021-10-18 12:25:38 -07:00
8637556d23 Migrate THCState to ATen (#66765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66765

This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31721648

Pulled By: ngimel

fbshipit-source-id: 772b24787656a95f9e3fcb287d912b1c3400f32d
2021-10-18 12:14:43 -07:00
1fcbd8fa15 [PyTorch] Fix extra refcount bumps in tryEvalTypeVariables (#66722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66722

Missing move, s/cast/castRaw/, and take TypePtr arg by const ref because we only sometimes need to take ownership.
ghstack-source-id: 140757141

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697631

fbshipit-source-id: 04afe13688c6e2aaf79157400c0a44021cb8179d
2021-10-18 12:06:37 -07:00
393299b124 [PyTorch] Fix unnecessary shared_ptr copies in RRefType (#66706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66706

Missing moves in the construction path.
ghstack-source-id: 140746585

Test Plan: CI

Reviewed By: suo

Differential Revision: D31694356

fbshipit-source-id: 8e2bf2dd41f3f65fc06e30ffd5fddd487d01aaa8
2021-10-18 12:04:43 -07:00
d5a25faf7a [PyTorch] Fix unnecessary shared_ptr copies in EnumType (#66714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66714

Forced copy in getValueType and unnecessary use of cast over castRaw.
ghstack-source-id: 140752791

Test Plan: CI

Reviewed By: suo

Differential Revision: D31696164

fbshipit-source-id: fc2316617a61ca32f1fb952fb0af18b8784a606b
2021-10-18 12:04:41 -07:00
9b729ebc88 [jit] shape propagation for quantization (#66343)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66343

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31515839

Pulled By: IvanKobzarev

fbshipit-source-id: 1b2b953b93210a1cade64c30302478907fc639f3
2021-10-18 12:03:20 -07:00
1cf317b85f [ONNX] Support exporting with Apex O2 (#65374) (#66700)
Summary:
Apex O2 hooks state_dict to return fp16 weights as fp32, so the exporter cannot identify them as the same tensors.
Since this hook is only used by the optimizer, it is safe to remove it while exporting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66700

Reviewed By: zou3519

Differential Revision: D31695132

Pulled By: malfet

fbshipit-source-id: 977bdf57240002498f3ad0f1a8046c352e9860e6
2021-10-18 11:54:09 -07:00
624ce95201 Run sparse tests only for TensorPipe agent. (#66661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66661

Similar to https://github.com/pytorch/pytorch/pull/66600, runs
rpc_test.py sparse tests only for TP agent.
ghstack-source-id: 140666322

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D31669850

fbshipit-source-id: 41a66c8d1843130964aede5c77d391484607214f
2021-10-18 11:53:07 -07:00
7fad47e522 torch.linalg.lstsq: forward/backward AD support (#65054)
Summary:
As per title.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65054

Reviewed By: zou3519

Differential Revision: D31729468

Pulled By: albanD

fbshipit-source-id: ab7df824bc80128e7f64f6444c7a4baa4786c161
2021-10-18 11:28:44 -07:00
6bde474066 [PyTorch] Fix extra refcount bumps in matchTypeVariables (#66719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66719

Some casts could be castRaw, and parameters did not need to force a refcount bump.
ghstack-source-id: 140756356

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697455

fbshipit-source-id: 87a8cba221a7ae53f2a485acafd31622e9328ff0
2021-10-18 11:15:07 -07:00
c373e188d8 [PyTorch] Fix extra refcount bumps in unifyTypes (#66718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66718

Some missing moves and use of cast instead of castRaw (due to a previous automated fixup only being a partial fix).
ghstack-source-id: 140755229

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697115

fbshipit-source-id: 86743f8982951a58638ba244b3a92d3737dde58b
2021-10-18 11:13:45 -07:00
472a6f2787 Strided masked reductions: sum, amax. Testing of masked reductions. (#65990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65990

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31729532

Pulled By: albanD

fbshipit-source-id: 855a6bb2a7c6e75c780a64ce23c0f29321f0e511
2021-10-18 11:10:32 -07:00
d777e490a5 [bc-breaking][quant][graphmode][fx] Produce reference patterns for GeneralShapeOps (#66647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66647

Missed in the last round: this adds reference patterns for general shape ops like view when is_reference is True.

bc-breaking:
this basically disables getitem from supporting quantized ops here; we may support it later in fbgemm

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels

Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31680379

fbshipit-source-id: 6a3a7128514baf6d92b1607308c40339469d0066
2021-10-18 11:09:17 -07:00
eb1eefc399 [PyTorch] Fix unnecessary shared_ptr copies in DictType (#66702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66702

Missing moves in the construction path and forced copies of the key & value type on access.
ghstack-source-id: 140744707

Test Plan: CI

Reviewed By: suo

Differential Revision: D31693818

fbshipit-source-id: 4c5d2359f58148744621abe81429e56e7889f754
2021-10-18 11:05:25 -07:00
09c4e73c95 [PyTorch] Fix unnecessary shared_ptr copies in FutureType (#66704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66704

Missing moves in the construction path.
ghstack-source-id: 140746391

Test Plan: CI

Reviewed By: suo

Differential Revision: D31694296

fbshipit-source-id: 3bed477c811069248611efdb57ad27c6ca233442
2021-10-18 11:01:00 -07:00
62e89f692f [doc] typo (#66754)
Summary:
This PR fixes a typo in the `torch/autograd/function.py` doc

-----------------------

Additionally, the example at https://pytorch.org/docs/master/autograd.html#torch.autograd.Function doesn't quite compile:
```
'builtin_function_or_method' object has no attribute 'exp'
```
even though `i.exp()` is a valid function if `i` is a tensor.

I changed it to:
```
result = torch.exp(i)
```
but python doesn't like it either:
```
TypeError: exp(): argument 'input' (position 1) must be Tensor, not builtin_function_or_method
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66754

Reviewed By: albanD

Differential Revision: D31729400

Pulled By: soulitzer

fbshipit-source-id: eef783bcdc8d4693a8b7f1ab581e948abc0f9b94
2021-10-18 10:33:56 -07:00
f4a7273b5c Set test owners for module: ci (#66796)
Summary:
Action based on RFC https://github.com/pytorch/pytorch/issues/66232

cc seemethere malfet pytorch/pytorch-dev-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66796

Reviewed By: seemethere

Differential Revision: D31732391

Pulled By: janeyx99

fbshipit-source-id: b894eab8a4a8737165d1ba7b536e1232f6c07a8f
2021-10-18 10:29:50 -07:00
8532061bce [sharded_tensor] support gloo/mpi backend in tests (#65855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65855

This adjusts our test base to support non-NCCL backends like gloo/mpi, so that we can test sharding on CPU with the gloo/mpi backends.
ghstack-source-id: 140840866

Test Plan: wait for the CI for existing tests, also adding tests in the stacked diff above.

Reviewed By: pritamdamania87, bowangbj

Differential Revision: D31287162

fbshipit-source-id: d48dfc8ef886a4d34b1de42f3ce6b600b5c9a617
2021-10-18 10:17:59 -07:00
d549c8de78 fx quant: enable linear-bn1d fusion for PTQ (#66484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66484

https://github.com/pytorch/pytorch/pull/50748 added linear - bn1d fusion
in Eager mode, for PTQ only. This PR also enables this in FX graph mode.

We reuse the existing conv-bn-relu fusion handler, renaming `conv` to
`conv_or_linear` for readability.

The QAT version is saved for a future PR, for both eager and FX graph.
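
A minimal sketch of the newly fused pattern (using the `torch.quantization.quantize_fx` namespace of this era):

```python
import torch
from torch.quantization.quantize_fx import fuse_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.bn = torch.nn.BatchNorm1d(4)

    def forward(self, x):
        return self.bn(self.linear(x))

m = fuse_fx(M().eval())  # Linear + BatchNorm1d folded into a single Linear
```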

Test Plan:
```
python test/test_quantization.py TestFuseFx.test_fuse_linear_bn_eval
```

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D31575392

fbshipit-source-id: f69d80ef37c98cbc070099170e335e250bcdf913
2021-10-18 10:14:28 -07:00
9d287d0b63 [fx2trt]Add support for negative dim in softmax (#66760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66760

Previously we didn't convert negative dim to positive dim.

Test Plan: WIP

Reviewed By: wushirong

Differential Revision: D31703127

fbshipit-source-id: 6d5ccecab45b46f867a05ee70c76a5980e41011d
2021-10-18 09:03:56 -07:00
aa7da7b09c [quant][embedding qat] Enable quint4 in EmbeddingBag QAT workflow (#66348)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66348

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D31691300

Pulled By: b-koopman

fbshipit-source-id: 11bd75b608b972394fe9f7c9b7bf034af42f28b5
2021-10-18 08:51:39 -07:00
909694fd88 Fix nn.functional.max_poolNd dispatch (for arg: return_indices) (#62544)
Summary:
Please see https://github.com/pytorch/pytorch/issues/62545 for context.

The order of `return_indices, ceil_mode` differs between the `nn.functional.max_poolNd` functions and `torch.nn.MaxPoolNd` (the module form). While this should be resolved in the future, it was decided to first raise a warning that the behavior will change in the future (please see https://github.com/pytorch/pytorch/pull/62544#issuecomment-893770955 for more context).

This PR thus raises appropriate warnings and updates the documentation to show the full signature (along with a note) for `torch.nn.functional.max_poolNd` functions.
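
Concretely, the two signatures disagree on the position of the last two flags, so these should be passed by keyword (a minimal sketch, as shown below):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)
# functional: max_pool2d(input, kernel_size, stride, padding, dilation,
#                        ceil_mode, return_indices)
# module:     MaxPool2d(kernel_size, stride, padding, dilation,
#                       return_indices, ceil_mode)  <- last two flags swapped
out, idx = F.max_pool2d(x, 2, return_indices=True, ceil_mode=False)
```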

**Quick links:**

(_upstream_)

* Documentation of [`nn.functional.max_pool1d`](https://pytorch.org/docs/1.9.0/generated/torch.nn.functional.max_pool1d.html), [`nn.functional.max_pool2d`](https://pytorch.org/docs/stable/generated/torch.nn.functional.max_pool2d.html), and [`nn.functional.max_pool3d`](https://pytorch.org/docs/stable/generated/torch.nn.functional.max_pool3d.html).

(_this branch_)

* Documentation of [`nn.functional.max_pool1d`](https://docs-preview.pytorch.org/62544/generated/torch.nn.functional.max_pool1d.html?highlight=max_pool1d), [`nn.functional.max_pool2d`](https://docs-preview.pytorch.org/62544/generated/torch.nn.functional.max_pool2d.html?highlight=max_pool2d#torch.nn.functional.max_pool2d), and [`nn.functional.max_pool3d`](https://docs-preview.pytorch.org/62544/generated/torch.nn.functional.max_pool3d.html?highlight=max_pool3d#torch.nn.functional.max_pool3d).

cc mruberry jbschlosser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62544

Reviewed By: gchanan

Differential Revision: D31179038

Pulled By: jbschlosser

fbshipit-source-id: 0a2c7215df9e132ce9ec51448c5b3c90bbc69030
2021-10-18 08:34:38 -07:00
e4a9ee8d42 Deduplicate codegenOutputQuery to query maximum CUDA compute capabilities (#55901)
Summary:
There were two versions of the same code which were slightly different, although functionally equivalent.
When adding support for another CUDA / device version, both would need to be changed and kept in sync, so it is better to have a single version as the unique source of truth.

I chose the implementation which looks cleaner and easier to read and added some minor enhancements and comments to further increase readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55901

Reviewed By: H-Huang

Differential Revision: D31636917

Pulled By: bertmaher

fbshipit-source-id: 622e1fabc39de4f3f1b1aa9a1544cfbd35a5cfd9
2021-10-18 07:42:15 -07:00
811f5a2b94 Adding StreamWrapper to ensure file object will be closed (#66715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66715

Adding StreamWrapper to streams produced by DataPipes within PyTorch Core and TorchData

Test Plan: OSS CI and Internal Tests

Reviewed By: ejguan

Differential Revision: D31695248

fbshipit-source-id: c26fa1bc1688d5597851ad265f667fafdcd64c59
2021-10-18 07:31:32 -07:00
0d203a16fe Add relative and absolute tolerances for matrix_rank, pinv (#63102)
Summary:
This pull request introduces new keyword arguments for `torch.linalg.matrix_rank` and `torch.linalg.pinv`: `atol` and `rtol`.

Currently, only the tensor overload has default values for either `atol` or `rtol`; the float overload requires both arguments to be specified.

FC compatibility: https://github.com/pytorch/pytorch/pull/63102#discussion_r710930509
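
A minimal sketch of the new keyword arguments:

```python
import torch

A = torch.randn(5, 5)
A[-1] = A[0]  # make A rank-deficient
r = torch.linalg.matrix_rank(A, atol=1e-7, rtol=1e-5)  # typically 4 here
P = torch.linalg.pinv(A, rtol=1e-5)
```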

Fixes https://github.com/pytorch/pytorch/issues/54151. Fixes https://github.com/pytorch/pytorch/issues/66618.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63102

Reviewed By: H-Huang

Differential Revision: D31641456

Pulled By: mruberry

fbshipit-source-id: 4c765508ab1657730703e42975fc8c0d0a60eb7c
2021-10-17 22:15:42 -07:00
53aac4b6f3 [PyTorch] Allow override for macro HAS_DEMANGLE (#66540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66540

Currently the macro `HAS_DEMANGLE` is determined by compiler predefined macros. Here I'm adding an option to allow `HAS_DEMANGLE` to be defined in build files.

Test Plan: Rely on CI

Reviewed By: poweic

Differential Revision: D31600007

fbshipit-source-id: 76cf088b0f5ee940e977d3b213f1446ea64be036
2021-10-17 16:10:45 -07:00
3b4cb9ddca Revert D31577488: Migrate THCState to ATen
Test Plan: revert-hammer

Differential Revision:
D31577488 (65adf1dfa2)

Original commit changeset: 90604f30854f

fbshipit-source-id: 3d7e35b3d6ea94f2c999bcf821b33a9cf1db01ee
2021-10-16 21:51:36 -07:00
719d43a2a2 Revert D31547709: Remove THCGeneral.cpp
Test Plan: revert-hammer

Differential Revision:
D31547709 (aa0c31876b)

Original commit changeset: 059c47621863

fbshipit-source-id: e8c3597f2badbc5ecf356b381edea06a07331f24
2021-10-16 21:50:19 -07:00
8854817f44 Implement Python Array API asarray function. (#60627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60627

In this PR, the core of `frombuffer` and `fromDLPack` is refactored into _tensor_new.cpp_. `asarray`
uses these refactored functions for interpreting the object as a tensor. We follow the
Python Array API standard found at:

https://data-apis.org/array-api/latest/API_specification/creation_functions.html?highlight=asarray
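
A minimal sketch of the resulting API:

```python
import numpy as np
import torch

t = torch.asarray(np.arange(4))                  # shares memory with the ndarray when possible
u = torch.asarray([1.0, 2.0], dtype=torch.float64)
v = torch.asarray(t, copy=True)                  # force a copy
```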

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31640510

Pulled By: mruberry

fbshipit-source-id: d0869e0d73cb50023d5866b001dac5d34ca30dfd
2021-10-16 21:11:31 -07:00
9e3a2babfa Make aotCompile support multiple input sizes (#66727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66727

Make aotCompile support multiple input sizes

Test Plan:
Able to compile and run a model with multiple inputs
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ PYTORCH_JIT_LOG_LEVEL=aot_compiler buck run //caffe2/binaries:aot_model_compiler -- --model aot_test_model.pt --model_name=aot_test_model --model_version=v1 --input_dims="2,2,2;2,2,2"
Building: finished in 3.2 sec (100%) 7461/7461 jobs, 0/7461 updated
  Total time: 3.4 sec
BUILD SUCCEEDED
[DUMP aot_compiler.cpp:097] graph before shape propagation
[DUMP aot_compiler.cpp:097] graph(%x.1 : Tensor,
[DUMP aot_compiler.cpp:097]       %y.1 : Tensor):
[DUMP aot_compiler.cpp:097]   %3 : int = prim::Constant[value=1]() # :0:0
[DUMP aot_compiler.cpp:097]   %4 : Tensor = aten::add(%x.1, %y.1, %3) # /data/users/priyaramani/fbsource/fbcode/caffe2/test/mobile/nnc/aot_test_model.py:10:15
[DUMP aot_compiler.cpp:097]   return (%4)
(1,.,.) =
  0.3357  0.6137
  0.8472  0.0858

(2,.,.) =
  0.8406  0.2959
  0.6012  0.7184
[ CPUFloatType{2,2,2} ]
(1,.,.) =
  0.7086  0.6398
  0.0579  0.1913

(2,.,.) =
  0.8598  0.3641
  0.5925  0.0200
[ CPUFloatType{2,2,2} ]
here
2
2
graph 0x6130001ee2d0
[DUMP aot_compiler.cpp:118] graph after shape propagation
[DUMP aot_compiler.cpp:118] graph(%x.1 : Float(2, 2, 2, strides=[4, 2, 1], requires_grad=0, device=cpu),
[DUMP aot_compiler.cpp:118]       %y.1 : Float(2, 2, 2, strides=[4, 2, 1], requires_grad=0, device=cpu)):
[DUMP aot_compiler.cpp:118]   %3 : int = prim::Constant[value=1]() # :0:0
[DUMP aot_compiler.cpp:118]   %4 : Tensor(2, 2, 2) = aten::add(%x.1, %y.1, %3) # /data/users/priyaramani/fbsource/fbcode/caffe2/test/mobile/nnc/aot_test_model.py:10:15
[DUMP aot_compiler.cpp:118]   return (%4)
The compiled llvm assembly code was saved to aot_test_model.compiled.ll
The compiled model was saved to aot_test_model.compiled.pt

└─ $ ./compile_model.sh -m aot_test_model -p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt -v v1 -i "2,2,2;2,2,2"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=aot_test_model
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
+ getopts m:p:v:i:h opt
+ case $opt in
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ INPUT_DIMS='2,2,2;2,2,2'
+ getopts m:p:v:i:h opt
+ require_arg m aot_test_model
+ '[' -n aot_test_model ']'
+ require_arg p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
+ '[' -n /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt ']'
+ require_arg i '2,2,2;2,2,2'
+ '[' -n '2,2,2;2,2,2' ']'
+ '[' '!' -f /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt ']'
+++ dirname ./compile_model.sh
++ cd .
++ pwd -P
+ SRC_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc
+ FBCODE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../..
+ FBSOURCE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ KERNEL_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../../xplat/pytorch_models/build/aot_test_model/v1/nnc
++ echo /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
++ sed 's/.pt.*//'
+ MODEL_PATH_PREFIX=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model
+ LLVM_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.ll
+ ASSEMBLY_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.s
+ COMPILED_MODEL_FILE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt
+ KERNEL_FUNC_NAME=nnc_aot_test_model_v1_forward
+ cd /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ buck run //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc -- --model /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt --print_output true --input_dims '2,2,2;2,2,2' --input_type 'float;float' --input_memory_format 'contiguous_format;contiguous_format'
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 1/4 artifacts, 2.11 Kbytes, 50.0% cache miss (for updated rules)
Building: finished in 12.2 sec (100%) 4572/4572 jobs, 3/4572 updated
  Total time: 12.2 sec
BUILD SUCCEEDED
Run with 56 threads
Run with 56 threads
Loading model...
Model loaded: /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt
Running forward ...
(1,.,.) =
 -0.7451 -0.7451
 -0.7451 -0.7451

(2,.,.) =
 -0.7451 -0.7451
 -0.7451 -0.7451
[ CPUFloatType{2,2,2} ]
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 0.0887. Iters per second: 11274
Memory usage before main runs: 71262208 bytes
Memory usage after main runs: 71573504 bytes
Average memory increase per iter: 31129.6 bytes
0 value means "not available" in above
```

Reviewed By: ljk53

Differential Revision: D31631975

fbshipit-source-id: 7956787b3e121f9c14f4733398a64c2f7ae84373
2021-10-16 20:04:52 -07:00
962c6476da Refactor: move method to func compilation work to compileMethod, add option to specify method name (#66726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66726

Move method to func compilation work to compileMethod

Test Plan:
Mobilenetv3 compiles and runs successfully
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="1,3,224,224"
Downloaded 0/4 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 13.2 sec (100%) 18719/18719 jobs, 2/18719 updated
  Total time: 13.5 sec
BUILD SUCCEEDED
The compiled llvm assembly code was saved to mobilenetv3.compiled.ll
The compiled model was saved to mobilenetv3.compiled.pt
```

Reviewed By: ljk53, IvanKobzarev

Differential Revision: D31624342

fbshipit-source-id: 233a6e94ea05ba8d6fc166d2414034c9e58cb076
2021-10-16 20:03:24 -07:00
aa0c31876b Remove THCGeneral.cpp (#66391)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66391

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31547709

Pulled By: ngimel

fbshipit-source-id: 059c47621863738fb560f4257e7765afa9b952aa
2021-10-16 14:53:52 -07:00
8c5928bd78 add frozen_numpy as a builtin library to torch::deploy (#66297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66297

Linking register_numpy.cpp with the embedded interpreter registers numpy as a builtin library.

Test Plan: Add a unit test exercising basic numpy functionality in torch::deploy, such as creating random matrices and matrix multiplication.

Reviewed By: suo

Differential Revision: D31490434

fbshipit-source-id: b052ce01fc64fb0efee846feb0acc1f107ba13e0
2021-10-15 21:48:24 -07:00
42f138469a [TS] Return early if device doesn't match (#66694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66694

`lhs.equal(rhs)` would throw if the devices don't match. To avoid that, we return early when the devices differ.

Test Plan: CI

Reviewed By: houseroad

Differential Revision: D31691608

fbshipit-source-id: 513c3e0743a65d9778c7ef9b79ececfeaccc0017
2021-10-15 18:13:46 -07:00
32ac001e4d Suppress deprecated copy in vec256_qint.h (#66646)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66646

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31660387

fbshipit-source-id: a1ea9702a8b33f78a7201a1d9214065c2fb930b1
2021-10-15 17:14:15 -07:00
65adf1dfa2 Migrate THCState to ATen (#66480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66480

This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31577488

Pulled By: ngimel

fbshipit-source-id: 90604f30854fe766675baa3863707ac09995bc9e
2021-10-15 17:05:04 -07:00
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
1e2b2ee5ff sort_out_cuda: Use custom kernels to fill index tensors (#66668)
Summary:
These stable sorts currently use a combination of `at::arange`, view ops and `tensor.copy_` to fill in the initial values for the indices before calling into `CUB` to do the actual sort. This is somewhat inefficient because it requires 2 to 4 kernel launches, and the copies all use strided kernels instead of the more efficient contiguous kernels. Instead, a fairly straight-forward custom kernel is more efficient in terms of both CUDA and CPU runtime.
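
Roughly, the values being materialized correspond to this composite-op formulation (an illustration of the old approach, not the new kernel):

```python
import torch

a = torch.rand(4, 6)
# Index tensor for a stable sort along dim=1: 0..n-1 in every row.
idx = torch.arange(a.size(1)).expand_as(a).contiguous()
print(idx[0])  # tensor([0, 1, 2, 3, 4, 5])
```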

In a simple benchmark I profiled `a.sort(stable=True, dim=1)` for different shapes and singled out the kernel invocations for initializing the index tensors (i.e. the non-`cub` kernels). Note that when the batch dim is `<128` we call `segmented_sort_pairs_by_full_sort` instead of `segmented_sort_pairs`:

| shape        | Master (us) | This PR (us) |
|--------------|:-----------:|:------------:|
| (100, 1000)  |    5.000    |     2.300    |
| (1000, 100)  |    2.070    |     1.090    |
| (100, 10000) |    87.34    |     26.47    |
| (1000, 1000) |    28.63    |     20.27    |

Of course, for sufficiently large inputs the overall runtime is dominated by the actual sort. But I have another motive: removing the operator calls from the middle of this kernel launch code. This change makes it easier to split the kernel code that needs to be compiled with `nvcc` into its own file that doesn't include `Tensor.h`, similar to what I'm doing in https://github.com/pytorch/pytorch/issues/66620.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66668

Reviewed By: H-Huang

Differential Revision: D31693722

Pulled By: ngimel

fbshipit-source-id: 5765926e4dbbc7a20d2940c098ed093b3de2204e
2021-10-15 15:13:02 -07:00
9ba39d2008 Clean up test running scripts (#65508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65508

This has some misc cleanups for the code that happens before `run_test.py`:

* remove hardcoding of 2 shards
* add `set -eux` in some places

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31296509

Pulled By: driazati

fbshipit-source-id: 2df1463432846d8a4d8a579812a4e9c3b7c2b957
2021-10-15 14:36:32 -07:00
2c761caaaa [Vulkan] cat operator for channel dimension (#66669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66669

Implemented `cat` operator for channel dimension

**Facts:**
* texture coordinate: x(width), y(height), z(depth)
* input x, y, z -> no change
* out x, y -> no change
* out z and index i, j only matter

**Equations:**
```
batch_size = bt0 (or bt1 or bt2 or ...) = # of batches for tensor i
ch_size = ch0 (or ch1 or ch2 or ...) = # of channels for tensor i
ch_interval = ch0 + ch1 + ch2 + ... = total # of channels for all tensors
ch_size_allprior = ch0 (or ch0+ch1 or ch0+ch1+ch2 or ...) = # of channels for tensors 0 to i-1 where pos.z = d (input)
i = index of input texel = vec4[i] of texel at posIn(x,y,z) on input texture
j = index of output texel = vec4[j] of texel at posOut(x',y',z') on output texture

posIn[i] = {x,y,z} at ith index of vec4
src_index = posIn.z * 4 + i
dst_index = int(src_index / ch_size) * ch_interval + (src_index % ch_size) + ch_size_allprior
d = posOut.z = int(dst_index / 4)
j = (dst_index % 4)
posOut[j] = {posIn.x, posIn.y, d} at jth index of vec4
```

**Shader pseudo code:**
```
posOut = posIn;
for (i = 0; i < 4; ++i) {
  src_index = posIn.z * 4 + i;
  if (src_index >= ch_size * batch_size) break; // out of range
  dst_index = int(src_index / ch_size) * ch_interval + (src_index % ch_size) + ch_size_allprior;
  posOut.z = int(dst_index / 4);
  j = (dst_index % 4);
  uOutput[j] = uInput[i]
}
```

Test Plan:
Test build on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

Test result:
```
[ RUN      ] VulkanAPITest.cat_dim1_samefeature_success
[       OK ] VulkanAPITest.cat_dim1_samefeature_success (101 ms)
[ RUN      ] VulkanAPITest.cat_dim1_difffeature_success
[       OK ] VulkanAPITest.cat_dim1_difffeature_success (81 ms)
[ RUN      ] VulkanAPITest.cat_dim1_texture2d_success
[       OK ] VulkanAPITest.cat_dim1_texture2d_success (2 ms)
[ RUN      ] VulkanAPITest.cat_dim1_singledepth_success
[       OK ] VulkanAPITest.cat_dim1_singledepth_success (6 ms)
[ RUN      ] VulkanAPITest.cat_dim1_singletensor_success
[       OK ] VulkanAPITest.cat_dim1_singletensor_success (21 ms)
[ RUN      ] VulkanAPITest.cat_dim1_twotensors_success
[       OK ] VulkanAPITest.cat_dim1_twotensors_success (53 ms)
[ RUN      ] VulkanAPITest.cat_dim1_bat1_ch4multiple_success
[       OK ] VulkanAPITest.cat_dim1_bat1_ch4multiple_success (17 ms)
[ RUN      ] VulkanAPITest.cat_dim2_sameheight_success
[       OK ] VulkanAPITest.cat_dim2_sameheight_success (83 ms)
[ RUN      ] VulkanAPITest.cat_dim2_diffheight_success
[       OK ] VulkanAPITest.cat_dim2_diffheight_success (86 ms)
[ RUN      ] VulkanAPITest.cat_dim2_singledepth_success
[       OK ] VulkanAPITest.cat_dim2_singledepth_success (5 ms)
[ RUN      ] VulkanAPITest.cat_dim2_invalidinputs_exceptions
[       OK ] VulkanAPITest.cat_dim2_invalidinputs_exceptions (82 ms)
```

Reviewed By: SS-JIA

Differential Revision: D31593623

fbshipit-source-id: e52dc57985e3f0bb9b20313d4fcc7248a436e863
2021-10-15 14:25:19 -07:00
06cfdfae0e Promote integral inputs to floating for torch.logsumexp (#63393)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56132: integral inputs of `torch.logsumexp` are now promoted to the floating point type.
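
A quick sketch of the resulting behavior (assuming the default float dtype):

```python
import torch

x = torch.tensor([1, 2, 3])        # integral (int64) input
out = torch.logsumexp(x, dim=0)    # now returns a floating-point result
print(out.dtype)                   # torch.float32 with the default dtype
```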

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63393

Reviewed By: ezyang

Differential Revision: D30512180

Pulled By: mruberry

fbshipit-source-id: fbde3605c15b930411d0d1eb3a132b0088187097
2021-10-15 14:20:50 -07:00
67e003f09b [Static Runtime] Determine function for ProcessedNode::run() statically (#66692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66692

Currently `ProcessedNode::run()` performs 2 dynamic dispatches to decide which function implementation to execute, depending on whether the function is an out variant, a native function, or an interpreter fallback. Note that this happens every time an operation is executed by Static Runtime.

This change makes *that* same decision once, at module loading time, so that we can remove 1 dynamic dispatch cost at runtime.
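
In spirit, the change replaces per-run branching with a callable chosen once at load time; a Python sketch with made-up names (the real code stores a `std::function` inside `ProcessedNode`):

```python
class ProcessedNodeSketch:
    def __init__(self, kind, out_variant_fn, native_fn, fallback_fn):
        # Decided once, when the module is loaded.
        if kind == "out_variant":
            self._fn = out_variant_fn
        elif kind == "native":
            self._fn = native_fn
        else:
            self._fn = fallback_fn

    def run(self, *args):
        # No branching on `kind` in the hot path.
        return self._fn(*args)
```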

**size reduction**

Saving 4 bytes per `ProcessedNode`.

- Before: sizeof(c10::variant<OutVariant, NativeFunction, Operation>):40

- After: sizeof(std::function<void(ProcessedNode*)>): 32 + sizeof(FunctionKind):4 = 36

**latency optimization**

Expected to remove 2 memory loads & 1 conditional jump per `ProcessedNode::run()` execution (needs to be confirmed from compiled binary code).

Ran `ptvsc2_predictor_bench` with `inline_cvr` with 1000 iterations:
- local : 7.56026 -> 7.24794
- local_ro: 1.5799 -> 1.55504
- remote_ro: 10.6464 -> 10.3017

Test Plan: Ran existing unittests

Reviewed By: swolchok

Differential Revision: D31591785

fbshipit-source-id: 5de83ca386af509381e08ecedf071ee4e9f0f0b0
2021-10-15 14:07:24 -07:00
d1b6121935 Revert D31656999: Add meta support to tensor range factories
Test Plan: revert-hammer

Differential Revision:
D31656999 (7400f34b8e)

Original commit changeset: 06e7f3655b94

fbshipit-source-id: 2f9d8d1acbb01c5105ece73472e5c1f5f90886ee
2021-10-15 14:03:04 -07:00
a25648953c Add warn_only kwarg to use_deterministic_algorithms (#66233)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64883

Adds a `warn_only` kwarg to `use_deterministic_algorithms`. When enabled, calling an operation that does not have a deterministic implementation will emit a warning rather than raise an error.
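
For example:

```python
import torch

# Previous behavior: nondeterministic ops raise a RuntimeError.
torch.use_deterministic_algorithms(True)

# New opt-in behavior: nondeterministic ops emit a warning instead.
torch.use_deterministic_algorithms(True, warn_only=True)
```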

`torch.testing._internal.common_device_type.expectedAlertNondeterministic` is also refactored and documented in this PR to make it easier to use and understand.

cc mruberry kurtamohler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66233

Reviewed By: bdhirsh

Differential Revision: D31616481

Pulled By: mruberry

fbshipit-source-id: 059634a82d54407492b1d8df08f059c758d0a420
2021-10-15 13:54:59 -07:00
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
b5b7d6a3a6 EmbeddingBackward exclusive_scan thrust->cub (#66566)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66566

Reviewed By: H-Huang

Differential Revision: D31637660

Pulled By: ngimel

fbshipit-source-id: 8093432bb9a9b902bb6bab7da221f0bcd7e9fb34
2021-10-15 13:46:30 -07:00
bd25f92e81 Fix Wextra issues in Half.h (#66643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66643

Fixes:
```
caffe2/c10/util/Half.h:456:14: error: comparison of integers of different signs: 'long' and 'unsigned long' [-Werror,-Wsign-compare]
    return f > limit::max() ||
           ~ ^ ~~~~~~~~~~~~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31656816

fbshipit-source-id: 7623d20e166a9e95a949ebd8b23793f24960cf07
2021-10-15 13:38:10 -07:00
abc022f9c8 Fix torch.cholesky deprecation warning (#66645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66645

Fixes:
```
test_cholesky_solve_batched_broadcasting_cpu_complex128 (__main__.TestLinalgCPU) ... test_linalg.py:3099: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
```
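
The replacement is mechanical; for example (illustrative only):

```python
import torch

a = torch.eye(3)
L_old = torch.cholesky(a)         # deprecated, emits the warning above
L_new = torch.linalg.cholesky(a)  # preferred replacement
```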

Test Plan: Sandcastle

Reviewed By: mruberry

Differential Revision: D31635851

fbshipit-source-id: c377eb88d753fb573b3947f0c6ff5df055cb13d8
2021-10-15 13:24:58 -07:00
0b8dc0f04a add BFloat16 operators on CPU: logaddexp, logaddexp2, remainder (#63621)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63621

Reviewed By: H-Huang

Differential Revision: D31640811

Pulled By: mruberry

fbshipit-source-id: 1fd061b65c196398738018eefc52bf459e424b1c
2021-10-15 13:11:45 -07:00
a58852fd44 Fix fx2trt broken unit test (#66696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66696

D31511082 (9918fd8305) moved a unit test but didn't add the proper target in the build file; this diff fixes that.

Test Plan: buck test mode/opt caffe2/test/fx2trt/converters/...

Reviewed By: 842974287

Differential Revision: D31667697

fbshipit-source-id: 49e04afa323b27a1408c9bc2b5061b6529ced985
2021-10-15 12:56:12 -07:00
e48a4cbf64 Make several methods of SharedParserData private (#66670)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66670

Reviewed By: zhxchen17

Differential Revision: D31674377

Pulled By: gmagogsfm

fbshipit-source-id: 5c73b78f842c5c4305047ca98f40bf99bd3d2d60
2021-10-15 12:43:45 -07:00
e88d1c4f10 [PyTorch] Add tuple inline storage (#64066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066

I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
ghstack-source-id: 140695395

Test Plan:
Added automated tests for TupleElements.

Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30592622

fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
2021-10-15 12:16:51 -07:00
f8f9a47b02 PR3: add a workaround for reference path (#66535)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66535

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31676400

Pulled By: rahxephon89

fbshipit-source-id: fd4c8e9bbc82930cc1255fb8bf8d8ac7f0934c3f
2021-10-15 11:56:11 -07:00
7400f34b8e Add meta support to tensor range factories (#66630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66630

This PR adds meta backend support to the `range`, `arange`, `linspace`, and `logspace` operators.
ghstack-source-id: 140618055

Test Plan: Extended the existing tensor creation tests to assert meta backend support.

Reviewed By: ezyang

Differential Revision: D31656999

fbshipit-source-id: 06e7f3655b94c0d85a28bcd0ca61d9f9ce707f1d
2021-10-15 11:17:08 -07:00
6436bd3d5d Clarify topk doc (#65938)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50331
<img width="855" alt="Screen Shot 2021-10-01 at 11 23 23 AM" src="https://user-images.githubusercontent.com/17888388/136036611-f2bd9c77-61b4-4ab8-85eb-44f50c1e03d7.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65938

Reviewed By: bdhirsh

Differential Revision: D31314875

Pulled By: samdow

fbshipit-source-id: bdd9425fd748710f8a64ed1989e1938dd358780f
2021-10-15 11:15:48 -07:00
2506baf9c2 [ONNX] move CheckerError from torch.onnx.utils to torch.onnx (#66644)
Summary:
This moves it to where the user would expect it to be based on the
documentation and all the other public classes in the torch.onnx module.

Also rename it from ONNXCheckerError, since the qualified name
torch.onnx.ONNXCheckerError is otherwise redundant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66644

Reviewed By: malfet

Differential Revision: D31662559

Pulled By: msaroufim

fbshipit-source-id: bc8a57b99c2980490ede3974279d1124228a7406
2021-10-15 10:38:56 -07:00
3a9259f6cf [TensorExpr] Add missing schema for aten::where and aten::pow lowerings. (#66688)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66688

Differential Revision: D31689431

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 6b3abb4471170ff5418f72bb700325711e7bd28f
2021-10-15 10:14:43 -07:00
06c37876b8 torch.linalg.householder_product faster backward (#63880)
Summary:
This PR implements a much more efficient algorithm, which achieves massive speed-ups, especially for batched and/or larger double-precision inputs.
Here are some benchmarks:

<details>

<summary>Testing script</summary>

```python
from IPython import get_ipython
import torch
import itertools

torch.manual_seed(13)
#torch.set_num_threads(1)

ipython = get_ipython()

cpu = torch.device('cpu')
cuda = torch.device('cuda')

def generate_input(shape, dtype=torch.double, device=cpu):
    eigvals = torch.rand(*shape[:-1], dtype=dtype, device=device)
    eigvecs = torch.rand(*shape, dtype=dtype, device=device)
    input = (eigvecs * eigvals.unsqueeze(-2)) @ eigvecs.inverse()
    input.requires_grad_(True)
    tau = torch.rand(*shape[:-1], dtype=dtype, device=device)
    tau.requires_grad_(True)
    return input, tau

def run_test(shape, device, dtype):
    print(f"shape: {shape}, device: {device}, dtype: {dtype}")
    a, tau = generate_input(shape, dtype=dtype, device=device)
    prod = torch.linalg.householder_product(a, tau)
    ones_prod = torch.ones_like(prod)

    command = "torch.autograd.backward((prod,), (ones_prod), retain_graph=True)"
    if device == cuda:
        command = command + "; torch.cuda.synchronize()"
    ipython.magic(f"timeit {command}")
    print()

dtypes = [torch.float, torch.double]
devices = [cpu, cuda]
#devices = [cuda]
sizes = [
    (10, 10),
    (1000, 10, 10),
    (100, 100),
    (1000, 100, 100),
    (1000, 1000),
    (10, 1000, 1000),
]

for device, dtype, size in itertools.product(devices, dtypes, sizes):
    run_test(size, device, dtype)

```

</details>

<details>

<summary>This PR, cuda float32</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float32
1.33 ms ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float32
1.52 ms ± 40.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (100, 100), device: cuda, dtype: torch.float32
10.8 ms ± 9.62 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float32
127 ms ± 8.45 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 1000), device: cuda, dtype: torch.float32
151 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float32
981 ms ± 91.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

```

</details>

<details>

<summary>Master, cuda float32</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float32
1.64 ms ± 6.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float32
298 ms ± 463 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (100, 100), device: cuda, dtype: torch.float32
15.4 ms ± 41.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float32
5.36 s ± 711 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cuda, dtype: torch.float32
1.64 s ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float32
15.7 s ± 43.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```

</details>

<details>

<summary>This PR, cuda float64</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float64
1.14 ms ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float64
2.22 ms ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cuda, dtype: torch.float64
10.6 ms ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float64
287 ms ± 84.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cuda, dtype: torch.float64
236 ms ± 41.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float64
1.88 s ± 88.3 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>Master, cuda float64</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float64
1.58 ms ± 8.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float64
308 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (100, 100), device: cuda, dtype: torch.float64
79 ms ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float64
54.2 s ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cuda, dtype: torch.float64
31.5 s ± 698 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float64
4min 45s ± 2.48 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>This PR, cpu float32</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float32
476 µs ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float32
5.1 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float32
4.38 ms ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float32
1.55 s ± 6.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float32
745 ms ± 407 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float32
5.44 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>Master, cpu float32</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float32
387 µs ± 645 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float32
12.3 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float32
39.4 ms ± 80.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float32
29.1 s ± 44.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float32
9.42 s ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float32
1min 50s ± 282 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>This PR, cpu float64</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float64
381 µs ± 761 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float64
6.19 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float64
4.6 ms ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float64
2.59 s ± 8.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float64
1.07 s ± 5.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float64
14.4 s ± 13.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>Master, cpu float64</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float64
395 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float64
14.6 ms ± 9.76 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float64
45.5 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float64
33.1 s ± 69.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float64
19.3 s ± 80.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float64
3min 30s ± 1.29 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63880

Reviewed By: soulitzer

Differential Revision: D30639435

Pulled By: anjali411

fbshipit-source-id: 127789943ae56e2f1dd03e0fe76ef7b6db86bcf0
2021-10-15 09:54:30 -07:00
65e25256c3 [ROCm] enable test_distributed() in test.sh (#66657)
Summary:
Restores tests for ROCm CI that used to run prior to https://github.com/pytorch/pytorch/issues/63147.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66657

Reviewed By: soulitzer

Differential Revision: D31668379

Pulled By: malfet

fbshipit-source-id: 91a6f6c63d6c957cc5821edbd33d4c16eecc8c0a
2021-10-15 09:45:11 -07:00
8a01bbd64a add flatten parameter module (#66578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66578

Flatten parameters for performance optimization, and handle the cases where the gradient-ready order differs across ranks or there are unused parameters among ranks. When there is no param to be sharded in the FSDP instance (usually the root), the flatten wrapper module's flat_param is None.
ghstack-source-id: 140696745

Test Plan: unit test

Reviewed By: mrshenli

Differential Revision: D31625194

fbshipit-source-id: c40e84f9154f5703e5bacb02c37c59d6c4e055c7
2021-10-15 09:37:26 -07:00
a3d12bcdf9 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31681115

fbshipit-source-id: e2146e59a57ff27759de18b00fb644e9dc3c5672
2021-10-15 03:07:57 -07:00
76efbccc3b [PyTorch Edge][tracing-based] Unify tracer between internal and external (#64152)
Summary:
As title: introduce the file `TracerRunner`, shared by the internal and external tracers. Its main function is
```
TracerResult trace_run(const std::string& input_module_path);
```
which takes the path to the model file and generates the trace result. The main differences between the external and internal tracers are:
1. the dependency on `<yaml-cpp/yaml.h>`;
2. the output yaml file from the internal tracer includes `model_version` and `model_asset`, which are only needed internally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64152

ghstack-source-id: 140692467

Test Plan:
```
./build/bin/model_tracer --model_input_path "/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_with_bundled_input.ptl" --build_yaml_path  "/Users/chenlai/Documents/pytorch/tracing/tmp.yaml"
```
```
./fbcode/caffe2/fb/model_tracer/run_model_with_bundled_inputs.sh ~/local/notebooks/prod_models/deeplabv3_scripted_with_bundled_input.ptl
```
have the same operator output

selected_operators.yaml (P460296279)
selected_mobile_ops.h (P460296258)

Reviewed By: dhruvbird

Differential Revision: D30632224

fbshipit-source-id: eb0321dbc0f1fcf6d2e05384695eebb59ac04f8c
2021-10-15 02:19:45 -07:00
1e47181c47 [DDP Logging] Add iteration in error reporting (#65772)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65772

Looking at some workloads and it would be useful to have this info.
ghstack-source-id: 140555200

Test Plan: CI

Reviewed By: zhaojuanmao, wayi1

Differential Revision: D31224417

fbshipit-source-id: 14eeb053aced87c7ca43b6879f81f54bd0a42b76
2021-10-14 22:29:36 -07:00
3740a06712 [MonitoredBarrier] Fix some logging (#65771)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65771

Fixes some logging around monitored_barrier to make it cleaner.
ghstack-source-id: 140555204

Test Plan: CI

Reviewed By: zhaojuanmao, wayi1

Differential Revision: D31222881

fbshipit-source-id: 77d6f072ce98a9b31192e0d48ea0f8cbd8f216fe
2021-10-14 22:28:16 -07:00
06fa6c15c0 Back out "Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"" (#66393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66393

Third try!

Fixes:
- test_nccl_timeout can be flaky because of its 1s timeout; bump up the timeout to resolve the flakiness. In general, though, we should not have been relying on time.sleep for this test; filed https://github.com/pytorch/pytorch/issues/66354 to track that.
- ciflow/all did not actually run the multigpu tests due to a bug. This has since been fixed.
ghstack-source-id: 140560113

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31534735

fbshipit-source-id: 8b7e0f4fed3972b7a77cbcda28876c9eefb0c7e2
2021-10-14 22:23:22 -07:00
59b28063b4 [NNC] Adding more python bindings for missing operators (#66612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66612

For the op authoring project, we want to expose the python bindings
to create Expr objects. These are the missing bindings.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31667852

fbshipit-source-id: 6d3ff83a7676cfea391ab3ea60dde6874a64047a
2021-10-14 22:09:01 -07:00
8dcf84069e [PyTorch] Implement improved version of gather_ranges_to_dense (#66677)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66677

Reviewed By: wfanzju

Differential Revision: D31676536

fbshipit-source-id: a2eb1b1f9e5a0b78f89c3aad19f97acb7c05e1f8
2021-10-14 21:22:15 -07:00
70fc60b9d1 Revert D31325860: [PyTorch] Implement improved version of gather_ranges_to_dense
Test Plan: revert-hammer

Differential Revision:
D31325860 (23710e2d80)

Original commit changeset: 8e154f929ff7

fbshipit-source-id: 6d36d50d6bd4ec4fe07a6e2d1d0110504b9c8b53
2021-10-14 19:43:38 -07:00
b60050e96a [qat]Make sure the bn statistics are the same in the unit test. (#66244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66244

Make sure the bn statistics are the same in the unit test.
* The fused model in the existing code will have different bn statistics compared to the model without fusion. The two will produce the same result when the model is in training mode, but a different result in eval mode.

Test Plan: buck run mode/dev-nosan //caffe2/test:quantization -- -r quantization.eager.test_fusion.TestFusion

Reviewed By: jerryzh168

Differential Revision: D29504500

fbshipit-source-id: 41e3bfd7c652c27619baa7cbbe98d8d06a485781
2021-10-14 19:23:05 -07:00
23710e2d80 [PyTorch] Implement improved version of gather_ranges_to_dense (#66664)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66664

Reviewed By: hlu1

Differential Revision: D31325860

fbshipit-source-id: 8e154f929ff7c597ff6e41f18278b24c552d1719
2021-10-14 18:37:35 -07:00
583217fe37 changes for pytorch issue 55577 (#66571)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66571

changes for pytorch issue 55577

Test Plan:
Ran test:
python test/test_jit.py TestDict

Reviewed By: tugsbayasgalan

Differential Revision: D31622633

fbshipit-source-id: 171c68a65b1d0bf769b3d95f103daba375e95335
2021-10-14 18:19:11 -07:00
a1084401b0 Clean up DictLiteral and DictComprehension emission logic (#64953)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64953

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30914687

Pulled By: ansley

fbshipit-source-id: ab9b9192a29f05b90c113c678e7c795bc087dc99
2021-10-14 17:35:39 -07:00
a7b79033ea Clean up ListLiteral and ListComprehension emission logic (#64952)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64952

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30914690

Pulled By: ansley

fbshipit-source-id: 83ac9bc6445f89b3f47c5404435bc6058c6f3bd7
2021-10-14 17:34:17 -07:00
22ec625028 fx2trt example: run all submodules (#66590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66590

Updated fx2trt example to run all submodules

Added an assertion to make sure the outputs from the lowered and regular models match

Test Plan: buck run mode/dev-nosan caffe2:fx2trt_example

Reviewed By: 842974287

Differential Revision: D31592985

fbshipit-source-id: 45ce0b33e957f16b3729d3ecde706331c29d7214
2021-10-14 17:09:29 -07:00
20aa417e38 [PyTorch] [Quantization] Speed up PackedEmbeddingBagWeight::prepack() (#66632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66632

Calling `.item<float>()` for each element in a tensor is expensive. Instead, convert the entire Tensor in one call to `Tensor::copy_(input_tensor)`. See [this post](https://fb.workplace.com/groups/1144215345733672/posts/2080756188746245/) for more details.
ghstack-source-id: 140639868
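
The same idea expressed in Python terms (an analogy, not the actual C++ prepack code):

```python
import torch

src = torch.rand(1000)
dst = torch.empty(1000)

# Slow: one accessor call per element.
# for i in range(src.numel()):
#     dst[i] = src[i].item()

# Fast: a single bulk copy.
dst.copy_(src)
```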

Test Plan:
Build and run with bundled inputs.

### AI Bench

Before: [AI Bench](https://www.internalfb.com/intern/aibench/details/877359346171823), [Flamegraph](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/speech_transducer_v6_perf_1634185889953.html): 500ms

After: [AI Bench](https://www.internalfb.com/intern/aibench/details/60828780633319), [Flamegraph](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/speech_transducer_v6_perf_1634231176980.html): 444ms

We went from 500ms to 444ms, which is a reduction of ~11%.

Reviewed By: supriyar

Differential Revision: D31657430

fbshipit-source-id: 199ec9de3dab84bb5727d81c7804bb83bebf7b48
2021-10-14 16:30:39 -07:00
871a31b9c4 [TensorExpr] Add missing schemas for lshift/rshift lowerings. (#66653)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66653

Test Plan: Imported from OSS

Reviewed By: navahgar, anijain2305

Differential Revision: D31664748

Pulled By: ZolotukhinM

fbshipit-source-id: 13a3154292f12b7bee43b9a5254fb43be032e7c1
2021-10-14 14:19:29 -07:00
f8348ce9c8 graceful failure for draw_graph() in acc_utils.py (#66631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66631

Writing to the current directory is causing issues in CI. We might also consider writing the ".dot" files to some temporary location.

Test Plan: CI

Reviewed By: 842974287

Differential Revision: D31657078

fbshipit-source-id: 9876327c7f172cd354f1b8e8076597c6a26e2850
2021-10-14 14:04:48 -07:00
1d90f29f14 [DOC] Improve Transformer documentation (#66574)
Summary:
Includes adding some typing annotations to TransformerEncoderLayer and TransformerDecoderLayer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66574

Reviewed By: soulitzer

Differential Revision: D31654024

Pulled By: jbschlosser

fbshipit-source-id: 9026bd36541699b7205e893decf5abc4a3f0ab5e
2021-10-14 13:26:12 -07:00
3097755e7a [DOC] Fix typo in KLDivLoss (#66583)
Summary:
Fix simple typo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66583

Reviewed By: soulitzer

Differential Revision: D31653998

Pulled By: jbschlosser

fbshipit-source-id: e4fc91be297cc9a85099d7883b42436b5e3392d3
2021-10-14 13:21:37 -07:00
914796a69c Fix for prim::BroadcastMKLDNNTensors issue (#66628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66628

Ensure prim::BroadcastMKLDNNTensors does not break the stack invariant by pushing more than 2 tensors onto the stack.

Reviewed By: eellison

Differential Revision: D31638565

fbshipit-source-id: 4526c0cf7ba8d87dc8a9c213c66c711e83adfc66
2021-10-14 11:53:42 -07:00
833ede33ed Fix ubsan in concat_split_op.h (#66283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66283

Fixes
```
UndefinedBehaviorSanitizer: nullptr-with-nonzero-offset caffe2/caffe2/operators/concat_split_op.h:185:52
```

Test Plan: Sandcastle

Reviewed By: swolchok

Differential Revision: D31486274

fbshipit-source-id: 20128056f19cf814fdc3e6e144cf9208a4080d6a
2021-10-14 11:42:30 -07:00
76f3b07caf quantization docs: remove erroneous rebase artifact (#66577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66577

A rebase artifact was erroneously landed in the quantization docs;
this PR removes it.

Test Plan:
CI

Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31651350

fbshipit-source-id: bc254cbb20724e49e1a0ec6eb6d89b28491f9f78
2021-10-14 11:30:47 -07:00
016362e2d7 Run sparse tests only for TensorPipe agent. (#66600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66600

Sparse RPC functionality added in
https://github.com/pytorch/pytorch/pull/62794 works only for TensorPipe and is
broken for other agent types.

Moving these tests to a TensorPipe-only class.
ghstack-source-id: 140553147

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D31633305

fbshipit-source-id: 37d94cb9ed5565a72a6d512c2a9db75a497d5b95
2021-10-14 11:08:15 -07:00
543b7fb942 [JIT] Fix type annotations of pooling modules (#65847)
Summary:
All of the pooling modules except MaxUnpool and LPPool return either a
Tensor or [Tensor, Tensor]. The current type annotations are inaccurate
and prevent scripting the module if return_indices is set to True on the
module.
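
Concretely, a single `Tensor` return annotation cannot describe both call patterns:

```python
import torch

pool = torch.nn.MaxPool2d(2, return_indices=True)
out, indices = pool(torch.rand(1, 1, 4, 4))   # returns (Tensor, Tensor)

pool2 = torch.nn.MaxPool2d(2)
out2 = pool2(torch.rand(1, 1, 4, 4))          # returns a single Tensor
```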

There's not a great way to make this agree with mypy because the
overload is dependent on the value of return_indices, an attribute.

I tried changing the annotations from `Tensor` to
`Union[Tensor, Tuple[Tensor, Tensor]]`, but that breaks a bunch of uses
that have return_indices=False.
For example, this breaks:
4e94e84f65/torch/nn/modules/container.py (L139)

Also clean up how test names were being constructed in test_jit, since
otherwise we were getting name collisions when there were two tests on
the same nn.Module.

Fixes https://github.com/pytorch/pytorch/issues/45904

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65847

Reviewed By: ZolotukhinM

Differential Revision: D31462517

Pulled By: eellison

fbshipit-source-id: 6f9e8df1be6c75e5e1e9bae07cf3ad3603ba59bd
2021-10-14 10:59:19 -07:00
51b67f2bca [qat]Removed outdated context manager in unit test. (#66274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66274

Removed outdated context manager in unit test.
* The linked issue (https://github.com/pytorch/pytorch/issues/23825) seems to have been fixed in 2020.

Test Plan: buck run mode/dev-nosan //caffe2/test:quantization -- -r quantization.eager.test_quantize_eager_qat

Reviewed By: vkuzo

Differential Revision: D29507087

fbshipit-source-id: e8fa04c9527023a5adaf1a012b2c393ce0c5cd97
2021-10-14 10:23:55 -07:00
49a1d7bfcb [opinfo] elemwise parcel : isfinite, isinf, isposinf, isneginf, isnan, isreal (#66400)
Summary:
Adds OpInfo for `isfinite, isinf, isposinf, isneginf, isnan, isreal`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66400

Reviewed By: bdhirsh

Differential Revision: D31602998

Pulled By: mruberry

fbshipit-source-id: 235cc414f373f014f4822a72deb1a04a58ad4a7c
2021-10-14 10:11:57 -07:00
d810e738b9 OpInfo for *_like functions (#65941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65941

OpInfos for: empty_like, zeros_like, ones_like, full_like, randn_like

Test Plan: - run tests

Reviewed By: dagitses

Differential Revision: D31452625

Pulled By: zou3519

fbshipit-source-id: 5e6c45918694853f9252488d62bb7f4ccfa1f1e4
2021-10-14 09:14:51 -07:00
5d4452937d OpInfos for some Tensor dtype conversion methods (#64282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64282

OpInfos for:
- Tensor.bfloat16, Tensor.bool, Tensor.bypte, Tensor.char
- Tensor.double, Tensor.float, Tensor.half, Tensor.int
- Tensor.short, Tensor.long

None of these are supported by TorchScript. Also, the OpInfo autograd
test runner assumes that the operation is not allowed to change the
dtype of the argument, so only Tensor.double has
`supports_autograd=True` (in theory Tensor.bfloat16, Tensor.float,
Tensor.half should be differentiable).

Test Plan: - run tests

Reviewed By: dagitses

Differential Revision: D31452627

Pulled By: zou3519

fbshipit-source-id: b7f272e558558412c47aefe947af7f060dfb45c5
2021-10-14 09:13:30 -07:00
77f98ea5e0 assert no duplicate yaml keys in codegen (#66238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66238

The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate keys with the new value.
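
A small demonstration of the silent-overwrite behavior with PyYAML:

```python
import yaml

doc = "conv1d: foo\nconv1d: bar\n"
print(yaml.safe_load(doc))  # {'conv1d': 'bar'} -- the later entry silently wins
```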

This would have caught a nasty bug that showed up in https://github.com/pytorch/pytorch/pull/66225/files#r723796194.

I tested it on that linked PR, to confirm that it errors correctly (and gives the line number containing the duplicate).

Test Plan: Imported from OSS

Reviewed By: dagitses, albanD, sean-ngo

Differential Revision: D31464585

Pulled By: bdhirsh

fbshipit-source-id: 5b35157ffa9a933bf4b344c4b9fe2878698370a3
2021-10-14 08:28:20 -07:00
fe41df3601 Deprecate x.T on tensors of dimension other than 0 or 2 (#64180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64180

**BC-breaking note:**

This PR deprecates the use of `Tensor.T` on tensors that are not matrices. An upgrade
guide is added to the documentation for `Tensor.T`.

This PR DOES NOT make this attribute throw an error when called on a tensor of `dim != 2`,
but this will be its behavior in a future PyTorch release.
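
An illustration of what is and is not deprecated (a sketch, not from the PR):

```python
import torch

m = torch.rand(3, 4)
m.T                 # OK: 2-D transpose, unaffected

v = torch.rand(2, 3, 4)
v.T                 # deprecated: reverses all dims and now warns
v.permute(2, 1, 0)  # explicit equivalent that remains supported
```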

cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D31610611

Pulled By: anjali411

fbshipit-source-id: af8ff7e862790dda9f06921de005b3f6fd0803c3
2021-10-14 08:17:32 -07:00
d802877dfa speed up quantized interpolate for channels last (#66525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66525

This should solve https://github.com/pytorch/pytorch/issues/60015

There were two `q_zero_point()` accesses inside a for loop, which was
expensive. Moving them out of the loop sped things up ~10x in a
microbenchmark.
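
The shape of the fix as a toy Python sketch (`FakeQuantized` is a hypothetical stand-in; the real change is in the C++ kernel):

```python
class FakeQuantized:
    """Stand-in for a quantized tensor whose accessor is costly."""
    def __init__(self, data, zero_point):
        self.data, self._zp = data, zero_point

    def q_zero_point(self):
        return self._zp  # imagine this lookup is expensive

q = FakeQuantized(list(range(1000)), zero_point=8)

# Before: the accessor ran on every iteration of the hot loop.
# out = [v - q.q_zero_point() for v in q.data]

# After: the loop-invariant call is hoisted out of the loop.
zp = q.q_zero_point()
out = [v - zp for v in q.data]
```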

Test Plan:
```
// comment out benchmarks unrelated to original issue, for simplicity
cd benchmarks/operator_benchmark
python -m pt.qinterpolate_test

// before: 2994 us
// after: 324 us
// full results: https://gist.github.com/vkuzo/cc5ef9526dc0cda170d6d63498c16453
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31592422

fbshipit-source-id: b6078ac1039573bbe545275f7aedfd580910b459
2021-10-14 08:11:26 -07:00
a40812de53 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31646229

fbshipit-source-id: 26a89b8eb88d31259f79c8f9061e016d57a1e462
2021-10-14 04:52:16 -07:00
6310eb30d1 [SR] Clean up GetLivenessMap (#66606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66606

- Remove dead code (see comment for where)
- Add debug prints
- Small reorganization of the code to improve readability

Reviewed By: d1jang

Differential Revision: D31568219

fbshipit-source-id: 50240c325bf4fd012e1947ac931bb67c6f5dfafb
2021-10-13 23:55:40 -07:00
e1348973ac Add common_fx2trt.py (#66579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66579

Didn't commit this file in the PR that open sources fx2trt tests

Test Plan: ci

Reviewed By: 842974287

Differential Revision: D31623354

fbshipit-source-id: 6cedbe0f229da40499b83e6df28e16caca392d9c
2021-10-13 21:24:11 -07:00
74849d9188 [acc_shape_inference] add shape inference for quantize_per_channel (#66562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66562

Adding shape inference for `acc_ops.quantize_per_channel`, and fixing some bugs.

Bugs were related to the fact that `quantize_per_channel` arguments `scales` and `zero_points` take tensors, so when we fetch the values (which needs to be done using `.tolist()` instead of `.item()`) we may get either a list or a scalar value.

Test Plan:
# Test Quantized Resnet
From sandbox with GPU that supports quantized types (tested with V100)
`buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test`
Output
```
...
[TensorRT] INFO: [MemUsageSnapshot] Builder end: CPU 0 MiB, GPU 1548 MiB
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation begin: CPU 0 MiB, GPU 1548 MiB
[TensorRT] VERBOSE: Using cublasLt a tactic source
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 0, GPU 1556 (MiB)
[TensorRT] VERBOSE: Using cuDNN as a tactic source
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 0, GPU 1564 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] VERBOSE: Total per-runner device memory is 23405056
[TensorRT] VERBOSE: Total per-runner host memory is 73760
[TensorRT] VERBOSE: Allocated activation device memory of size 154140672
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation end: CPU 0 MiB, GPU 1736 MiB
trt fp16 time (ms/iter) 1.252899169921875
trt int8 time (ms/iter) 1.3774776458740234
trt implicit int8 time (ms/iter) 1.3835883140563965
PyTorch time (CUDA) (ms/iter) 4.34483528137207
PyTorch time (CPU) (ms/iter) 55.687150955200195
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1918 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1866 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1738 (MiB)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1012 12:07:23.556475 711816 DynoConfigLoader.cpp:32] Failed to read config: No dyno config client
```

# Test shape inference
`buck test mode/opt glow/fb/fx/acc_tracer:test_acc_shape_inference`
Output
```
...
Summary
  Pass: 95
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1407375092088240
```

Reviewed By: jfix71, jerryzh168

Differential Revision: D31457323

fbshipit-source-id: 8ccc4a9b0ca655fb30838e88575aff2bf3a387a6
2021-10-13 21:03:08 -07:00
7d9bbd3596 Revert D31580382: [pytorch][PR] dropout update in autodiff
Test Plan: revert-hammer

Differential Revision:
D31580382 (eb8138d886)

Original commit changeset: 41d15da99bf4

fbshipit-source-id: 59f751ee59602a5fd09c17f8c7565dca5e2beb50
2021-10-13 19:52:05 -07:00
c1c985a282 Rename tensorexpr::Value so that it can coexist with torch::jit::Value (#66467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66467

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D31619973

Pulled By: bertmaher

fbshipit-source-id: eebea821fbbd0ae6f0a7144809c87c7da7f88699
2021-10-13 19:41:07 -07:00
6634570aef [SR] Fix bug in ValueGroup (#66470)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66470

Reviewed By: d1jang

Differential Revision: D31566348

fbshipit-source-id: e0f634af77d893bbc8d66f214b2b8bdd6ab58cc3
2021-10-13 19:26:38 -07:00
d30397d42a [PyTorch][Static Runtime] Don't use vector in ProcessedNode (#65429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65429

The sizes of these arrays can't change, so there's no need to waste an extra pointer on them.
ghstack-source-id: 140532722

Test Plan:
CI

I profiled this diff and the previous diff together. Comparing time spent in the operator functor handler for to_copy, I see the load instruction fetching the inputs pointer from p_node on https://www.internalfb.com/code/fbsource/[4c98a83b2451fa6750f38796c91ebb0eb0afd800]/fbcode/caffe2/torch/csrc/jit/runtime/static/ops.cpp?lines=947 (`p_node->Input(0).toTensor()`) improved a tiny bit, and the overall time spent in that wrapper decreased from 0.8% to 0.7%.

Reviewed By: hlu1

Differential Revision: D31096042

fbshipit-source-id: 35c30462d6a9f9bd555d6b23361f27962e24b395
2021-10-13 19:13:20 -07:00
c6f0dde3ca Cumsum Converter (#66376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66376

Added converter for cumsum and unit test

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_cumsum

Reviewed By: wushirong, 842974287

Differential Revision: D31423701

fbshipit-source-id: ee3aa625d6875ba8e6bad27044d22638e99b5c03
2021-10-13 19:04:37 -07:00
160946e3f3 Use torch.empty() instead of torch.tensor() in torch.nn.Parameter (#66486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66486

The newly-introduced Python dispatcher mode (`__torch_dispatch__`) does not have support for `torch.tensor()` (see #64360) and this causes friction in the user experience if some `nn.Modules` use `torch.tensor()` either implicitly or explicitly.

This PR replaces calls to `torch.tensor()` in `Parameter`, `UninitializedParameter`, and `UninitializedBuffer` with an equivalent call to `torch.empty()` which serves the same purpose and is syntactically more readable.
ghstack-source-id: 140520931

Test Plan: Since no behavioral change, run the existing unit and integration tests.

Reviewed By: pbelevich

Differential Revision: D31575587

fbshipit-source-id: bd7bdeea54370f3e53dc13bd182b97d0f67146f5
2021-10-13 18:56:36 -07:00
30d9fd9cf3 Migrate USE_MAGMA config macro to ATen (#66390)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66390

Test Plan: Imported from OSS

Reviewed By: malfet, bdhirsh

Differential Revision: D31547712

Pulled By: ngimel

fbshipit-source-id: 1b2ebc0d5b5d2199029274eabdd014f343cfbdd3
2021-10-13 17:50:10 -07:00
e75de4f307 remove a few unused THCTensor/Storage methods (#66555)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66555

Reviewed By: mruberry

Differential Revision: D31620969

Pulled By: ngimel

fbshipit-source-id: 1922ef523df473e8673a35c4a155b7b0cf000953
2021-10-13 17:18:11 -07:00
4e1c075542 log_sigmoid: Use log1p for improved precision (#66441)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20972

log_sigmoid calculates something like `log(1 + x)` where x is always a
positive number less than one. This wastes floating point precision
because the exponent always becomes zero. Instead, using
`log1p(x)` gives the full mantissa precision around `x=0`.

This also fixes infinity propagation: the old code computes `exp(in - in)`
when `in` is negative, which for an infinite input results in a NaN
instead of 0.
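
The precision difference is easy to reproduce in plain Python (illustrative only):

```python
import math

x = 1e-12
print(math.log(1 + x))   # ~1.0000889e-12: precision lost when computing 1 + x
print(math.log1p(x))     # ~1e-12: full mantissa precision preserved
```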

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66441

Reviewed By: bdhirsh

Differential Revision: D31619630

Pulled By: albanD

fbshipit-source-id: e7867f3459a91e944b92f8ca42b6e0697b13f89b
2021-10-13 16:36:13 -07:00
24202f7fb4 Remove native_functions.yaml dependency from Activation.cu (#64499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64499

This moves the native functions into a separate Activation.cpp file,
which calls into `launch_..._kernel` functions defined in `Activation.cu`.
The exception is `rrelu_with_noise`, which is complicated by the
random number generation code, so I've moved it into its own file.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, ezyang

Differential Revision: D30867323

Pulled By: dagitses

fbshipit-source-id: a4cd6f1fb1b1fed4cc356bf8b3778991ae2278ba
2021-10-13 16:28:13 -07:00
eb8138d886 dropout update in autodiff (#66273)
Summary:
1. Unifies dropout op in autodiff
2. Removes dropout inference support in autodiff

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66273

Reviewed By: jbschlosser, gmagogsfm

Differential Revision: D31580382

Pulled By: eellison

fbshipit-source-id: 41d15da99bf4ce6c47cc335a4156c4a1c9705a70
2021-10-13 16:23:40 -07:00
5f45927d15 Autograd: Delay warnings until the end of backward execution (#66235)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50209

This adds a new warning handler that stores all warnings in a shared
queue, which can be "replayed" at a later time and, crucially, on
another thread. Then, I use this inside the autograd engine to ensure
that warnings are processed by the handler registered on the main
thread.
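
A minimal Python sketch of the queue-and-replay pattern (the actual handler is implemented in C++ inside the engine; the names here are illustrative):
```
import queue
import warnings

_pending: "queue.Queue[warnings.WarningMessage]" = queue.Queue()

def warn_from_worker(message: str) -> None:
    # Worker threads enqueue instead of warning directly.
    _pending.put(warnings.WarningMessage(message, UserWarning, __file__, 0))

def replay_on_main_thread() -> None:
    # Called once backward execution finishes, on the main thread, so the
    # warnings flow through the normal Python warning machinery.
    while not _pending.empty():
        w = _pending.get()
        warnings.warn_explicit(w.message, w.category, w.filename, w.lineno)
```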

For testing, I also add an operator that always warns in the backward
pass and test that the warning is a normal Python warning.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66235

Reviewed By: ejguan

Differential Revision: D31505413

Pulled By: albanD

fbshipit-source-id: 1a7f60b038f55c20591c0748b9e86735b3fec2f9
2021-10-13 15:38:04 -07:00
42328090cb [GHA] Hardcode doc build target to master (#66567)
Summary:
According to f48f20e154/.circleci/verbatim-sources/job-specs/job-specs-custom.yml (L46-L48),
the target should always be master (even on release branches) unless it is a
tagged build

Fixes https://github.com/pytorch/pytorch/issues/66466

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66567

Reviewed By: seemethere

Differential Revision: D31621530

Pulled By: malfet

fbshipit-source-id: d6de2222d0340820555a82ae90b3de22b4dc7b88
2021-10-13 15:08:46 -07:00
0aab34c26c [jit] Refcounting spot fixes in alias_analysis (#66295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66295

Tidying up the top sources of reference count decrements seen during static runtime startup in alias_analysis.cpp specifically.
ghstack-source-id: 140484160

Test Plan:
CI

perf now shows under 2% time spent in ~__shared_count instead of about 5%.

Reviewed By: suo

Differential Revision: D31490761

fbshipit-source-id: bbdcb7f9065c3aafa7fff7bfea9cea6dbc41f9d9
2021-10-13 14:47:32 -07:00
9767282643 [jit] Add MutableTypePtrHelper::mapTypeToBorrowedAliasTypeSet (#65344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65344

Callsites that know they are using a cache can borrow AliasTypeSets from the cache instead of copying them.
ghstack-source-id: 140484162

Test Plan: Running perf on static runtime startup seems to show less inclusive time spent in AliasDb::getElements

Reviewed By: ejguan

Differential Revision: D31027363

fbshipit-source-id: b7a1473f4f9e9f14566f56f4b3b4e6317076beeb
2021-10-13 14:47:30 -07:00
75d98fa0ae [jit] Implement one-element MemoryDAG::mayContainAlias more efficiently (#65178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65178

There is no need to copy the MemoryLocations in this case.
ghstack-source-id: 140484161

Test Plan:
CI

static runtime startup for ctr_mobile_feed decreased from 7.0s to 6.3s

Reviewed By: suo

Differential Revision: D30984442

fbshipit-source-id: 61bb678c4480cd030aaab2bbc8a04cbd9b7c7f4d
2021-10-13 14:46:16 -07:00
9e8281fd2f [fx2trt][code quality] Add type annotation and docstring to utils functions in acc_ops_converters.py (#66496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66496

As the title. No changes on the code logic.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D31576303

fbshipit-source-id: f2132309023b3c9e09810e32af91eb42eefd3f32
2021-10-13 14:06:15 -07:00
37db650c9c [Static Runtime] Clone test does not use uninitialized memory (#66557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66557

The test was previously using `at::empty_strided` to initialize one of its inputs. The contents of the tensor returned by this function are random, uninitialized memory. If we happened to get a NaN, this test would fail since `use_equalnan` was not set.
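
An analogous Python-level illustration of why a stray NaN breaks the comparison unless NaNs are treated as equal (the test's `use_equalnan` plays the role of `equal_nan` here):
```
import torch

a = torch.tensor([float("nan")])
print(torch.allclose(a, a))                  # False: NaN never compares equal
print(torch.allclose(a, a, equal_nan=True))  # True once NaNs are treated equal
```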

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31611961

fbshipit-source-id: 79a9476d0d6ce7a9f1412eefcef19bc2618c54b8
2021-10-13 14:02:34 -07:00
82986a17a6 fix lint (#66572)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66572

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31624043

Pulled By: suo

fbshipit-source-id: 9db9cee3140d78c2a2f0c937be84755206fee1dd
2021-10-13 13:59:08 -07:00
a82fcd3560 Disable .numpy() and .tolist() for tensor subclasses and fix .tolist() for conjugated and negated tensors (#66082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082

Fixes https://github.com/pytorch/pytorch/issues/66024 #65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19
2021-10-13 13:57:51 -07:00
675ba6cd53 [qnnpack] Remove usage of conv_param_t in deconv-run.cc (#66465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66465

conv_param_t is being removed as it stores redundant information. This removes the last usage of it in qnnpack so we can begin removing the dependency.
ghstack-source-id: 140475374

Test Plan: github tests

Reviewed By: kimishpatel

Differential Revision: D31564679

fbshipit-source-id: 049a28fac0235b2e739fb2e048484d7e8e7189fa
2021-10-13 13:51:15 -07:00
86cf22cb1c Add OpInfo for torch.bucketize (#65821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65821

Reviewed By: malfet, mruberry

Differential Revision: D31386048

Pulled By: saketh-are

fbshipit-source-id: fae7ec7b6b57436d87d38d421c5f3f52be4cdadd
2021-10-13 13:46:35 -07:00
035310c574 Handle shared memory cases in MathBithFallback (#63602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63602

This PR fixes the case when a read and a write are performed on memory shared between mutable and/or non-mutable arguments. Example:
```
a=torch.tensor([1+1j])
b=a.conj()
b.add_(a) # should return tensor([2]) but returns tensor([2-2j])
```

The issue here is that the conjugate fallback resolves the conjugation in-place for mutable arguments, which, as shown above, is a problem when other input arguments share memory with the mutable argument(s).
This PR fixes this issue by:
1. first scanning through the operator input arguments and creating a vector of mutable arguments that have the conj bit set to `True` (and accordingly setting the flag `check_for_alias_with_mut_arg` to `True` or `False`).
2. Iterating through all the arguments. At this time we only look at the non-mutable arguments. If `check_for_alias_with_mut_arg` is set to `True`, then we iterate through `mutable_inputs` to check whether the current arg tensor aliases any of the entries in `mutable_inputs`. If it does, we clone the non-mutable tensor arg; otherwise we resolve the conjugation as before. (A Python sketch after this list illustrates the aliasing.)
3. Now we look through the mutable_inputs vector (which contains only mutable input tensors with conj bit set to `True`). We in-place conjugate each of the entries in the vector.
4. Do the computation.
5. Re-conjugate the mutable argument tensors.
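
A quick Python illustration of the lazy conjugation and aliasing that make the check in step 2 necessary (this uses the public conj-view API; the fallback itself lives in C++):
```
import torch

a = torch.tensor([1 + 2j])
b = a.conj()                          # lazy view: only the conj bit is set
print(b.is_conj())                    # True
print(b.data_ptr() == a.data_ptr())   # True -- b aliases a's memory
c = b.resolve_conj()                  # materialize into fresh memory
print(c.is_conj())                    # False
print(c.data_ptr() == a.data_ptr())   # False -- aliasing broken, which is
                                      # what cloning achieves in step 2
```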

NOTE: `TensorLists` are not fully handled in ConjugateFallback. Please see the in-line comment for more details.

Fixes https://github.com/pytorch/pytorch/issues/59943

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30466905

Pulled By: anjali411

fbshipit-source-id: 58058e5e6481da04a12d03f743c1491942a6cc9b
2021-10-13 13:39:31 -07:00
c04bcde245 Make empty* and *_like factory functions respect tensor subclasses (#65677)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65243

cc albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65677

Reviewed By: dagitses

Differential Revision: D31432032

Pulled By: albanD

fbshipit-source-id: 77f464974c7656c1206085aba9300471d7e0ef57
2021-10-13 13:34:53 -07:00
b792a77895 Skip interactive_embedded_interpreter.cpp for clang-tidy (#66569)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66569

Reviewed By: suo

Differential Revision: D31622885

Pulled By: malfet

fbshipit-source-id: 61bad5ff3011f992cdd149724c935c098996d6a2
2021-10-13 13:27:56 -07:00
09b90612c4 .github: Enable onnx tests (#66513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66513

These were missed in the migration of onnx to github actions.

Adds ort tests with 2 shards for the onnx workflow

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31599433

Pulled By: seemethere

fbshipit-source-id: 73dce0d3017c4280e64f0c8578e2be7ef6a168d6
2021-10-13 13:14:02 -07:00
f48f20e154 Make ContainerHash compatible with const& types (#66497)
Summary:
- this change should not impact existing use cases, but allows for
  additional use cases where the container holds const types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66497

Reviewed By: alanwaketan

Differential Revision: D31582242

Pulled By: wconstab

fbshipit-source-id: 3a0e18b4afaf3c7ff93a0e3d09067ed066402b44
2021-10-13 12:45:17 -07:00
fdd9f49cf5 add a note on numerical accuracy (#65947)
Summary:
Per title
Fixes https://github.com/pytorch/pytorch/issues/54437

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65947

Reviewed By: albanD

Differential Revision: D31612445

Pulled By: ngimel

fbshipit-source-id: 5c155891a088aef3b9813f253d0dc1ee4d51ae1c
2021-10-13 12:43:55 -07:00
a453ebc8ac Use interactive_embedded_interpreter to dynamically load various third-party libraries (#66512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66512

TLDR, we are able to use the interactive_embedded_interpreter (basically just the torch::deploy interpreter with an interactive shell) to dynamically load various third-party libraries. We use the popular libraries numpy, scipy, regex, and pandas for illustration purposes.

A couple of changes needed to be made to the interactive_embedded_interpreter:
1. We need to link with :embedded_interpreter_all rather than :embedded_interpreter so we can enable DEEPBIND and use our custom loader.
2. We provide a pylibRoot path to construct the InterpreterManager. The path will be added to the embedded interpreter's sys.path. Typically we can pass in the Python library root path of a conda environment so the torch::deploy interpreter can find all installed packages.
3. We allow interactive_embedded_interpreter to execute a script, to ease recording the exploration of various Python libraries.
ghstack-source-id: 140453213

Test Plan:
Install numpy, scipy, regex, pandas in the conda environment or on the machine directly. Suppose /home/shunting/.local/lib/python3.8/site-packages/ is the root path for the installed libraries.

- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_regex.py
content of try_regex.py:
```
import regex

print(regex)
pat = r'(.+)\1'
print(regex.match(pat, "abcabc"))
print(regex.match(pat, "abcba"))

print("bye")
```

- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_numpy.py
content of try_numpy.py:
```
import numpy as np
print(f"numpy at {np}")
a = np.random.rand(2, 3)
b = np.random.rand(3, 2)
print(np.matmul(a, b))
```

- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_scipy.py
content of try_scipy.py:
```
import numpy as np
from scipy import linalg

mat_a = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 2, 1, 0], [1, 3, 3, 1]])
mat_b = linalg.inv(mat_a)
print(mat_b)
```

- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_pandas.py
content of try_pandas.py:
```
import pandas as pd
print(f"pandas at {pd}")
df = pd.DataFrame({
  "col1": [1, 2, 3, 4],
  "col2": [2, 4, 8, 16],
})
print(df)
```

Reviewed By: suo

Differential Revision: D31587278

fbshipit-source-id: c0b031c1fa71a77cdfeba1d04514f83127f79012
2021-10-13 12:39:13 -07:00
a8815d557a [vulkan] Remove the persistent resource pool (#66478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66478

A persistent resource pool was needed to store prepacked tensors since the main resource pool tied to the global Vulkan context would be flushed at the end of each inference run. However, prepacked tensors needed to stay alive between inference runs, so an additional persistent resource pool was introduced that would only be flushed when the Vulkan context was destroyed.

However, with [this change](https://github.com/pytorch/pytorch/pull/66477) the resource pool no longer indiscriminately flushes allocated resources at the end of an inference run. Tensors will have to call `release_resources()` before they become eligible to be destroyed. Since prepacked tensors are tied to an `OpContext` object they will stay alive between inference runs.

Therefore, the persistent resource pool is no longer needed.

Test Plan: Build and run `vulkan_api_test`.

Reviewed By: beback4u

Differential Revision: D31490076

fbshipit-source-id: 3741a2333c834796d589774e819eaaf52bb9f0fe
2021-10-13 12:01:08 -07:00
cebaf21c5a [vulkan] Release GPU resources when vTensor::View is destroyed (#66477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66477

Currently, Vulkan tensor memory is allocated and deallocated through the following mechanism:

1. During inference, ops will request buffer and/or texture memory for tensors from the [Resource Pool](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Resource.h#L324-L327)
2. The resource pool allocates the memory and [adds it to a vector](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Resource.cpp#L609-L622) containing all the memory allocations it has made this inference, then returns the most recently allocated block of memory
3. At the end of inference, results are transferred back to the CPU and the [context is flushed](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/ops/Copy.cpp#L150)
4. As part of the context flush the [resource pool is purged](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Context.cpp#L143) which [deallocates all buffer and texture memory](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Resource.cpp#L683-L684) allocated by the resource pool

This pattern makes it impossible to have models with multiple outputs. When the first output tensor is transferred back to the CPU, the memory of the other output tensors will be deallocated when the context is flushed.

Instead, an alternative is to tie resource destruction to the destructor of the [vTensor::View](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/ops/Tensor.h#L243) class, which holds the actual implementation and storage of Vulkan tensors. This will ensure that memory associated with a tensor will be cleaned up whenever it is no longer used.

The new deallocation mechanism proposed is:

1. During inference, `vTensor` objects will request GPU memory from the resource pool, same as before.
2. The resource pool allocates buffer or texture memory and returns it directly to the `vTensor`
3. Throughout inference, intermediate tensors' reference counts will go to 0 and the destructor of the `View` class will be called
4. The destructor will add any texture and buffer memory it is holding to the resource pool's list of GPU memory allocations to be cleaned up
5. At the end of inference `purge()` will be called which will destroy all allocations in the list of allocations to be cleaned
6. GPU memory for output tensors will not be destroyed, since their reference counts will be greater than 0, thus they have not yet been added to the list of allocations to be destroyed

Note that it is not correct to have the destructor directly deallocate GPU memory. This is due to the fact that Vulkan ops simply submit work to the GPU but do not guarantee that the work has completed when the op returns. Therefore we must keep all allocated GPU memory until the end of inference, when we wait for the GPU to complete its work.

Test Plan:
build and run `vulkan_api_test` to make sure existing functionality is not impacted.

Also test in a later diff that checks that output tensors stay alive after inference completes.

Reviewed By: dreiss

Differential Revision: D31510899

fbshipit-source-id: 99250c2800a68f07b1b91dbf5d3b293184da5bd2
2021-10-13 11:59:40 -07:00
5e34ac6c43 [FX] Fix cases when we should not fuse due to more than one users of intermediate node (#66472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66472

A follow up of https://github.com/pytorch/pytorch/pull/66362. Same fix.

Test Plan:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_matmul_trt
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt

```

Reviewed By: wushirong, 842974287

Differential Revision: D31567662

fbshipit-source-id: 2c9e6a138fc31996d790fd4d79e0bf931507fc99
2021-10-13 11:53:42 -07:00
9d13ae450a [oss/ci] skip all dataloader tests with asan (#66561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66561

See https://github.com/pytorch/pytorch/issues/66223 for context.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31617142

Pulled By: suo

fbshipit-source-id: 16b280fc47a7c40fa19c5c72192d342dd33680bf
2021-10-13 11:39:41 -07:00
713e025c9f Add no-input-grad-needed cases to test_grid_sample (#66071)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66071

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31431801

Pulled By: albanD

fbshipit-source-id: 57a94ed9e97e402aa8193d69355e57b6309c64f7
2021-10-13 10:56:47 -07:00
8a40bb62f9 Compute input gradient only if required (CUDA) (#66070)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66070

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31431805

Pulled By: albanD

fbshipit-source-id: 8c3de6632aaee168ec6fd7eb79a5af26973af9c5
2021-10-13 10:56:45 -07:00
f8d98b5a6d Compute input gradient only if required (CPU) (#66069)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66069

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31431803

Pulled By: albanD

fbshipit-source-id: d4caba5fa092e4ee7411502021836370082670b2
2021-10-13 10:56:43 -07:00
84385c40e4 Add output_mask (#66068)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66068

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31431802

Pulled By: albanD

fbshipit-source-id: 322aae5614dacb06fd45e513465b7a5cc11f4dbb
2021-10-13 10:55:27 -07:00
6401658b08 fix type error in hipify_python.py (#66164)
Summary:
- [x] Fixed the Pyre type checking errors in `torch/utils/hipify/hipify_python.py`:
```
torch/utils/hipify/hipify_python.py:196:8 Incompatible variable type [9]: clean_ctx is declared to have type `GeneratedFileCleaner` but is used as type `None`.
torch/utils/hipify/hipify_python.py:944:4 Incompatible variable type [9]: clean_ctx is declared to have type `GeneratedFileCleaner` but is used as type `None`.
```
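
A plausible shape of the fix, sketched below (illustrative only; the actual patch may differ, and `GeneratedFileCleaner` is stubbed here):
```
from typing import Optional

class GeneratedFileCleaner:
    """Stub standing in for the real class in hipify_python.py."""

def hipify(clean_ctx: Optional[GeneratedFileCleaner] = None) -> None:
    # Declaring the default-None parameter as Optional satisfies Pyre;
    # `clean_ctx: GeneratedFileCleaner = None` does not, since None is
    # not a GeneratedFileCleaner.
    if clean_ctx is None:
        clean_ctx = GeneratedFileCleaner()
```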

Fixing the issue: https://github.com/MLH-Fellowship/pyre-check/issues/78

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66164

Reviewed By: onionymous

Differential Revision: D31411443

Pulled By: 0xedward

fbshipit-source-id: c69f8fb839ad1d5ba5e4a223e1322ae7207e1574
2021-10-13 10:33:49 -07:00
d85948896c Add softplus support to autodiff (#63942)
Summary:
Add softplus definition to autodiff.

cc gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63942

Reviewed By: ngimel

Differential Revision: D31397158

Pulled By: eellison

fbshipit-source-id: f7db547370f82e5e282505c3c8415fb4fbd86d54
2021-10-13 08:08:09 -07:00
82a216c45b Add tensor.{adjoint(),H,mT,mH} methods and properties (#64179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64179

This PR follows the discussion in https://github.com/pytorch/pytorch/issues/45063#issuecomment-904431478

Fixes https://github.com/pytorch/pytorch/issues/45063
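
A quick demonstration of the new accessors (assuming the semantics agreed on in the linked discussion: `.mT`/`.mH` act on the last two dimensions, and `.H` is the 2-D conjugate transpose):
```
import torch

x = torch.randn(2, 3, dtype=torch.complex64)
assert torch.equal(x.mT, x.transpose(-2, -1))         # matrix transpose
assert torch.equal(x.mH, x.transpose(-2, -1).conj())  # conjugate transpose
assert torch.equal(x.adjoint(), x.mH)                 # method spelling of .mH
assert torch.equal(x.H, x.mH)                         # .H on a 2-D tensor
```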

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30730483

Pulled By: anjali411

fbshipit-source-id: 821d25083f5f682450f6812bf852dc96a1cdf9f2
2021-10-13 07:44:43 -07:00
87df043f63 [Bootcamp][Pytorch]Add testing for complex parameters in Adagrad optimizer (#66501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66501

Add testing for the Adagrad optimizer to ensure that it behaves as if complex numbers are two real numbers in R^2 as per issue 65711 on github
ghstack-source-id: 140414042

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'

https://pxl.cl/1R27M

Reviewed By: albanD

Differential Revision: D31584240

fbshipit-source-id: 5c9938084566b8ea49cc8ff002789731f62fe87e
2021-10-13 07:05:20 -07:00
ecb7b38c00 [PyTorch] Support additional arguments in Python record function (#65736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65736

We ran into some limitations when extracting PyTorch operator parameters through hooks or the execution graph. Some of these limitations are not due to the operator not exposing the parameters; rather, the inputs for these operators are already fused/processed in some cases (like embedding tables). We want to be able to attach metadata to user-scope record functions, allowing profilers to later extract this information.

The record function C++ API already supports taking inputs and outputs information. The corresponding Python interface does not support them and only allows a string name as record function parameter.

This diff adds support for users to optionally add additional arguments to the record function, in two ways.
1. to remain backward compatible with `record_function_op`, we have added an optional string arg to the interface: `with record_function(name, arg_str)`.
2. to support the data dependency graph, we also have the new `torch.autograd._record_function_with_args_enter` and `torch.autograd._record_function_with_args_exit` functions to provide an interface where we can give additional tensor arguments. For now we imagine this can be used for debugging or analysis purposes. In this form, we currently support some basic data types as inputs: scalars, strings, lists, and tensors.

Example usage:

```
# record_function operator with a name and optionally, a string for arguments.
with record_function("## TEST 1 ##", "[1, 2, 3]"):
    <actual module or operator>

# more general form of record_function
a = _record_function_with_args_enter("## TEST 2 ##", 1, False, 2.5, [u, u], "hello", u)
<actual module or operator>
_record_function_with_args_exit(a)

```
Corresponding outputs in execution graph:
```
    {
      "name": "## TEST 2 ##", "id": 7, "parent": 3, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0,
      "inputs": [1,false,2.5,[6,6],"hello",6], "input_shapes": [[],[],[],[[3,4,5],[3,4,5]],[],[3,4,5]], "input_types": ["Int","Bool","Double","GenericList[Tensor(float),Tensor(float)]","String","Tensor(float)"],
      "outputs": [], "output_shapes": [], "output_types": []
    },
    {
      "name": "## TEST 1 ##", "id": 3, "parent": 2, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0,
      "inputs": ["1, 2, 3"], "input_shapes": [[]], "input_types": ["String"],
      "outputs": [], "output_shapes": [], "output_types": []
    },
```

Test Plan:
```
=> buck build caffe2/test:profiler --show-output
=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_1651304.json
Net filter:
Target net for iteration count:
Net Iterations: 3
INFO:2021-09-27 01:10:15 1651304:1651304 Config.cpp:424] Trace start time: 2021-09-27 01:10:30
Trace duration: 500ms
Warmup duration: 5s
Net size threshold: 0
GPU op count threshold: 0
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1632730215/devvm2060.ftw0/libkineto_activities_1651304.json
Trace compression enabled: 1
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:536] Tracing starting in 14s
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:48] Target net for iterations not specified - picking first encountered that passes net filter
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:57] Tracking net PyTorch Profiler for 3 iterations
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:126] Processing 1 CPU buffers
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:686] Recorded nets:
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:689] PyTorch Profiler: 1 iterations
ok

----------------------------------------------------------------------
Ran 1 test in 0.021s

OK
```

Reviewed By: gdankel

Differential Revision: D31165259

fbshipit-source-id: 15920aaef7138c666e5eca2a71c3bf33073eadc4
2021-10-13 01:49:15 -07:00
9918fd8305 [fx2trt] open source tests for converters (#66361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66361

ossci will be setup later, fbonly ci is ready

Test Plan:
buck run caffe2/test:fx2trt_test_linear

testinprod

Reviewed By: 842974287

Differential Revision: D31511082

fbshipit-source-id: 9e2c50c83fdba822cd2488eb17b5787d8a57f087
2021-10-13 00:09:43 -07:00
80a3619823 Remove THCTensorMathReduce.cuh (#66389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66389

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31547711

Pulled By: ngimel

fbshipit-source-id: c181d14f66536b6873b5b14088312c6c70bf0855
2021-10-12 22:59:19 -07:00
bc6935ddf5 [PyTorch][Distributed][Easy] Make ShardedTensor.size() equivalent to torch.Tensor.size() (#65087) (#66012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66012

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D31345161

Pulled By: fduwjj

fbshipit-source-id: 10d6b65780ab0c6934babcc7c36a181cb66f0b7c
2021-10-12 22:26:22 -07:00
8eb85b5027 Remove THCNumerics (#66388)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66388

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31547710

Pulled By: ngimel

fbshipit-source-id: 20710328f2e5fc2e931a3f8ba9b4243acc310d54
2021-10-12 22:05:03 -07:00
2d3b23190c Revert D31591512: .github: Enable onnx tests
Test Plan: revert-hammer

Differential Revision:
D31591512 (06a156efc7)

Original commit changeset: 4a8bb3f0e62f

fbshipit-source-id: 2d8580c0e507c2a0b30431bcf30eb01cef82f602
2021-10-12 20:17:02 -07:00
08f3823647 Sparse CSR CUDA: add addmv_out (#61407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61407

This PR adds `addmv_out_sparse_csr_cuda`. The operation is used to
compute matrix-vector multiplication. Since structured_delegate is used,
we only need to implement the out variant; the in-place and normal
variants are autogenerated.
Working on this PR revealed that float16 (and probably bfloat16) inputs
do not work correctly in cuSPARSE, so for this case `addmm` is
used with squeezes and unsqueezes.
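
The squeeze/unsqueeze trick in plain PyTorch (dense tensors here for brevity; the kernel applies the same reshaping around `addmm` for the sparse CSR float16 path):
```
import torch

A = torch.randn(3, 4)
x = torch.randn(4)
y = torch.randn(3)

mv = torch.addmv(y, A, x)  # y + A @ x
mm = torch.addmm(y.unsqueeze(-1), A, x.unsqueeze(-1)).squeeze(-1)
assert torch.allclose(mv, mm)
```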

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31584499

Pulled By: ngimel

fbshipit-source-id: 4c507791471ada88969116b88eeaaba7a7536431
2021-10-12 20:06:56 -07:00
8492e6bc6a .github: scheduled -> schedule, fix periodic (#66531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66531

The github.event_name should be schedule not scheduled

Reference, https://docs.github.com/en/actions/learn-github-actions/events-that-trigger-workflows#schedule

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31598136

Pulled By: seemethere

fbshipit-source-id: 4d67f7731b21e05dabc8f54b4ebf9a5d2d3a4e1e
2021-10-12 19:46:01 -07:00
06a156efc7 .github: Enable onnx tests (#66513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66513

These were missed in the migration of onnx to github actions.

Adds ort tests with 2 shards for the onnx workflow

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31591512

Pulled By: seemethere

fbshipit-source-id: 4a8bb3f0e62ff98ee77d3d8afc905f4e02db6f24
2021-10-12 19:35:09 -07:00
93d326c868 Add InplaceOrView boxed kernel (#63878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63878

See https://github.com/pytorch/pytorch/issues/64407, https://github.com/pytorch/pytorch/issues/62032 for context:

In this PR:
 - Add boxed kernel by replicating `gen_inplace_or_view`'s logic that is ONLY for use with the Autograd not-implemented kernel
   - Unlike `gen_inplace_or_view` we always pass a view_func to as_view in order to ensure that a "derivative is not implemented" error is raised even if an in-place update is performed on the view. Without the `view_func`, the CopySlice + AsStridedBackward nodes would replace the NotImplemented node.
   - This limitation makes it impossible to use this node for general use
   - view relationship must be between first input (must be tensor) and first output (may be tensor or vec of tensor)
   - do not support non-differentiable views (_values, _indices, view.dtype) - view relationship is always fw and bw differentiable
 - Adds the macro `#define REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(ns, op)` to be the interface for this feature:
   - static initialization can be slowed down (though not measured) if there are many registrations, because each line translates to 2 library calls; the workaround is to manually use the two functions `AutogradNotImplementedFallback` and `ADInplaceOrViewFallback` and call `m.impl`.
 - Adds testing:
    - for views: view relationship created
      -  performing in-place operation on the view, raises properly
      - trying to create two view relationships is not allowed,
      - single view relationship but not first input/first output should error
      - view relation created properly for tensor vector output
    - for in-place:
      - version count bump
      - triggers rebase_history
      - multiple mutations is okay and also updates version counter
 - TODO (follow up): Update tutorials for adding  third-party operators (and document the above limitations)
 - TODO (follow up): Look at torch-audio/torch-vision and identify places where this can simplify existing code

EDIT: Made it more clear what is introduced in this PR and moved some more contextual stuff into the issue itself

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30901714

Pulled By: soulitzer

fbshipit-source-id: 48de14c28be023ff4bd31b7ea5e7cba88aeee04c
2021-10-12 18:55:50 -07:00
40794dbb25 add backend_config_dict to checkGraphModeFxOp (#66499)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66499

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31582518

Pulled By: rahxephon89

fbshipit-source-id: b8107bb7140517f2dc32bf692c6b916536ea35c3
2021-10-12 18:35:54 -07:00
d32736e317 Make permission errors more human readable (#66492)
Summary:
`_mkdir_p` feels like a remnant of the Python 2 era; add the `exist_ok` argument and re-raise OSError to make failures more human-readable.

After the change attempt to build PyTorch in a folder that does not have write permissions will result in:
```
% python3.6 setup.py develop
Building wheel torch-1.10.0a0+git9509e8a
-- Building version 1.10.0a0+git9509e8a
Traceback (most recent call last):
  File "/Users/nshulga/git/pytorch-worktree/tools/setup_helpers/cmake.py", line 21, in _mkdir_p
    os.makedirs(d, exist_ok=True)
  File "/opt/homebrew/Cellar/python36/3.6.2+_254.20170915/Frameworks/Python.framework/Versions/3.6/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: 'build'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 895, in <module>
    build_deps()
  File "setup.py", line 370, in build_deps
    cmake=cmake)
  File "/Users/nshulga/git/pytorch-worktree/tools/build_pytorch_libs.py", line 63, in build_caffe2
    rerun_cmake)
  File "/Users/nshulga/git/pytorch-worktree/tools/setup_helpers/cmake.py", line 225, in generate
    _mkdir_p(self.build_dir)
  File "/Users/nshulga/git/pytorch-worktree/tools/setup_helpers/cmake.py", line 23, in _mkdir_p
    raise RuntimeError(f"Failed to create folder {os.path.abspath(d)}: {e.strerror}") from e
RuntimeError: Failed to create folder /Users/nshulga/git/pytorch-worktree/build: Permission denied
```
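
A minimal sketch of the updated helper, reconstructed from the traceback above:
```
import os

def _mkdir_p(d: str) -> None:
    try:
        os.makedirs(d, exist_ok=True)
    except OSError as e:
        raise RuntimeError(
            f"Failed to create folder {os.path.abspath(d)}: {e.strerror}"
        ) from e
```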

Fixes https://github.com/pytorch/pytorch/issues/65920

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66492

Reviewed By: seemethere

Differential Revision: D31578820

Pulled By: malfet

fbshipit-source-id: afe8240983100ac0a26cc540376b9dd71b1b53af
2021-10-12 18:31:24 -07:00
d921891f57 GHA: Stop skipping periodic jobs (#66264)
Summary:
they have been skipped for too long
![image](https://user-images.githubusercontent.com/31798555/136433267-f35c0507-23ab-4348-be43-78d299c3d654.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66264

Reviewed By: dagitses, malfet, seemethere

Differential Revision: D31478705

Pulled By: janeyx99

fbshipit-source-id: 1324b123e3f8646e5cd671af4c1850398a6f6e3b
2021-10-12 14:39:47 -07:00
3ac2c74896 Revert D31082208: Use shared CUPTI by default
Test Plan: revert-hammer

Differential Revision:
D31082208 (8b0eae5aa8)

Original commit changeset: 14f66af92084

fbshipit-source-id: 0faff00832b7f79d476fd1f9f505142a548a76db
2021-10-12 14:37:54 -07:00
9984f4bb8b Remove native_functions.yaml dependency from some reduction operators (#64173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64173

This one also required restructuring the code a bit to move the kernel
code into separate files. So I've mainly focused on CUDA, which is
where the real build-time issues are.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, ezyang

Differential Revision: D30728581

Pulled By: dagitses

fbshipit-source-id: a69eea5b4100d16165a02660dde200c8f648683d
2021-10-12 13:11:24 -07:00
ee38a467ea fix normal with empty std (#66463)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65709

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66463

Reviewed By: navahgar

Differential Revision: D31561904

Pulled By: ngimel

fbshipit-source-id: 3b2f44dc0ec075fe4f9685696578a0ff6e58d501
2021-10-12 11:28:11 -07:00
8b0eae5aa8 Use shared CUPTI by default (#65401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65401

Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI
causes exception handling to break on certain compiler configurations, likely
because CUPTI comes with incompatible libstdc++ symbols.  Rather than pray that
something reasonable happens, use the safer configuration (dynamic linking) by
default and give a warning if the user inverts the setting.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gdankel

Differential Revision: D31082208

Pulled By: ezyang

fbshipit-source-id: 14f66af920847e158436b5801c43f3124b109b34
2021-10-12 11:01:40 -07:00
c6216b2a43 Back out "Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source" (#66421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66421

Original commit changeset: ab6bb8fe4e83

Plus this includes BUILD.bazel changes, the reason for the revert.

Test Plan: See original diff

Reviewed By: gdankel

Differential Revision: D31542513

fbshipit-source-id: ee30aca2d6705638f97e04b77a9ae31fe5cc4ebb
2021-10-12 10:55:29 -07:00
d7916e3734 [jit] Eliminate malloc & recursive refcount bumps in HashType::operator() (#65348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65348

Previously, this took several percent of model loading time. Now it is well under 1%.

We get these savings by avoiding allocating a vector and by avoiding reference count bumps on contained types within each type.
ghstack-source-id: 140148562

Reviewed By: suo

Differential Revision: D31057278

fbshipit-source-id: 55a02cbfefb8602e41baddc2661d15385fb2da55
2021-10-12 10:51:17 -07:00
47c531b6e8 [jit] Compare object identity first in ClassType::operator== (#65347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65347

This check is much cheaper than anything involving actually inspecting object fields (i.e., the cost is low), and if it succeeds we can skip the expensive function body (which involves locking a weak_ptr and then destroying the resulting shared_ptr). It almost entirely eliminates time spent in this function during model loading according to perf.
ghstack-source-id: 140148561

Test Plan: Specifically I profiled static runtime startup for the ctr_mobile_feed model and saw self time in this function go from 2-3% to 0.36%.

Reviewed By: ejguan

Differential Revision: D31057279

fbshipit-source-id: efb6bdc0957b680112ac282e85dc1b06b1b6c0bd
2021-10-12 10:49:36 -07:00
17e79bc76c remove is_reference from all is_output_quantized (#66456)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66456

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31562633

Pulled By: rahxephon89

fbshipit-source-id: 85c73a23e90ba9c1406f4027d447fbbe4576e39a
2021-10-12 10:43:52 -07:00
702fb1de72 [fx2trt] open source tests for acc tracer (#66302)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66302

Just move files, ossci can be setup later

Test Plan:
buck run //caffe2/test:test_fx_acc_tracer

testinprod

Reviewed By: 842974287

Differential Revision: D31495087

fbshipit-source-id: f182c7438e3e80ba98924990682cb45a99b9967c
2021-10-12 10:27:34 -07:00
a6eec0c60f Upgrade onnx submodule to 85546f8c44e627f8ff1181725d03cc49f675e44f (#66427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66427

Update the onnx submodule, so https://github.com/pytorch/pytorch/pull/66140 can land.

Test Plan: ci

Reviewed By: ezyang

Differential Revision: D31544610

fbshipit-source-id: 94831ef531bbd654a6aeb744cd53a38155848079
2021-10-12 09:46:08 -07:00
e6261083f9 [FX] fuse permute021 linear pass for trt lowering (#66362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66362

In general we cannot rely on Permute021Linear being kept as-is by the time we reach the lowering phase, because an earlier transformation could have traced through this module. An acc-based FX pass is more reliable for recovering the perf.

Test Plan:
```
buck run mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 //hpc/new/models/ads/benchmarks:ads_dense_benchmark -- over-arch --model-version=23x_3tb --batch-size=2048

OverArch, PyTorch, FP16, BS: 2048, TFLOP/s: 53.22, Time per iter: 14.46ms, QPS: 141629.45
OverArch, TensorRT, FP16, BS: 2048, TFLOP/s: 92.20, Time per iter: 8.35ms, QPS: 245354.15
```

Unittest:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```

Reviewed By: jianyuh, wushirong, 842974287

Differential Revision: D31525307

fbshipit-source-id: b472a8c277aa4d156d933d6a5abec091133f22c5
2021-10-12 09:41:32 -07:00
8818dda237 Fix lstsq to work with inputs that require grad (#66426)
Summary:
I updated `sample_inputs_linalg_lstsq`, and `test_nondifferentiable`
now correctly reveals the failure. The internal assert error was thrown
because autograd attempts to mark an integer tensor as differentiable.

Fixes https://github.com/pytorch/pytorch/issues/66420.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66426

Reviewed By: ejguan

Differential Revision: D31550942

Pulled By: albanD

fbshipit-source-id: 4a0ca60e62c5e9bb96af5020541da2d09ea3e405
2021-10-12 08:52:21 -07:00
213ac4e59c Remove native_functions.yaml dependency from PointwiseOps (#64172)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64172

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728584

Pulled By: dagitses

fbshipit-source-id: 2ae9686ac7c312e2d470d26a3cad12afcf7ef47b
2021-10-12 08:12:25 -07:00
8674a3c6e3 Remove native_functions.yaml dependency from PowKernel (#64171)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64171

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728583

Pulled By: dagitses

fbshipit-source-id: ea6891a3598eead93daea620b94e50d3a3b248cf
2021-10-12 08:12:23 -07:00
1841f76cc0 Remove native_functions.yaml dependency from unary ops (#64170)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64170

Test Plan: Imported from OSS

Reviewed By: gchanan, ezyang

Differential Revision: D30728578

Pulled By: dagitses

fbshipit-source-id: 70baa90d0834e68324504c74064a1d1790193483
2021-10-12 08:11:03 -07:00
71e17d9827 [DataPipe] Fix HttpReader IterDataPipe Issue with streaming (#66432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66432

This PR aims to fix the same issue that was addressed in TorchData.

See this [TorchData PR](https://github.com/pytorch/data/pull/51) and the corresponding [issue](https://github.com/pytorch/data/issues/42) for details.

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31547565

Pulled By: NivekT

fbshipit-source-id: 1e0cb13be270e6b81a11af54fa08cf6d7e7c5721
2021-10-12 07:37:57 -07:00
5f1518609b [TensorExpr] Fix lowering for aten::t. (#65859)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65859

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D31289347

Pulled By: ZolotukhinM

fbshipit-source-id: b9648416238657fe23366928e43ed63e992a8973
2021-10-12 01:26:36 -07:00
6864146f2b [TensorExpr] Fix lowerings for aten::view and aten::reshape. (#65852)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65852

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31286024

Pulled By: ZolotukhinM

fbshipit-source-id: eb5b5f2ed86b6f325f09904e841815b8183b4e1d
2021-10-12 01:26:34 -07:00
60a2a295ce [TensorExpr] Use schema instead of op name in NNC lowerings. (#65843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65843

Fixes #64963.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31282334

Pulled By: ZolotukhinM

fbshipit-source-id: ffd0e1b6433d9360fedd9081c01ef41b21684439
2021-10-12 01:26:32 -07:00
24b9b304d9 [TensorExpr] Nuke TE shape inference. (#65554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65554

We're relying on JIT based shape inference and not using the TE
implementation.

Question to the audience: we set `hasBroadcasts_` in that function, but
this function was almost never invoked. Do we behave correctly in the
presence of rand-calls and broadcasts?

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D31148925

Pulled By: ZolotukhinM

fbshipit-source-id: 2898a57e389ea0950163122089d0fec3d92701c4
2021-10-12 01:25:14 -07:00
18e4688199 [Pytorch Edge] Improve bundled inputs name error handling (#65856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65856

Occasionally functions don't have the `__name__` attribute set and have `name` set instead? Not sure why this happens, but this should catch it.

Test Plan: ci

Reviewed By: iseeyuan

Differential Revision: D31286787

fbshipit-source-id: 8a339541215329b6e9ff43ef77363be41f19c5ca
2021-10-12 00:08:39 -07:00
2d1552824a Revert D31386275: Migrate THCState to ATen
Test Plan: revert-hammer

Differential Revision:
D31386275 (a6774d6e1f)

Original commit changeset: 5c1f1bbe8c3d

fbshipit-source-id: bea4e80fb0bdc57e8bb6a8ee781afd224adf4ed0
2021-10-11 22:30:08 -07:00
d8532e3524 [PyTorch] Split c10 Type.cpp into two files to allow targets to include one of them (#66445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66445

`Type.cpp` implements the `demangle()` function based on the macro `HAS_DEMANGLE`. This diff splits it into two `.cpp` files so that we can add either one into the build target. This change follows the pattern of `flags_use_no_gflags.cpp` and `flags_use_gflags.cpp`.

Test Plan: Rely on CI

Reviewed By: iseeyuan

Differential Revision: D31551432

fbshipit-source-id: f8b11783e513fa812228ec873459ad3043ff9147
2021-10-11 21:52:24 -07:00
07ec250fd7 [deploy] fix oss build (#66347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66347

It turns out that our hard-coded build flavor that we were running
deploy tests on in CI no longer exists lol. This PR fixes the OSS build
and also updates the build flavor.

Differential Revision:
D31517679

Test Plan: Imported from OSS

Reviewed By: malfet, shunting314

Pulled By: suo

fbshipit-source-id: 763f126a3304f82e6dff7cff8c56414d82c54de3
2021-10-11 21:48:26 -07:00
9a85167d22 Fix batch_isend_irecv tests for err case (#63112)
Summary:
- `batch_isend_irecv` returns a list of requests instead of a single request
- remove some unused variables

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63112

Reviewed By: pbelevich, wayi1, fduwjj

Differential Revision: D30921265

fbshipit-source-id: e2075925172805d33974ef0de6fb631bdf33b5ea
2021-10-11 19:39:49 -07:00
3eb9443619 [FX] Fix issue where GraphModule.delete_all_unused_submodules deletes submodules from called leaf modules (#66430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66430

On the whole, I'm not totally satisfied with this approach. I think we should be building a prefix tree data structure during initial iteration over the submodules and querying that when deleting submodules. But I think this approach works and I want to see if we can get it in before 1.10

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D31546137

Pulled By: jamesr66a

fbshipit-source-id: f08b8409a3cf511277017ccccb916097b7c4c4fe
2021-10-11 19:37:51 -07:00
a6774d6e1f Migrate THCState to ATen (#65948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65948

This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31386275

Pulled By: ngimel

fbshipit-source-id: 5c1f1bbe8c3d2d9f5b99996e0588fb7f07fa6a77
2021-10-11 19:31:43 -07:00
e7b5712c21 Call PyArray_Check only if NumPy is available (#66433)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66353

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66433

Reviewed By: seemethere, janeyx99

Differential Revision: D31548290

Pulled By: malfet

fbshipit-source-id: 3b094bc8195d0392338e0bdc6df2f39587b85bb3
2021-10-11 19:25:31 -07:00
565cf47abf Quantization docs: add pages for Numeric Suite (Eager and FX) (#66380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66380

Description:
1. creates doc pages for Eager and FX numeric suites
2. adds a link from main quantization doc to (1)
3. formats docblocks in Eager NS to render well
4. adds example code and docblocks to FX numeric suite

Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```

Reviewed By: jerryzh168

Differential Revision: D31543173

Pulled By: vkuzo

fbshipit-source-id: feb291bcbe92747495f45165f738631fa5cbffbd
2021-10-11 18:47:58 -07:00
8b1258698e Improve quantization API docs (#66379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66379

Description:

Creates a quantization API reference and fixes all the docblock errors.

This is #66122 to #66210 squashed together

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```

Reviewed By: ejguan

Differential Revision: D31543172

Pulled By: vkuzo

fbshipit-source-id: 9131363d6528337e9f100759654d3f34f02142a9
2021-10-11 18:46:11 -07:00
88ed93c2ca Fix type checking errors in torch/quantization/fx/qconfig_utils.py (#66428)
Summary:
- [x] Fix the Pyre type checking errors in `torch/quantization/fx/qconfig_utils.py`
```
torch/quantization/fx/qconfig_utils.py:241:46 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/fx/qconfig_utils.py:267:46 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/fx/qconfig_utils.py:284:43 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
```
Fixes the issue: [MLH-Fellowship/pyre-check/issues/73](https://github.com/MLH-Fellowship/pyre-check/issues/73)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66428

Reviewed By: grievejia

Differential Revision: D31545215

Pulled By: 0xedward

fbshipit-source-id: 767ae7888854c2eec2ecf14855a5b011110b9271
2021-10-11 16:48:11 -07:00
25965619dd Back out "Revert D31495086: open source engine_layer_visualize.py" (#66431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66431

Original commit changeset: 186f3407a642

Test Plan: testinprod

Reviewed By: 842974287

Differential Revision: D31546998

fbshipit-source-id: 4bc131d895cc4a7a84a4ff277df5f99e69ef4346
2021-10-11 16:06:23 -07:00
ae5a9a451f Do not enforce unused vars rule for torch_deploy (#66447)
Summary:
Followup after  https://github.com/pytorch/pytorch/pull/66041

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66447

Reviewed By: seemethere

Differential Revision: D31554356

Pulled By: malfet

fbshipit-source-id: 6638324dcf658f4b244da285b4360ff2e2e2c013
2021-10-11 15:24:19 -07:00
7baf4f6b12 Chunk: Converter (#66028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66028

Added converter and unit test for torch.chunk function

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_gelu

Reviewed By: 842974287

Differential Revision: D31345180

fbshipit-source-id: 9425685671b474449e825aa2a8e7e867a329eb6e
2021-10-11 14:33:25 -07:00
cc24e4e5d0 [NNC] Normalize loops in SplitWithTail (#66242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66242

While working on random test generation, I observed that many simple transformations were upsetting vectorization. Digging deeper, I found that it calls SplitWithTail, which incorrectly splits the loop when the loop start is not zero. This patch normalizes the loop before we start splitting it.
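
A worked example of the normalization in plain Python (intuition only, not the NNC IR): split-with-tail assumes iteration from zero, so a nonzero start must first be folded into the index computation:
```
start, stop, factor = 5, 18, 4
trip = stop - start                              # normalized trip count: 13

visited = []
for outer in range(trip // factor):              # 3 full chunks of 4
    for inner in range(factor):
        visited.append(start + outer * factor + inner)
for tail in range(trip - trip % factor, trip):   # 1 leftover iteration
    visited.append(start + tail)

assert visited == list(range(start, stop))       # same iterations, same order
```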

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31506853

Pulled By: anijain2305

fbshipit-source-id: 5c5f2568ce0a239bfaa515458be52541eafd23b1
2021-10-11 13:44:05 -07:00
49f1605392 [RFC] Reduce logging noise from AdagradOptimizer (#66443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66443

For some reason, this logging is adding noise to a lot of flow jobs. I am not sure if this is actually needed.
This is called from `__init__`, so it's logged all the time and logs all key:value pairs in the current local symbol table.

Test Plan: N/A

Reviewed By: chowarfb

Differential Revision: D31534372

fbshipit-source-id: bed032b66fed548c97a6f66b1b9e905fd2738851
2021-10-11 13:25:41 -07:00
c03f851750 [torchelastic] Fix failing tests (#66440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66440

* Set correct name for test worker executable
* Remove `test_get_override_executable` from oss; there is already a test that tests the functionality

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed/launcher/fb:launch_test

Reviewed By: d4l3k

Differential Revision: D31544853

fbshipit-source-id: e1e009b4b38830d3a78981f8f93c2314ed851695
2021-10-11 13:06:36 -07:00
1d14fbdad7 [TensorExpr] Adding missing python binding for operators (#66336)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66336

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D31544865

Pulled By: anijain2305

fbshipit-source-id: 04be6cf079efc952d0f0b1e68f7f4da4a19c64fa
2021-10-11 12:47:41 -07:00
08fab7ae13 Wextra fix for Integration.cpp (#66321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66321

Fixes
```
stderr: caffe2/aten/src/ATen/native/Integration.cpp:62:27: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long') [-Werror,-Wsign-compare]
    if (curr_shape.size() >= target_n_dim)
        ~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~
stderr: caffe2/aten/src/ATen/native/Integration.cpp:62:27: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long') [-Werror,-Wsign-compare]
    if (curr_shape.size() >= target_n_dim)
        ~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31505347

fbshipit-source-id: 100b76215f78c3ce75bf4a993715a6767189747d
2021-10-11 12:30:46 -07:00
8c468ce00b [PyTorch][JIT] Return a reference from caching specializations of getTypePtr (#66342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66342

`decltype(auto)` in D31486117 (fb5a80ffd8) wasn't the right choice in these specializations, because it will *still* deduce a copy.
See https://godbolt.org/z/GjbcPE1c4 for example.
ghstack-source-id: 140144199

Test Plan: CI, added new static_assert to make sure we got it right for std::tuple in particular

Reviewed By: hlu1, JasonHanwen

Differential Revision: D31514960

fbshipit-source-id: cae722aa34345b590c46eae478229cb5f4b0d7dc
2021-10-11 12:17:50 -07:00
998cb98844 [PyTorch][jit] Cache TupleType objects in getTypePtr (#66340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66340

For functions that take `std::vector`s with `std::tuple`s in them, `getTypePtr` can get hit on every call, in which case creating a new `TupleType` object every time is expensive.
ghstack-source-id: 140143104

Test Plan: CI

Reviewed By: hlu1, JasonHanwen

Differential Revision: D31514792

fbshipit-source-id: 23652ca90ba1259afc05e953b99ce1fe1bebcc2b
2021-10-11 12:16:31 -07:00
acb0157a3d Specialization for c10::util:get_type_index<std::string> (#66290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66290

Add full specialization for std::string type index

It slightly speeds up compilation and also resolves the ambiguity in how template instantiations implemented in inline namespaces are rendered during `__PRETTY_FUNCTION__` computation.

Not sure what `#pragma` controls this behaviour, but when code is compiled by clang-12+ using libstdc++, `__PRETTY_FUNCTION__` sometimes resolves `std::string` to `std::basic_string<char>` and sometimes to `std::__cxx11::basic_string<char>`, even though in the object file the symbol is always inside the `std::__cxx11::` namespace, which might break caffe2 serialization code that depends on dynamic hash generation.

Template name resolution was debugged using https://gist.github.com/malfet/c83b9ebd35730ebf8bac7af42682ea37

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: r-barnes

Differential Revision: D31490050

fbshipit-source-id: 127091574cf6b92c7ec3f972821e4e76f5f626a9
2021-10-11 11:11:59 -07:00
901df0cc22 Skip test_nccl_errors_nonblocking (#66394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66394

Skips this test as it currently does not seem to pass after several
internal local runs.
ghstack-source-id: 140210583

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31534806

fbshipit-source-id: 799849a6a715506a85c9697b46f7098d9b71b32e
2021-10-11 10:08:31 -07:00
221c308389 Wextra fix for LossCTC.cpp (#66381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66381

Fixes
```
stderr: caffe2/aten/src/ATen/native/cudnn/LossCTC.cpp:83:37: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'const long' [-Werror,-Wsign-compare]
  TORCH_CHECK(input_lengths_.size() == batch_size, "input_lengths needs to have size to match batch_size");
              ~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31510217

fbshipit-source-id: e3585e08650950c08d80d347dfae375aedf2ceaf
2021-10-11 10:02:53 -07:00
736fa09a9a [Static Runtime] Manage output tensors (#65515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65515

This change enables `StaticRuntime` to manage output tensors (returned from a graph) as follows:

- At the creation of `StaticModule`, it gathers a set of candidates for output tensors (& their aliases) for managing. This is done by `ValueGroup` introduced by the previous diff.
- At the end of the 1st iteration, `MemoryPlanner` creates a set of output `at::Tensor*` to manage. This set consists of tensor objects from the aforementioned candidates, excluding the direct output value of the graph to simplify ivalue ownership passing (`std::move(ivalue)` to return from SR). Note that this exclusion has no perf implication for inline_cvr & ctr_mobilefeed since they only return a container object (e.g., tuple).
- The 2nd+ iterations preallocate a slab of memory for all output tensors identified during the 1st iteration. Note that these preallocated tensors are *NOT* deallocated when returned from SR. The client receives the output tensors, finishes using them, and is responsible for calling `StaticRuntime::deallocateOutputTensors()` to deallocate them. This mandates that SR cannot be reentered until `deallocateOutputTensors` is called by the client.
- In case of a buggy client missing a call to `StaticRuntime::deallocateOutputTensors()`, SR throws an exception when reentered instead of leaking memory.
- Nit: I plan to use camelCase for function names, so all newly introduced functions use camelCase despite inconsistencies with snake_case. We can gradually fix the inconsistencies.

This change will be followed by another one to enable `manage_output_tensors` from `PyTorchScriptPredictor`, starting with `ptvsc2_prediction_bench` as a testbed.

Test Plan:
- Added `StaticRuntime.ManageOutputTensors*` to cover the newly added code paths.

- Enhanced `testStaticRuntime` to exercise each unittest test case with `manage_output_tensors` on. Confirmed that SR actually managed output tensors successfully for a few existing test cases (e.g., `StaticRuntime.EmbeddingBag`).

Reviewed By: hlu1

Differential Revision: D31049221

fbshipit-source-id: 4ad1599179cc7f00d29e0ce41b33f776226d4383
2021-10-11 09:50:54 -07:00
3b4b1b2d23 .github: Remove confusing ciflow_config.enabled variable (#66260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66260

Every workflow has ciflow enabled so this is not needed anymore

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: dagitses, janeyx99

Differential Revision: D31493340

Pulled By: seemethere

fbshipit-source-id: 8718fe5d22f4be6e0900962576782a9f23162a39
2021-10-11 09:39:31 -07:00
c66847afbe Add workaround for nvcc header dependencies bug (#62550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62550

I noticed that running the build twice in a row resulted in ~80 CUDA files being
rebuilt. Running `ninja -d explain` shows
```
ninja explain: TH/generic/THStorage.h is dirty
ninja explain: TH/generic/THStorageCopy.h is dirty
ninja explain: THC/generic/THCStorage.h is dirty
ninja explain: THC/generic/THCStorageCopy.h is dirty
ninja explain: TH/generic/THTensor.h is dirty
ninja explain: THC/generic/THCTensor.h is dirty
ninja explain: THC/generic/THCTensorCopy.h is dirty
ninja explain: THC/generic/THCTensorMath.h is dirty
ninja explain: THC/generic/THCTensorMathMagma.h is dirty
ninja explain: THC/generic/THCTensorMathPairwise.h is dirty
ninja explain: THC/generic/THCTensorScatterGather.h is dirty
```

considering `ninja` is working relative to the `build` folder, these files don't
actually exist. I traced this back to the output of `nvcc -MD` containing
paths relative to the include directory, instead of being absolute.

This adds a little script to launch the compiler then resolve any relative paths
in the `.d` file before `ninja` looks at it. To use it, I run the build with
```
export CMAKE_CUDA_COMPILER_LAUNCHER="python;`pwd`/tools/nvcc_fix_deps.py;ccache"
```

There are some possible pitfalls here. The same relative path might work for
two include directories, and the compiler could pick a different one. Or,
the compiler might have additional implicit include directories that are needed
to resolve the path. However, this has worked perfectly in my testing and it's
completely opt-in so should be fine.
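
The core of such a wrapper is small. A rough sketch of the idea — not the actual `tools/nvcc_fix_deps.py`, and assuming the include roots are known up front and the depfile is named via `-MF`:

```python
#!/usr/bin/env python3
# Rough sketch only -- not the actual tools/nvcc_fix_deps.py.
# Runs the real compiler command, then rewrites relative paths in the
# emitted .d depfile to absolute ones so ninja sees paths that exist.
import os
import subprocess
import sys

INCLUDE_DIRS = ["/usr/local/cuda/include"]  # assumption: known include roots

def resolve(token: str) -> str:
    if os.path.isabs(token):
        return token
    for root in INCLUDE_DIRS:
        candidate = os.path.join(root, token)
        if os.path.exists(candidate):
            return os.path.abspath(candidate)
    return token  # leave anything we cannot resolve untouched

if __name__ == "__main__":
    cmd = sys.argv[1:]  # the real compiler invocation, e.g. ["nvcc", ...]
    subprocess.check_call(cmd)
    if "-MF" in cmd:
        depfile = cmd[cmd.index("-MF") + 1]
        with open(depfile) as f:
            tokens = [t for t in f.read().split() if t != "\\"]
        with open(depfile, "w") as f:
            f.write(" ".join(resolve(t) for t in tokens) + "\n")
```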

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503351

Pulled By: malfet

fbshipit-source-id: b184c4526679d976b93829b5715cafcb1c7db2ae
2021-10-11 09:07:12 -07:00
c373387709 Update CMake and use native CUDA language support (#62445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445

PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503350

fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
2021-10-11 09:05:48 -07:00
d3b29afbb6 Remove old code that is unused in test/ (#66331)
Summary:
.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66331

Reviewed By: gchanan

Differential Revision: D31533549

Pulled By: albanD

fbshipit-source-id: 5addd11edc4199a88f10f0ff236be59ec2289903
2021-10-11 08:45:24 -07:00
4775419850 [BE] Address feedback from #66296 (#66315)
Summary:
Also use range loop instead of regular one

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66315

Reviewed By: albanD

Differential Revision: D31503730

Pulled By: malfet

fbshipit-source-id: f5568f7f28e15a9becd27986dd061a6fcae34651
2021-10-11 08:39:29 -07:00
822c0850cb fix pybind issue for get_autocast_cpu_dtype and get_autocast_gpu_dtype (#66396)
Summary:
There is an issue when calling **torch.get_autocast_cpu_dtype** and **torch.get_autocast_gpu_dtype**:
```
>>> torch.get_autocast_gpu_dtype()==torch.half
False
>>> torch.get_autocast_cpu_dtype()==torch.bfloat16
False
```
but the expected results should be:
```
>>> torch.get_autocast_gpu_dtype()==torch.half
True
>>> torch.get_autocast_cpu_dtype()==torch.bfloat16
True
```

This PR fixes this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66396

Reviewed By: ejguan

Differential Revision: D31541727

Pulled By: albanD

fbshipit-source-id: 1a0fe070a82590ef2926a517bf48046c2633d168
2021-10-11 08:34:48 -07:00
1b40daac74 pinv: forward/backward AD which is Frechet-defined in a rank-preserving neighborhood. (#66092)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65911. Also enables complex support/tests for `linalg_pinv` in OpInfo.
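
A quick sanity check of the new derivatives on a generically full-rank (hence rank-preserving) complex input might look like this — a sketch, not the actual OpInfo test, and `check_forward_ad` assumes a sufficiently recent gradcheck:

```python
import torch

# Random complex matrices are full rank almost surely, so pinv is
# differentiable in a rank-preserving neighborhood and gradcheck passes.
A = torch.randn(3, 3, dtype=torch.complex128, requires_grad=True)
torch.autograd.gradcheck(torch.linalg.pinv, (A,))
torch.autograd.gradcheck(torch.linalg.pinv, (A,), check_forward_ad=True)
```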

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66092

Reviewed By: ejguan

Differential Revision: D31503072

Pulled By: albanD

fbshipit-source-id: 52018e826826ae62beaad76becb5edf880be253f
2021-10-11 08:33:28 -07:00
7c2f53b363 [BE] set pretrained=False for onnx tests (#66312)
Summary:
Addresses the network risk mitigation mentioned in https://github.com/pytorch/pytorch/issues/65439#issuecomment-924627239.

I didn't include any mobile app/benchmarking changes because I think the pretrained weights matter there.

I ended up removing the changes in test_utils because those were sensitive to the pretrained variable.

I am saving the quantization test changes for another PR because they are currently disabled.
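
Illustratively, the change amounts to constructing test models without downloading weights, e.g.:

```python
import torchvision

# pretrained=False skips the network fetch of checkpoint weights; the
# model is randomly initialized, which is fine for export/shape tests.
model = torchvision.models.resnet18(pretrained=False)
model.eval()
```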

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66312

Reviewed By: ejguan

Differential Revision: D31542992

Pulled By: janeyx99

fbshipit-source-id: 57b4f70247af25cc96c57abd9e689c34641672ff
2021-10-11 08:29:11 -07:00
1d9a6862cd fx quant: add a BC test for loading old torch.package models (#65538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65538

Adds a test which verifies that `prepare_fx` and `convert_fx` work
on models created by `torch.package` in the past.  In detail:

1. (one time) create a model and save it with torch.package. Also save the input,
expected output, and names of quantization-related get_attrs added by
our passes.
2. (every time) load the model from (1), and verify that the expected output
matches the current output, and that the get_attr targets did not change. (A round-trip sketch follows below.)
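
A minimal sketch of that round-trip using the public torch.package API (file and resource names here are made up; the real test pins an archive produced in the past):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
example_input = torch.randn(2, 4)
expected_output = model(example_input)

# Step (1), run once: package the model plus reference data.
with torch.package.PackageExporter("bc_model.pt") as exp:
    exp.extern(["torch", "torch.**"])  # keep torch itself outside the package
    exp.intern("**")
    exp.save_pickle("model", "model.pkl", model)
    exp.save_pickle("data", "io.pkl", (example_input, expected_output))

# Step (2), run every time: load the old archive and compare outputs.
imp = torch.package.PackageImporter("bc_model.pt")
loaded = imp.load_pickle("model", "model.pkl")
inp, expected = imp.load_pickle("data", "io.pkl")
assert torch.allclose(loaded(inp), expected)
```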

Test Plan:
```
python test/test_quantization.py TestSerialization.test_linear_relu_package_quantization_transforms
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31512939

fbshipit-source-id: 718ad5fb66e09b6b31796ebe0dc698186e9a659f
2021-10-11 08:23:38 -07:00
0348148725 Update link to qnnpack in quantization doc. (#66226)
Summary:
The old repo has been archived.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66226

Reviewed By: vkuzo

Differential Revision: D31534712

Pulled By: ezyang

fbshipit-source-id: 4d7f070c8547aeb25464c72b25ed21f209821bc2
2021-10-11 08:19:19 -07:00
58fefa6516 Add pybind trampoline for ProcessGroup and Work (#66338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66338

This commit exposes the c10d extension API to Python land. Users can
now override c10d communication behaviors in pure Python, and no
longer need to go through the cpp extension steps.

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31514351

Pulled By: mrshenli

fbshipit-source-id: a8b94af0af7960c078e1006c29b25f7f3bd86c81
2021-10-11 06:41:06 -07:00
bc06eefebe [reland] Allow external CUDA streams to be set as current (#66324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66324

Fixes https://github.com/pytorch/pytorch/issues/65822.

Reland of https://github.com/pytorch/pytorch/pull/65914.
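
A rough usage sketch (requires a CUDA device; here the raw pointer is borrowed from a PyTorch stream purely to have a valid handle, whereas in practice it would come from a third-party library):

```python
import torch

# Stand-in for a cudaStream_t created elsewhere: borrow the raw pointer
# from a PyTorch stream just to have something valid to wrap.
donor = torch.cuda.Stream()
ext = torch.cuda.ExternalStream(donor.cuda_stream)

with torch.cuda.stream(ext):        # the external stream becomes current
    x = torch.ones(4, device="cuda")
    y = x * 2                       # kernel launches on the external stream
ext.synchronize()
```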
ghstack-source-id: 140105651

Test Plan: Added tests

Reviewed By: ngimel

Differential Revision: D31506134

fbshipit-source-id: ff56203a120befdb282e974309478ac11aa56652
2021-10-11 02:41:43 -07:00
355acfdebc [PyTorch Edge][tracing-based] use operator.yaml to build libtorch library (#66237)
Summary:
https://pxl.cl/1QK3N
Enable using the YAML file from the tracer to build the libtorch library for iOS and Android.

1. Android:
```
SELECTED_OP_LIST=/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_tracing_update.yaml TRACING_BASED=1  ./scripts/build_pytorch_android.sh x86
```
libtorch_lite.so x86: 3 MB (larger than H1, static is ~3.2 MB)

2. iOS
```
SELECTED_OP_LIST=/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_tracing_update.yaml TRACING_BASED=1 BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR  ./scripts/build_ios.sh
```
Binary size: 7.6 MB

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66237

ghstack-source-id: 140197164

Reviewed By: dhruvbird

Differential Revision: D31463119

fbshipit-source-id: c3f4eb71bdef1969eab6cb60999fec8547641cbd
2021-10-10 14:07:01 -07:00
9971113340 Revert D31447612: Create a documentation page for FX graph mode quantization APIs
Test Plan: revert-hammer

Differential Revision:
D31447612 (a89ac3138e)

Original commit changeset: 07d0a6137f15

fbshipit-source-id: f2cba7d835011500580b4ab9cff72171280ee18b
2021-10-10 01:51:13 -07:00
b85fd4c54f Revert D31447613: Create separate documentation pages for quantization observers and fake_quants
Test Plan: revert-hammer

Differential Revision:
D31447613 (f0fa3d1110)

Original commit changeset: 63b4cf518bad

fbshipit-source-id: 67de592d1e12a5149cdb22b0725caad063f94476
2021-10-10 01:51:11 -07:00
10633460ce Revert D31447614: Create a documentation page for torch.ao.quantization.QConfig
Test Plan: revert-hammer

Differential Revision:
D31447614 (7332ed13ed)

Original commit changeset: 5d9dd2a4e864

fbshipit-source-id: 6ac15a956222ca61f7fbb75ed36bcc58b23f0f36
2021-10-10 01:51:09 -07:00
037ac2330e Revert D31447616: Quantization docs: consilidate all API references on a single page
Test Plan: revert-hammer

Differential Revision:
D31447616 (fe86f0e068)

Original commit changeset: 2f9c4dac2b2f

fbshipit-source-id: 673368e87399f0a25441688bb9356de5a2f3e66e
2021-10-10 01:51:07 -07:00
09c3e6002b Revert D31447615: Quantization docs: rewrite API reference to be more automated
Test Plan: revert-hammer

Differential Revision:
D31447615 (7d2526ab20)

Original commit changeset: 09874ad9629f

fbshipit-source-id: 0963c9f5118e243cd299f8cded2bf7b0848a7105
2021-10-10 01:51:05 -07:00
df1858bea5 Revert D31447611: Quantization documentation: move backend section down
Test Plan: revert-hammer

Differential Revision:
D31447611 (309a8cf46c)

Original commit changeset: 537b146559bc

fbshipit-source-id: c400aef9a2ea5d18f8076879fe6354be7a6732f1
2021-10-10 01:51:03 -07:00
ad0accdecd Revert D31447610: Quantization docs: add pages for Numeric Suite (Eager and FX)
Test Plan: revert-hammer

Differential Revision:
D31447610 (9539e6216b)

Original commit changeset: 441170c4a6c3

fbshipit-source-id: b49bff54405cdb8465397077e38506a36b277921
2021-10-10 01:49:19 -07:00
291d463cf9 Revert D31495086: open source engine_layer_visualize.py
Test Plan: revert-hammer

Differential Revision:
D31495086 (150b7c7410)

Original commit changeset: 1f5505d6baac

fbshipit-source-id: 186f3407a6423f0981f0b7a2e7408ce53013fceb
2021-10-10 01:45:21 -07:00
0e0c98077f [quantized] Implement 3d convolution in qnnpack (#66350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66350

Implements conv3d for QNNPACK by writing another kernel for the indirection buffer in 3 dimensions. Modifies all structs to take depth, with depth = 1 indicating 2d operation. gemm and conv (non-transposed) work; next up are depthwise and transpose.
ghstack-source-id: 140152440

Test Plan: test/quantization

Reviewed By: kimishpatel

Differential Revision: D30858693

fbshipit-source-id: 883cca8ec53b9e15ab4b9473c6cc042e3d049d9c
2021-10-09 12:28:24 -07:00
150b7c7410 open source engine_layer_visualize.py (#66301)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66301

Test Plan: testinprod

Reviewed By: 842974287

Differential Revision: D31495086

fbshipit-source-id: 1f5505d6baac66eca11a35ce9532d6c7c7513190
2021-10-09 10:25:03 -07:00
27f193af64 Automated submodule update: kineto (#59674)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto).

New submodule commit: 6f9c0eeff5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59674

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: larryliu0820

Differential Revision: D28977762

fbshipit-source-id: d441d4d46a7044cc05eb8b21e59471deee312e02
2021-10-09 09:34:32 -07:00
84326ef059 Remove native_functions.yaml dependency from binary ops (#64169)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64169

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728586

Pulled By: dagitses

fbshipit-source-id: 17d645b6712815d1967b9ff83eecc4d16833ee6b
2021-10-09 09:25:48 -07:00
9539e6216b Quantization docs: add pages for Numeric Suite (Eager and FX) (#66222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66222

Description:
1. creates doc pages for Eager and FX numeric suites
2. adds a link from main quantization doc to (1)
3. formats docblocks in Eager NS to render well
4. adds example code and docblocks to FX numeric suite

Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```

Reviewed By: jerryzh168

Differential Revision: D31447610

Pulled By: vkuzo

fbshipit-source-id: 441170c4a6c3ddea1e7c7c5cc2f1e1cd5aa65f2f
2021-10-09 06:46:06 -07:00
309a8cf46c Quantization documentation: move backend section down (#66210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66210

Description:

Moves the backend section of the quantization page further down,
to ensure that the API description and reference sections are closer
to the top.

Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```

Reviewed By: jerryzh168

Differential Revision: D31447611

Pulled By: vkuzo

fbshipit-source-id: 537b146559bce484588b3c78e6b0cdb4c274e8dd
2021-10-09 06:46:04 -07:00
7d2526ab20 Quantization docs: rewrite API reference to be more automated (#66201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66201

Description:

This PR switches the quantization API reference to use `autosummary`
for each section. We define the sections and manually write a list
of modules/functions/methods to include, and sphinx does the rest.
The result is a single page where we have every quantization function
and module with a quick autogenerated blurb, and users can click
through to each of them for a full documentation page.

This mimics how the `torch.nn` and `torch.nn.functional` doc
pages are set up.

In detail, for each section, this PR:
* creates a new section using `autosummary`
* adds all modules/functions/methods which were previously in the manual section
* adds any additional modules/functions/methods which are public facing but not previously documented
* deletes the old manual summary and all links to it

Test Plan:
```
cd docs
make html
python -m http.server
// renders well, links work
```

Reviewed By: jerryzh168

Differential Revision: D31447615

Pulled By: vkuzo

fbshipit-source-id: 09874ad9629f9c00eeab79c406579c6abd974901
2021-10-09 06:46:02 -07:00
fe86f0e068 Quantization docs: consolidate all API references on a single page (#66198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66198

Consolidates all API reference material for quantization on a single
page, to reduce duplication of information.

Future PRs will improve the API reference page itself.

Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```

Reviewed By: jerryzh168

Differential Revision: D31447616

Pulled By: vkuzo

fbshipit-source-id: 2f9c4dac2b2fb377568332aef79531d1f784444a
2021-10-09 06:46:00 -07:00
7332ed13ed Create a documentation page for torch.ao.quantization.QConfig (#66129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66129

Adds a documentation page for `torch.ao.quantization.QConfig`. It is useful
for this to have a separate page since it is shared between Eager and FX graph
mode quantization.

Also, ensures that all important functions and module attributes in this
module have docstrings, so users can discover these without reading the
source code.

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, renders correctly
```

Reviewed By: jerryzh168

Differential Revision: D31447614

Pulled By: vkuzo

fbshipit-source-id: 5d9dd2a4e8647fa17b96cefbaae5299adede619c
2021-10-09 06:45:58 -07:00
f0fa3d1110 Create separate documentation pages for quantization observers and fake_quants (#66125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66125

Before this PR, the documentation for observers and fake_quants was inlined in the
Eager mode quantization page.  This was hard to discover, especially
since that page is really long, and we now have FX graph mode quantization reusing
all of this code.

This PR moves observers and fake_quants into their own documentation pages. It also
adds docstrings to all user facing module attributes such as the default observers
and fake_quants, so people can discover them from documentation without having
to inspect the source code.

For now, this enables autoformatting (which means all public classes, functions, and members
with docstrings will get docs). If we need to exclude something in these files from
the docs in the future, we can go back to manual docs.

Test Plan:
```
cd docs
make html
python -m http.server
// inspect docs on localhost, renders correctly
```

Reviewed By: dagitses

Differential Revision: D31447613

Pulled By: vkuzo

fbshipit-source-id: 63b4cf518badfb29ede583a5c2ca823f572c8599
2021-10-09 06:45:56 -07:00
a89ac3138e Create a documentation page for FX graph mode quantization APIs (#66122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66122

Description:

Adds a documentation page for FX graph mode quantization APIs which
reads from the docstrings in `quantize_fx`, and links it from the main
quantization documentation page.

Also, updates the docstrings in `quantize_fx` to render well with reStructuredText.

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```

Reviewed By: dagitses

Differential Revision: D31447612

Pulled By: vkuzo

fbshipit-source-id: 07d0a6137f1537af82dce0a729f9617efaa714a0
2021-10-09 06:44:38 -07:00
b96c7aea73 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31527108

fbshipit-source-id: 40360ebf92e67fd95613cedea9988fbe52519de6
2021-10-09 06:03:49 -07:00
109aa135e6 Remove apparently unnecessary std::remove_cv_t (#66254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66254

`std::decay_t` already implies dropping the const

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31465856

fbshipit-source-id: 851cdb9194354fe9a89b3a37a4463a43dbbcd77a
2021-10-09 00:38:44 -07:00
4cb4d11e0b Disable "-Wignored-qualifiers" for vec256_bfloat16.h (#66279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66279

This error appears when compiling with "-Wextra" and cannot be resolved by fixing the code, since the return type of the intrinsic being passed to `map` is fixed.

Fixes:
```
caffe2/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h:204:28: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
  Vectorized<BFloat16> map(const __m256 (*const vop)(__m256)) const {
                           ^~~~~~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31480888

fbshipit-source-id: 919c0d48c8ce13ce1106a9df124a077945e36707
2021-10-08 21:47:41 -07:00
3fe5895a00 Back out "Revert D30599136: [Pytorch Edge][tracing-based] build tracer in OSS" (#66267)
Summary:
Previously https://github.com/pytorch/pytorch/pull/64087 broke the test `binary_macos_wheel_3_7_cpu_build`, because the wheel build is not happy with `model_tracer`. Considering it's a prototype and there is no need to ship model_tracer via the wheel at the moment, the tracer is built behind the `TRACING_BASED` option. When tracing-based builds are mature enough, we can ship the tracer binary via the wheel eventually.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66267

Original commit changeset: 8ac3d75a52d0
ghstack-source-id: 140122106

Test Plan:
binary_macos_wheel_3_7_cpu_build passes

{F668643831}

Reviewed By: dhruvbird

Differential Revision: D31478593

fbshipit-source-id: 726cab1b31c4596f6268b7824eecb20e2e59d161
2021-10-08 20:12:12 -07:00
1763c25414 [PyTorch][jit] Fix excess refcounting in TupleType::compare (#66286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66286

No need to take refcount bumps on each comparator call.

Test Plan: CI, review

Reviewed By: hlu1, JasonHanwen

Differential Revision: D31487058

fbshipit-source-id: 98d2447ac27a12695cb0ebe1e279a6b50744ff4f
2021-10-08 20:08:07 -07:00
fb5a80ffd8 [jit] Don't force refcount bumps from getTypePtr (#66282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66282

Now that a bunch of the `FooType::get()` functions return a const reference, we can forward that behavior through `getTypePtr()` using return type deduction.

Test Plan: Inspect assembly for List_test.cpp before/after the rest of the change; reference counting is no longer in the happy path.

Reviewed By: hlu1, JasonHanwen

Differential Revision: D31486117

fbshipit-source-id: 863b677bb6685452a5b325d327bdc2a0a09627bf
2021-10-08 20:06:43 -07:00
85b562dd2b Fix type checking errors in fx/utils.py (#66311)
Summary:
- [x] Fix the Pyre type checking errors in `torch/quantization/fx/utils.py`
```
torch/quantization/fx/utils.py:490:4 Incompatible variable type [9]: target_module_type is declared to have type `Type[nn.modules.module.Module]` but is used as type `None`.
```
Fixes the issue: [MLH-Fellowship/pyre-check/issues/75](https://github.com/MLH-Fellowship/pyre-check/issues/75)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66311

Reviewed By: pradeep90

Differential Revision: D31506399

Pulled By: 0xedward

fbshipit-source-id: 3d866fba6005452378d4a2613b8689fa2d7a8b67
2021-10-08 19:14:22 -07:00
e5f6f356da [hpc infer] fix bench perf number
Reviewed By: yinghai, jianyuh

Differential Revision: D31505288

fbshipit-source-id: e4951a7c5813e0ee38903dec4cef61531f1b4059
2021-10-08 19:11:04 -07:00
904fbadaff Fix merge conflict in bc tests (#66356)
Summary:
BC test currently broken on trunk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66356

Reviewed By: malfet

Differential Revision: D31523340

Pulled By: janeyx99

fbshipit-source-id: a8d1ff697f017c710f70a76b5bb6a2f89d7637c7
2021-10-08 18:45:15 -07:00
5a67ffe0ad [PyTorch][Static Runtime] Combine ProcessedNode::{native_,}fn_ (#65414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65414

Saves 24 bytes (`sizeof(std::function) - 8`) per ProcessedNode.
ghstack-source-id: 139999909

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D31085561

fbshipit-source-id: 70734b8319e805736ba41aedaaf7fa3d463400c9
2021-10-08 18:11:59 -07:00
566922bbcd clean up mypy nit in torch/jit/_recursive.py (#66253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66253

This was initially broken in #65829 and unbroken in #66003; this PR cleans
it up by removing the mypy ignore line.

Test Plan:
```
mypy torch/jit/_recursive.py --no-incremental
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31475100

fbshipit-source-id: 46ab2ede72c08b926f4f9a6b03b1a1375b884c8a
2021-10-08 18:07:33 -07:00
4a302a3074 Wextra fix for CUDAApplyUtils.cuh (#66323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66323

Fixes
```
/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/caffe2/aten/src/ATen/cuda/CUDAApplyUtils.cuh:310:48: error: comparison of integers of different signs: 'unsigned long' and 'int' [-Werror,-Wsign-compare]
  const IndexType bOffset = sizeof...(Offsets) < n ?
                            ~~~~~~~~~~~~~~~~~~ ^ ~
/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/caffe2/aten/src/ATen/cuda/CUDAApplyUtils.cuh:306:48: error: comparison of integers of different signs: 'unsigned long' and 'int' [-Werror,-Wsign-compare]
  const IndexType aOffset = sizeof...(Offsets) < n ?
                            ~~~~~~~~~~~~~~~~~~ ^ ~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31505428

fbshipit-source-id: 326fa8f41f2b200981eddc5cab035b18536cd24e
2021-10-08 18:02:09 -07:00
0a48f56318 Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"
Test Plan: revert-hammer

Differential Revision:
D31299350 (f1f3bd8c36)

Original commit changeset: 9ad5c8fa17f7

fbshipit-source-id: d63d889922f507a4a0e2e042e451b95b9591c317
2021-10-08 17:55:28 -07:00
c62ed96496 Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source
Test Plan: revert-hammer

Differential Revision:
D30710710 (c1343ff706)

Original commit changeset: 51399f9b0b64

fbshipit-source-id: ab6bb8fe4e83ed1052e621e427259192a4f0f540
2021-10-08 17:46:18 -07:00
c957d9fdf6 Replace _baddbmm_mkl_ with cpublas::gemm_batched (#66165)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66165

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31493952

Pulled By: ngimel

fbshipit-source-id: 87cf79036c2d0f4955edbeeeb78f578b0fd223ab
2021-10-08 17:12:14 -07:00
51835bec07 Wextra fix 1 for caffe2 (#66272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66272

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31475543

fbshipit-source-id: f6e02d299d0b792ddb37534ad85db82af65bb42a
2021-10-08 16:36:13 -07:00
a28b038af4 [ao_migration] torch/nn/intrinsic: torch.quantization -> torch.ao.quantization (#65903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65903

This changes the imports in `caffe2/torch/nn/intrinsic` to use the new import locations.

```
codemod -d torch/nn/intrinsic --extensions py 'torch.quantization' 'torch.ao.quantization'
```

Test Plan: `python test/run_test.py`

Reviewed By: albanD

Differential Revision: D31301195

fbshipit-source-id: a5a9d84cb1ac33df6c90ee03cda3e2f1c5d5ff51
2021-10-08 16:21:23 -07:00
2daae532bd [ao_migration] torch/nn/qat: torch.quantization -> torch.ao.quantization (#65902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65902

This changes the imports in `caffe2/torch/nn/qat` to use the new import locations.

```
codemod -d torch/nn/qat --extensions py 'torch.quantization' 'torch.ao.quantization'
```

Test Plan: `python test/run_test.py`

Reviewed By: jerryzh168

Differential Revision: D31301196

fbshipit-source-id: ff237790d74cd3b3b5be642a997810f4f439a1d8
2021-10-08 16:21:21 -07:00
1a6482ee2a [ao_migration] torch/nn/quantizable: torch.quantization -> torch.ao.quantization (#65901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65901

This changes the imports in `caffe2/torch/nn/quantizable` to use the new import locations.

```
codemod -d torch/nn/quantizable --extensions py 'torch.quantization' 'torch.ao.quantization'
```

Test Plan: `python test/run_test.py`

Reviewed By: jerryzh168

Differential Revision: D31301194

fbshipit-source-id: 8ce8a3015ea61da62d7658846d1ca64fbdabaf7a
2021-10-08 16:21:19 -07:00
b23709df03 [ao_migration] torch/nn/quantized: torch.quantization -> torch.ao.quantization (#65900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65900

This changes the imports in `caffe2/torch/nn/quantized` to use the new import locations.

```
codemod -d torch/nn/quantized --extensions py 'torch.quantization' 'torch.ao.quantization'
```

Test Plan: `python test/run_test.py`

Reviewed By: jerryzh168

Differential Revision: D31301193

fbshipit-source-id: 58efb1ad51a8b441e2a3bd5b91af11eab6b9331f
2021-10-08 16:19:53 -07:00
f1f3bd8c36 Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor" (#65883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65883

Original commit changeset: d8e962b8aab6
ghstack-source-id: 139836954

Test Plan: ci

Reviewed By: zhaojuanmao

Differential Revision: D31299350

fbshipit-source-id: 9ad5c8fa17f7038ba579cb1eda6d9271ac07a130
2021-10-08 16:04:20 -07:00
c1343ff706 [Pytorch Edge] Support profiling kineto events from external source (#64397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64397

This diff exposes a way to add events to the kineto profiler from an external
source.
This can be a backend that executes a subgraph and wants to record this
execution in the kineto profiler.
This diff also adds "backend" metadata to identify the backend an event
would have executed on.

Test Plan:
test_lite_interpreter

Imported from OSS

Reviewed By: raziel

Differential Revision: D30710710

fbshipit-source-id: 51399f9b0b647bc2d0076074ad4ea9286d0ef3e2
2021-10-08 15:59:42 -07:00
8a02d3e5d0 Wextra fix for Tensorshape.cpp (#66320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66320

Fixes
```
stderr: caffe2/aten/src/ATen/native/TensorShape.cpp:619:36: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'long' [-Werror,-Wsign-compare]
    for (size_t offset = 0; offset < numel; offset++) {
                            ~~~~~~ ^ ~~~~~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31505374

fbshipit-source-id: 0fc393dacd72a8b29c0d82561f730cc047b38f0c
2021-10-08 15:03:47 -07:00
731cf494f2 Remove cuda/Loops.cuh dependency on native_functions.yaml (#64168)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64168

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728582

Pulled By: dagitses

fbshipit-source-id: 99dcbb9bb790dd0440d498593ac43e2c18e54a0c
2021-10-08 12:58:52 -07:00
92ce188510 Revert D31445799: [nnc] Use given kernel function name while emitting code
Test Plan: revert-hammer

Differential Revision:
D31445799 (c30dc52739)

Original commit changeset: 8d1642098313

fbshipit-source-id: 6b9d8c816437e9fcba8eb19cc683bc0a46a04cf5
2021-10-08 12:39:01 -07:00
2e6fa0261f Revert D31445797: [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination
Test Plan: revert-hammer

Differential Revision:
D31445797 (7e5ef5e517)

Original commit changeset: 4e1450100928

fbshipit-source-id: fc13b34dbb66c7a22816eb46cf6d98ae9f332d39
2021-10-08 12:38:59 -07:00
097fdcdf0c Revert D31445798: [Static Runtime] Cleanup LLVMCodeGen memory after code gen completes
Test Plan: revert-hammer

Differential Revision:
D31445798 (40dd2711b6)

Original commit changeset: c860d36456b2

fbshipit-source-id: 64d900cad87113e6b871aedd6669e771a7ede5cc
2021-10-08 12:37:48 -07:00
0be36d798b Remove Tensor.h include from TensorIterator.h (#64167)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64167

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30728579

Pulled By: dagitses

fbshipit-source-id: 3888da00c9c8030013c8f4b39d300fe671defb05
2021-10-08 12:28:37 -07:00
bc1dec9b81 Migrate THCStorage_resizeBytes to ATen (#65944)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65944

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31386276

Pulled By: ngimel

fbshipit-source-id: a2b28bc09d11a856fdd3796d3df6f96613f13437
2021-10-08 11:50:52 -07:00
3bad54069b Concatting multiple linear layers with same input Tensor (different weight/bias) (#63198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63198

Linear layers using the same input tensor can be concatenated together
as long as their weights and biases are compatible.
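
The underlying identity is easy to verify; a sketch of why concatenating along the output dimension is sound:

```python
import torch

x = torch.randn(5, 10)
a = torch.nn.Linear(10, 3)
b = torch.nn.Linear(10, 4)

# Stacking weights/biases along the output dim fuses the two matmuls
# into one; the result equals the concatenation of the separate outputs.
w = torch.cat([a.weight, b.weight], dim=0)   # (7, 10)
bias = torch.cat([a.bias, b.bias], dim=0)    # (7,)
fused = torch.nn.functional.linear(x, w, bias)

assert torch.allclose(fused, torch.cat([a(x), b(x)], dim=1), atol=1e-6)
```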

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31240642

fbshipit-source-id: 1e78daa6b89822412ba2513d326ee0e072ceff1e
2021-10-08 10:55:46 -07:00
94845fc44e [jit] Implement one-argument AliasDb::mayContainAlias more efficiently (#65177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65177

There is no need to heap-allocate any vectors in this case.
ghstack-source-id: 140052520

Test Plan:
CI

Startup for static runtime on ctr_mobile_feed local net decreased from 7.8s to about 7.0s

Reviewed By: malfet

Differential Revision: D30984194

fbshipit-source-id: 85091e55445f653ec728b27da4b459a2f1873013
2021-10-08 10:29:25 -07:00
c80693f7e6 [jit] Add cache for MemoryDAG::collectAllContainedMemoryLocations (#65122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65122

Failure to cache this seems to contribute to quadratic startup time for the static runtime.

Disclaimer: I am entirely un-versed in the performance considerations for the JIT and have no idea what the other impacts of this change may be. Let the reviewer beware.
ghstack-source-id: 140052522

Reviewed By: suo

Differential Revision: D30983268

fbshipit-source-id: 4329aee6b5781f5c2e2d2334c396fab8528d4b7b
2021-10-08 10:29:23 -07:00
3ef69a4598 [static runtime] Pre-allocate hash tables (#65343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65343

No reason not to save a bit on re-hashing.
ghstack-source-id: 140052518

Test Plan:
CI

Static runtime startup seems to go from 5.9-6.0s to 5.8s-6.0s, perf shows less time spent rehashing

Reviewed By: mikeiovine

Differential Revision: D31027362

fbshipit-source-id: 39dd53ecd462693b518535856ddd92df78a4977b
2021-10-08 10:28:13 -07:00
0020a151c6 slow_conv3d grad_weight: call gemm directly (#65759)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65759

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31257873

Pulled By: ngimel

fbshipit-source-id: 1612c0be10b2aa269c807c7b9f5470172ed68dc1
2021-10-08 09:55:08 -07:00
dfb64b3287 log API usage for fsdp API in PyTorch (#64964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64964

log API usage for fsdp API in PyTorch

Test Plan: unit test

Reviewed By: rohan-varma

Differential Revision: D30915734

fbshipit-source-id: 5e3b335327f4a3ff59b025e8e17a0fa0b7f6597d
2021-10-08 09:32:59 -07:00
201174cb91 Revert D31389480: [pytorch][PR] Allow external CUDA streams to be set as current
Test Plan: revert-hammer

Differential Revision:
D31389480 (61f0bb70c1)

Original commit changeset: 2b2f40e5452c

fbshipit-source-id: c6631e51abcf3819732f981f646cb77b91569c7d
2021-10-08 09:20:24 -07:00
b72a1782d8 [PG Wrapper][BE] Add collective information when monitored barrier error is (#66167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66167

Sometimes due to desync we see PG wrapper monitored barrier fail. In
this case it would be useful to print the info about the collective that was
trying to run along with the actual error.
ghstack-source-id: 140037653

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D31353021

fbshipit-source-id: e2a515326c9314c98119978d5566eb5431cca96c
2021-10-08 09:14:24 -07:00
b5b1d49a66 [PG Wrapper][BE] Make some methods private (#66166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66166

These methods should be private.
ghstack-source-id: 139782587

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D31353020

fbshipit-source-id: 583fb315cc2cacc37df3d29cd5793b42558930b3
2021-10-08 09:13:02 -07:00
0cad2c0615 Move intraop_launch_future from Parallel.h (#64166)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64166

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728585

Pulled By: dagitses

fbshipit-source-id: 75a41418ae9218bec9bac27597051295222b6eee
2021-10-08 09:07:35 -07:00
2d885ab73d [jit] Reduce refcounting of Types (#65345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345

FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165

Test Plan:
CI

perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.

Reviewed By: hlu1

Differential Revision: D31027361

fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
2021-10-08 09:03:04 -07:00
1ae468a484 [jit] Refcounting spot fixes (#65346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65346

Tidying up the top sources of reference count decrements seen during static runtime startup.
ghstack-source-id: 140027349

Test Plan:
CI

perf now shows under 2% of time spent in ~__shared_count instead of about 5%.

Reviewed By: suo

Differential Revision: D31057277

fbshipit-source-id: 9a16daf2e655fda80d4ec21290b30f02ba63d8da
2021-10-08 08:39:20 -07:00
8ebe1a924d [DataPipe] moving mux IterDataPipe test to the right location (#66277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66277

Previously, it was grouped together with tests related to `MapDataPipe`, but it should be with `IterDataPipe`.

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31485823

Pulled By: NivekT

fbshipit-source-id: d13d8c28cbfc305da0e3033d4109a0f971281a02
2021-10-08 08:32:29 -07:00
ed17851642 [DataPipe] adding test for IterableWrapperIterDataPipe (#66276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66276

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31485824

Pulled By: NivekT

fbshipit-source-id: c7b21636e4b17e264bfb5dbea69cd3c477472f0b
2021-10-08 08:32:26 -07:00
e808e3d3d6 [DataPipe] adding SequenceWrapperMapDataPipe (#66275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66275

Once this is added to Core, TorchData's PR will not need a custom class and can use this wrapper instead.
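
A sketch of the intended usage (import path as of this era of the codebase; it may differ in later releases):

```python
from torch.utils.data.datapipes.map import SequenceWrapper

# Wraps any sequence into a MapDataPipe: indexable, with a length.
dp = SequenceWrapper(["a", "b", "c"])
assert len(dp) == 3 and dp[1] == "b"
```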

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31485822

Pulled By: NivekT

fbshipit-source-id: 790de27629c89c0ca7163a8ee5a09ee8b8233340
2021-10-08 08:32:24 -07:00
a7cc07f109 quantized embedding: make error message clearer (#66051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66051

Make the error message clearer when quantized embedding is converted
with an unsupported dtype. This is helpful when debugging quantization
errors on new models.

Test Plan:
```
class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(1, 1)

m = M().eval()
m.qconfig = torch.quantization.QConfig(
    activation=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8),
    weight=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8))
m.embedding.qconfig = m.qconfig
mp = torch.quantization.prepare(m)
mq = torch.quantization.convert(m)
// error message now includes the incorrect dtype
```

Imported from OSS

Reviewed By: dagitses

Differential Revision: D31472848

fbshipit-source-id: 86f6d90bc0ad611aa9d1bdae24497bc6f3d2acaa
2021-10-08 08:32:22 -07:00
c9aba3b128 make error message when trying to quantize non floats more specific (#66050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66050

Adds the dtype to an error message when trying to quantize something
other than a float.  This is useful for debugging quantization tools on
new models.

Test Plan:
```
x = torch.randn(1, 1, 1, 1, dtype=torch.double)
xq = torch.quantize_per_tensor(x, 0.01, 0, torch.quint8)
// error message now includes Double
```

Imported from OSS

Reviewed By: dagitses

Differential Revision: D31472849

fbshipit-source-id: 2331ffacefcbc6f8eca79694757d740de74a0f1d
2021-10-08 08:32:19 -07:00
81660c08f0 quantized add: enable broadcasting (#66049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66049

Enables quantized add with broadcasting. As pointed out by jamesr66a,
this was disabled but TensorIterator already supports it. Added a test
case to verify.

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qadd_broadcast
```

Imported from OSS

Reviewed By: dagitses

Differential Revision: D31472850

fbshipit-source-id: a3b16d9000487918db743525d22db6864330762b
2021-10-08 08:31:07 -07:00
ece0221854 Rename int to long, add more C++ types. (#66108)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66108

BC-breaking change: intT is now longT (which aligns it more accurately with how
the types are referred to in C++). The benefit is that we can idiomatically
express all C++ dtypes (with intT now mapping to int32_t). These types are needed
for ufunc codegen in a later patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31385761

Pulled By: ezyang

fbshipit-source-id: ec6f3a0953794313470dbe14911f23ac116be425
2021-10-08 08:25:06 -07:00
11bc435622 Allow registration of custom symbolics for prim namespace (#64460) (#66139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66139

[ONNX] Add prim::PythonOp check back in export.cpp (#64944)

Add prim::PythonOp check back in export.cpp

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31424102

fbshipit-source-id: 6d2eef767fab846ed79ea509e97b714072bac9f4

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-10-08 07:41:06 -07:00
9b09a5f7ba [ONNX] Enable scripting tests (#64780) (#66138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66138

* Scripting tests

* Fixed scripting tests for lower opsets

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31424099

fbshipit-source-id: 67095b7ac67b9da986961788392aa92c95cf11f2
2021-10-08 07:41:03 -07:00
53fefaa916 [ONNX] Fix duplicated output same name case (#64190) (#66137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66137

* Fix the issue where duplicated output nodes shared the same output name.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31424100

fbshipit-source-id: b1b06a92c51744030788b651f3a597d987a8deda

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-10-08 07:41:01 -07:00
4af47eb3a7 [ONNX] Update slice process shape to support rank only inference (#65782) (#66149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66149

The updated logic can infer the rank of the slice output when only the rank of the slice input is known. This enables cases where `ConstantValueMap::HasRank(input)` is `True` while `ConstantValueMap::HasShape(input)` is `False`.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31423232

Pulled By: ezyang

fbshipit-source-id: 516e3916aa71afda2b10e44620636e42ed837236

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-08 07:39:40 -07:00
dc37547c44 Opinfos for avg_pooling (#64214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64214

Added OpInfos for:
- F.adaptive_avg_pool{1, 3}d
- F.avg_pool{1, 3}d

The 2d variants already had OpInfos.
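
For reference, the newly covered ops behave like this (illustrative shapes only):

```python
import torch
import torch.nn.functional as F

x1 = torch.randn(2, 3, 10)        # (N, C, L)
x3 = torch.randn(2, 3, 4, 8, 8)   # (N, C, D, H, W)

assert F.avg_pool1d(x1, kernel_size=2).shape == (2, 3, 5)
assert F.adaptive_avg_pool3d(x3, output_size=(2, 4, 4)).shape == (2, 3, 2, 4, 4)
```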

Test Plan: - run tests

Reviewed By: albanD, mruberry

Differential Revision: D30667797

Pulled By: zou3519

fbshipit-source-id: 53f5cd02070de5b7db4abb017d727376b59288df
2021-10-08 07:26:08 -07:00
8d6d448238 Add HPU for Autograd Fallback (#65605)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65605

Reviewed By: albanD

Differential Revision: D31373899

Pulled By: ezyang

fbshipit-source-id: 894f62dc44b0532f152dc97b839eecfbaed25e8c
2021-10-08 07:21:44 -07:00
4af913a7cf fixed minor issues for index_add in docs (#65806)
Summary:
Hi, I'm looking forward to contributing to PyTorch, so starting with a minor fix in the documentation for `index_add`.

Currently, in the documentation for `index_add_` (please see https://pytorch.org/docs/master/generated/torch.Tensor.index_add_.html#torch.Tensor.index_add_):

1. `tensor` attribute was pointing to the `torch.tensor` class, which IMO (though it may not be a big deal) is unintentional.
2. `dim` attribute is pointing to `torch.Tensor.dim`, which again IMO is unintentional.

This PR suggests a correction for the first point above: rename the `tensor` attribute to `input` so that it doesn't point to the `torch.tensor` class. (I've verified that other ops like `scatter` use `input`, so this should not break consistency in the documentation.) I couldn't find an appropriate fix for the second point above, since renaming `dim` to something else will break consistency (as almost all other ops in PyTorch use `dim` as the attribute name).
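
For context, the call being documented is `Tensor.index_add_(dim, index, source)`; a quick illustration:

```python
import torch

t = torch.zeros(3, 4)
index = torch.tensor([0, 2])
src = torch.ones(2, 4)

# Rows of `src` are accumulated into rows 0 and 2 of `t` along dim 0.
t.index_add_(0, index, src)
assert t[0].sum() == 4 and t[1].sum() == 0 and t[2].sum() == 4
```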

I may be wrong here, so please let me know if there is any feedback or an alternate fix for this.

_Note:_ I plan to fix this behavior for `index_copy_` (https://pytorch.org/docs/master/generated/torch.Tensor.index_copy_.html#torch.Tensor.index_copy_) once and if this PR is approved.

To the reviewers, please help me tag the correct person who could help review this PR.

cc: krshrimali mruberry zou3519

cc brianjo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65806

Reviewed By: dagitses, mruberry

Differential Revision: D31431182

Pulled By: zou3519

fbshipit-source-id: 66ced9677ac3bc71d672d13366f9f567ecea0a2d
2021-10-08 07:17:15 -07:00
61f0bb70c1 Allow external CUDA streams to be set as current (#65914)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65822.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65914

Reviewed By: dagitses

Differential Revision: D31389480

Pulled By: lw

fbshipit-source-id: 2b2f40e5452c5b2a0b9f0f705750d2aa9deb2ead
2021-10-08 06:09:32 -07:00
60fe854f9f [fx2trt] save and load TRTModule for OSS (#65958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65958

zhxchen17 added a `pickle` pybind for the TRT engine, which allows us to save and load an nn.Module with a TRT engine in fbcode. This diff, though, explicitly serializes/deserializes the engine in `__setstate__` and `__getstate__` so that in OSS people can also save and load TRTModule directly.
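
The pattern, sketched generically below (this is not the actual TRTModule code; the dict "engine" stands in for an unpicklable TensorRT engine handle):

```python
import io
import torch

# Generic sketch of the pattern: drop the opaque engine handle in
# __getstate__ and rebuild it from serialized bytes in __setstate__, so
# torch.save/torch.load work on the wrapper module.
class EngineModule(torch.nn.Module):
    def __init__(self, engine_bytes: bytes):
        super().__init__()
        self._blob = engine_bytes
        self.engine = self._build(engine_bytes)  # stand-in for TRT deserialization

    @staticmethod
    def _build(blob: bytes):
        return {"engine": blob}  # placeholder for a real engine object

    def __getstate__(self):
        state = self.__dict__.copy()
        state["engine"] = None  # drop the unpicklable handle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.engine = self._build(self._blob)  # rebuild from the blob

m = EngineModule(b"engine-bytes")
buf = io.BytesIO()
torch.save(m, buf)
buf.seek(0)
m2 = torch.load(buf)
assert m2.engine is not None
```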

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fx2trt

Reviewed By: wushirong

Differential Revision: D31309429

fbshipit-source-id: 9068e2ae6375ed0e1bb55b0e9d582b8d9c049dbf
2021-10-07 22:27:40 -07:00
321345d7c9 Revert "Revert D31227448: [pytorch][PR] fixing sorting in stride indices" (#66176)
Summary:
enabling https://github.com/pytorch/pytorch/issues/63940

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66176

Reviewed By: ngimel

Differential Revision: D31423920

Pulled By: dzhulgakov

fbshipit-source-id: 06b1e0f757f4fb5b31ee1fa464bcd689df919b9c
2021-10-07 22:09:07 -07:00
74477ba243 [fx2trt] More controls over output dtypes (#65959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65959

Gives some more control over the output dtype of a TRT engine. Previously it would be fp16 if we turned on fp16_mode. This diff allows the engine to generate fp32 output with fp16_mode=True.

Test Plan: CI

Reviewed By: kflu, wushirong

Differential Revision: D31243929

fbshipit-source-id: 09c752e6f382d6ad169da66878d9a9277c134869
2021-10-07 22:03:51 -07:00
227f91e72d [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31495160

fbshipit-source-id: b0a56003a6695989dff0d325cdc118182662ec61
2021-10-07 21:09:22 -07:00
a58ff186e8 [quant][embedding qat] Add basic EmbeddingBag QAT fakeQuant workflow (#65443)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65443

Test Plan: Imported from OSS

Reviewed By: dagitses, supriyar

Differential Revision: D31456445

Pulled By: b-koopman

fbshipit-source-id: 0edda6e272d9005fce65f2ba6a5e6abc831836de
2021-10-07 20:19:29 -07:00
64caee1356 [PyTorch Edge] Leave out field for debug_handle if not being built with eager symbolication support (#66131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66131

Turns out that a model with 72k instructions causes about 0.5MiB of additional memory overhead (if there's an 8-byte memory overhead per instruction). This is not necessary if we're building w/o eager symbolication support. This change eliminates the 8-byte `debug_handle` if the build is w/o eager symbolication support.
ghstack-source-id: 140045478

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck build -c "pt.enable_eager_symbolication"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: kimishpatel

Differential Revision: D31387784

fbshipit-source-id: af56787ad833b990a46b79ab021e512edaa22143
2021-10-07 20:01:18 -07:00
ebe530a9cd Periodic jobs should not have CIFLOW_DEFAULT label (#66300)
Summary:
Noticed that the `periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck` job has a `ciflow/default` label, but does not have a `ciflow/scheduled` label.
Added asserts to enforce that jobs with a non-trivial is_scheduled property do not have the default label and do have the scheduled label.

Rename `periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck` to `periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck`

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66300

Reviewed By: seemethere

Differential Revision: D31493323

Pulled By: malfet

fbshipit-source-id: 194c1d7a4e659847d94a547b87a0d7d08e66406d
2021-10-07 19:57:32 -07:00
bd9eee4e65 TBB: Use static partitioner to match OpenMP scheduling (#65327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65327

Should fix https://github.com/pytorch/pytorch/issues/64571

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31474116

Pulled By: malfet

fbshipit-source-id: 8c4264d4778c6caf58261e3f70d72decd134128d
2021-10-07 19:12:36 -07:00
d5033410b1 Parallel: Deduplicate parallel functions in different backends (#65326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65326

parallel_for and parallel_reduce currently share some common code in
all backends, specifically for detecting if it should run in parallel
or not. This moves all the backend-specific code into a single
`internal::invoke_parallel` function and makes the `parallel_`
functions common to all backends.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31124495

fbshipit-source-id: 65c3d2af42a8860cc4d6349566085c9fa8d8c6f0
2021-10-07 19:11:19 -07:00
e1817d895f [BE] Cleanup python_function.cpp (#66296)
Summary:
- Delete unused `var_input_idx`
- Fix `uninitialized variable` clang-tidy warning by setting `PyObject* input` to PyNone

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66296

Reviewed By: janeyx99

Differential Revision: D31491016

Pulled By: malfet

fbshipit-source-id: 08267144be0cd049d122580cdf81cf586c3e30a6
2021-10-07 18:41:17 -07:00
ca363d1e22 docker: Ensure libgnutls30 for all docker builds (#66258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66258

Installing libgnutls30 has been shown to help when confronted with the
cert issue related to deb.nodesource.com

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31477789

Pulled By: seemethere

fbshipit-source-id: f87ae4c098771acc505db14e3982d8858cf7326f
2021-10-07 18:36:40 -07:00
38f5144eae Fix https://github.com/pytorch/pytorch/issues/61982 (#66015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66015

Fixes https://github.com/pytorch/pytorch/issues/61982 by cloning
tensors in DDPSink. This only applies once for static_graph and generally for unused
params, which already have overhead, so the perf hit should not be an issue. Will
verify with a benchmark.

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D31346633

fbshipit-source-id: 5b9245ade628565cffe01731f6a0dcbb6126029b
2021-10-07 18:11:18 -07:00
20f2e55d4f Rename cuda/Resize.cu to cuda/Resize.cpp (#65943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65943

These files don't require nvcc to compile.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31386277

Pulled By: ngimel

fbshipit-source-id: 1066ee87fa795e2c7969447fbce1fe2633fb9680
2021-10-07 16:37:51 -07:00
86de09e49a Upgrade to ubuntu:trusty-20190515 (#63468)
Summary:
Security Upgrade to ubuntu:trusty-20190515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63468

Reviewed By: ngimel

Differential Revision: D31393552

Pulled By: malfet

fbshipit-source-id: 4e2399e3cddc1d549c08c82c08015e00569c19bc
2021-10-07 16:28:08 -07:00
416f593080 [Static Runtime] Group graph nodes into input aliases & output aliases (#65517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65517

This change retrofits `GetAlwaysAliveValues` into `ValueGroup` to group the values used by a graph into three groups as follows:

- input_aliases:  values that are either inputs or contain aliases of inputs or constants.
- output_aliases: values that are either outputs or contain aliases of outputs and are not in input_aliases.
- Values that don't show up in input_aliases or output_aliases are created and consumed internally within the graph.

`output_aliases` is the only new group introduced by this change, and a following diff will use this to preallocate output Tensors to accelerate Static Runtime's performance.

Test Plan: Added `ValueGroup.Init` to cover the updated code path. Note that there was no test for `GetAlwaysAliveValues` before.

Reviewed By: hlu1

Differential Revision: D30940955

fbshipit-source-id: 2cb065ecda0f447a61e64a7cf70cc7c6947f7dfc
2021-10-07 14:35:12 -07:00
0e2d1b221a [Bootcamp][Pytorch Core] Add testing for complex non-vanilla SGD
Summary: Adding a test to ensure non-vanilla SGD behaves as if complex numbers are two real numbers in R^2, as per issue 65711 on GitHub

Test Plan:
```buck test mode/dev caffe2/test:optim -- 'test_sgd_complex'```

https://pxl.cl/1QLxw

Reviewed By: albanD

Differential Revision: D31477212

fbshipit-source-id: 500678e561a05ac96759223b4c87a37cab26c6a6
2021-10-07 14:07:39 -07:00
5e7d8ec846 Support Registering a Variable Length List of Builtin Modules for torch::deploy Builtin Libraries (#66021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66021

A builtin library consists of a list of frozen modules and a list of builtin modules. For tensorrt, it's quite simple since we only have a single builtin module, tensorrt.tensorrt. But it can be complex for libraries like numpy, which contain multiple builtin modules (np.core._multiarray_umath, np.random.mtrand, etc.), if we want to add them as torch::deploy builtins. We enhance the macro that registers builtin libraries to accept a variable-length list of builtin modules. We can use this macro to register frozentorch, frozenpython, and tensorrt for now, and can also use it to register libraries like numpy later on.

The enhanced macro now looks as follows. Although we don't need to worry about backward compatibility for now, this enhanced version is fully compatible with the previous version. The previous version is just a special case where the library contains no builtin modules.

 ```
REGISTER_TORCH_DEPLOY_BUILTIN(library_name_without_quote, frozen_modules_list,
    builtin_module_name_1, builtin_module_init_function_1, ...,
    builtin_module_name_N, builtin_module_init_function_N)
```
ghstack-source-id: 140007970

Test Plan:
1. Play around with interactive_embedded_interpreter.cpp to import torch._C, tensorrt.tensorrt etc inside the embedded interpreter.
2. Enhance test_builtin_registry.cpp
3. Run test_deploy.cpp and test_deploy_gpu.cpp

Reviewed By: suo

Differential Revision: D31349390

fbshipit-source-id: 70a1fcf660341180fc4d5195aed15ceb07c2bef7
2021-10-07 13:23:46 -07:00
40dd2711b6 [Static Runtime] Cleanup LLVMCodeGen memory after code gen completes (#66218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66218

This stack of diffs reduces the memory used by LLVMCodeGen object.

Here are the numbers on model `294738512`: (this is the number reported as `Memory turnover after freeze_module:` in the output)

```
Before: 123343496
After : 121566008
```

So, there is a reduction of about `1.77MB` with this change of making `PytorchLLVMJIT` a singleton.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM, hlu1

Differential Revision: D31445798

Pulled By: navahgar

fbshipit-source-id: c860d36456b2c5d3e21010c1217e2948326f666d
2021-10-07 13:17:13 -07:00
7e5ef5e517 [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination (#66217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66217

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31445797

Pulled By: navahgar

fbshipit-source-id: 4e1450100928132ccce4ef3c6c20ad6661cfabed
2021-10-07 13:17:11 -07:00
c30dc52739 [nnc] Use given kernel function name while emitting code (#66216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66216

Test Plan: Imported from OSS

Reviewed By: dagitses, priyaramani

Differential Revision: D31445799

Pulled By: navahgar

fbshipit-source-id: 8d164209831339d364710b14f6a263a16e108281
2021-10-07 13:15:46 -07:00
3cc40253d9 add gather to ShardedTensor (#65671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65671

Tentative implementation that uses dist.gather_object to collect shards from all ranks and then "merge" them. The merge is done on dst_rank through padding the sharded tensors to the size of the full tensor based on their metadata (offsets, lengths) first, and then summing these padded tensors together.
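
A toy, single-process illustration of the padding-based merge just described (not the actual ShardedTensor API; in the real flow the shards and their offsets arrive via dist.gather_object first):

```python
import torch

full_size = (4, 4)
shards = {  # offset -> local shard, as if gathered from two ranks
    (0, 0): torch.ones(2, 4),
    (2, 0): torch.full((2, 4), 2.0),
}

full = torch.zeros(full_size)
for (row, col), shard in shards.items():
    padded = torch.zeros(full_size)  # pad each shard to the full size
    padded[row:row + shard.size(0), col:col + shard.size(1)] = shard
    full += padded                   # dst_rank sums the padded tensors
print(full)
```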

Also considered concatenating sharded tensors without padding to minimize the memory footprint (assuming padding will increase memory). But it may not be flexible enough for arbitrary sharding (e.g. sharding along multiple dimensions).

Another way could be constructing the padded tensor on each rank and reducing to rank0. I feel this is the easiest implementation, but it will incur higher memory usage and comm payload. Please let me know if this alternative is preferred.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan:
Imported from OSS

  python test/distributed/_sharded_tensor/test_sharded_tensor.py -v -k test_gather

did not manage to test on OSS, but tested in fbcode by reserving an on-demand GPU

  arc patch D31197611

modify the test with 2 gpus as on-demand gpu only has 2 cores (D31227986)

   buck test -c fbcode.enable_gpu_sections=true mode/dev-nosan caffe2/test/distributed/_sharded_tensor:sharded_tensor -- test_gather

   buck-out/gen/caffe2/test/distributed/_sharded_tensor/sharded_tensor#binary.par  test_sharded_tensor.TestShardedTensorChunked.test_gather

{F667213605}

Reviewed By: dagitses, pritamdamania87

Differential Revision: D31197611

Pulled By: dracifer

fbshipit-source-id: cf98b4a2d7838b11b9582eb23f826bb0fa38a7f4
2021-10-07 13:01:12 -07:00
f445ed19b2 OpInfo for 2d fft functions (#66128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66128

cc mruberry peterbell10

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31450217

Pulled By: mruberry

fbshipit-source-id: 1952fc60c5d5f454966c43f5710b8b97a9794d0e
2021-10-07 12:50:06 -07:00
2213c463ba C++ API and docs for hfftn (#66127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66127

cc mruberry peterbell10

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31450216

Pulled By: mruberry

fbshipit-source-id: 2878aee294aa7d74482b66d536258bac0541408d
2021-10-07 12:48:36 -07:00
e6a4f746c2 slow_conv3d: Use at::sum for grad_bias accumulation (#65758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65758

The same change has been made in conv2d; the proper algorithm is both
faster and more precise.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31257872

Pulled By: ngimel

fbshipit-source-id: 6ff3a7a00a05b66f83d45cc820bd0c230cb8de6d
2021-10-07 12:20:49 -07:00
2e4e5b0264 Add inplace_variant for resize_ OpInfo (#66135)
Summary:
Enable testing of `torch.Tensor.resize_`.
The negative view test is skipped as the test doesn't work with resize_; see
https://github.com/pytorch/pytorch/issues/65945.

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66135

Reviewed By: dagitses

Differential Revision: D31444263

Pulled By: mruberry

fbshipit-source-id: 00c7fe05df28fba01508b31adb3ed4fdcf4d0326
2021-10-07 12:00:30 -07:00
361b34eb81 Chunk: acc_ops (#66010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66010

Added chunk acc op and unit test.

Removed misleading return statements.

Test Plan: buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer

Reviewed By: 842974287

Differential Revision: D31326490

fbshipit-source-id: 81183ad8773eb7471566bec07cdd3dd6c4cee217
2021-10-07 11:41:00 -07:00
9fb6ba24e7 Update torch.fx.passes.split_module docstring (#65542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65542

Add docstring for torch.fx.passes.split_module that conforms to Google Python Style conventions.

Changed original example to the example from this diff:
https://www.internalfb.com/diff/D24925283 (9734c042b8)

Test Plan:
Ran buck test //caffe2/test:fx. No errors detected
https://pxl.cl/1QCch

Reviewed By: jamesr66a

Differential Revision: D31145694

fbshipit-source-id: 8e54f3b1be3dca1c4d414fdeeab71b9f2b5d9f3e
2021-10-07 10:37:10 -07:00
d5f64afc38 [Static Runtime] Support aten::to.prim_dtype overload (#64928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64928

Added support for this overload of `aten::to`:
```
aten::to.prim_dtype(Tensor(a) self, int? dtype, bool non_blocking=False, bool copy=False) -> Tensor(a|b)
```

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_to`

Reviewed By: hlu1

Differential Revision: D30901398

fbshipit-source-id: 38ce807c30185e92dd472b404b362f22ac7e4efb
2021-10-07 10:22:44 -07:00
a8c0b362ce [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core" (#66181)
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary

Fixes https://github.com/pytorch/pytorch/issues/65636

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181

Original commit changeset: 3d0d5377d71e

Test Plan:
Run PyTorch XLA corresponding PR in XLA CI:
https://github.com/pytorch/xla/pull/3148/files

Reviewed By: suo

Differential Revision: D31416438

fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40
2021-10-07 10:05:26 -07:00
61fca037d6 [Part 1] upstreaming fairscale fsdp to PyTorch -- sharding, core data flow and hooks (#63881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63881
This PR includes the minimal set of features to make FSDP work, like sharding, core data flow and hooks. More tests will be added in follow-up PRs. Tests are refactored to utilize common PyTorch utils. The code is also refactored a little bit. Alternative ways to replace ".data" usage in this PR are still being discussed offline.

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D30521673

fbshipit-source-id: 9a23390dd7c925749604c6860e08fbe39ddc5500
2021-10-07 09:06:44 -07:00
88f8944ef1 Revert D30599136: [Pytorch Edge][tracing-based] build tracer in OSS
Test Plan: revert-hammer

Differential Revision:
D30599136 (eeaf527feb)

Original commit changeset: 102f23fb652c

fbshipit-source-id: 8ac3d75a52d06a5c4196bae2db1c4df2d5c5c666
2021-10-07 08:34:23 -07:00
2f1ab477f1 Speed up DataTypeToTypeMeta (#66113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66113

For a benchmark compiled in opt mode, in which the lookup items were shuffled and then looked up in round-robin fashion 10M times (for a total of 140M lookups), we see:
```
Function           Container            Time (ms) Multiplier
TypeMetaToDataType if-chain                   233         1x
TypeMetaToDataType std::vector                795      3.41x
TypeMetaToDataType std::map                  1566      6.72x
TypeMetaToDataType std::unordered_map        2136      9.17x

DataTypeToTypeMeta switch                     102         1x
DataTypeToTypeMeta std::vector                666      6.53x
DataTypeToTypeMeta std::map                  1212      11.9x
DataTypeToTypeMeta std::unordered_map        1539      15.1x
DataTypeToTypeMeta folly::F14FastMap         1789      17.5x
```
From this, we draw two conclusions:
1. Using a complex container like `std::map` is worse than using a simple vector lookup here (there aren't enough items for the Big-O to assert itself).
2. Using any container at all is a mistake. (Unless we pull in more exotic reasoning like invalidating the code cache or preventing inlining.)

Test Plan: Sandcastle

Reviewed By: dzhulgakov

Differential Revision: D31375117

fbshipit-source-id: 0b310c6c2e94080d125c82fb7c2b43ab869adbcb
2021-10-07 08:06:09 -07:00
1e4bcbdddb [Bootcamp][Pytorch Core] Add test for complex numbers for vanilla SGD (#66230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66230

Adding a test to ensure vanilla SGD behaves as if complex numbers are two real numbers in R^2, as per issue 65711 on GitHub:
https://github.com/pytorch/pytorch/issues/65711
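
A minimal sketch of the property being tested, assuming the intent stated in the issue (this is not the actual test code): plain SGD on a complex parameter should match SGD on the same values stored as (real, imag) pairs.

```python
import torch

z = torch.randn(4, dtype=torch.cfloat)
p_real = torch.view_as_real(z).clone().requires_grad_(True)  # shape (4, 2)
p_cplx = z.clone().requires_grad_(True)

opt_c = torch.optim.SGD([p_cplx], lr=0.1)
opt_r = torch.optim.SGD([p_real], lr=0.1)

for _ in range(3):
    opt_c.zero_grad()
    opt_r.zero_grad()
    (p_cplx.abs() ** 2).sum().backward()                       # real-valued loss
    (torch.view_as_complex(p_real).abs() ** 2).sum().backward()
    opt_c.step()
    opt_r.step()

# The complex parameter evolved exactly like its R^2 counterpart.
assert torch.allclose(torch.view_as_real(p_cplx), p_real)
```
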
ghstack-source-id: 139918862

Test Plan:
```buck test mode/dev caffe2/test:optim -- 'test_sgd_complex'```

https://pxl.cl/1QHvX

Reviewed By: albanD

Differential Revision: D31449289

fbshipit-source-id: da8b00421085796a23b643e73f96b19b5b560a32
2021-10-07 07:14:05 -07:00
057a01556c [Static Runtime] Do not use variadic_sigrid_transforms_torch_bind if out variant is disabled (#66221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66221

JIT doesn't have an implementation for this op, so we can only use it when out variants are enabled.

Reviewed By: hlu1

Differential Revision: D31445887

fbshipit-source-id: 4565ac4df751d8ee4052647574c43efa05ea1452
2021-10-07 06:57:17 -07:00
dcf39f9bb9 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31464823

fbshipit-source-id: 37bd72c8f1c8240d2ae72385a0707003ddb24ce8
2021-10-07 04:17:48 -07:00
df11e2d6f9 (torch/elastic) add fqdn hostname to error printout (#66182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66182

closes https://github.com/pytorch/pytorch/issues/63174

Does a few things:

1. adds hostname to the error report
2. moves the "root cause" section to the end (presumably since the logs are being "tailed" we want the root cause to appear at the end)
3. moves redundant error info logging to debug
4. makes the border max 60 char in length and justifies left for the header

NOTE: YOU HAVE TO annotate your main function with torch.distributed.elastic.multiprocessing.errors.record, otherwise no traceback is printed (this is because Python exception propagation does NOT work out of the box for IPC - hence the extra record annotation).
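
A minimal usage sketch of that annotation (the decorator is the one named above; the failing body is made up for illustration):

```python
from torch.distributed.elastic.multiprocessing.errors import record

@record  # captures the traceback into the error file on failure
def main():
    raise RuntimeError("foobar")

if __name__ == "__main__":
    main()
```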

Test Plan:
Sample

```
============================================================
run_script_path FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2021-10-05_17:37:22
  host      : devvm4955.prn0.facebook.com
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3296201)
  error_file: /home/kiuk/tmp/elastic/none_3_lsytqe/attempt_0/0/error.json
  traceback :
  Traceback (most recent call last):
    File "/tmp/jetter.xr3_x6qq/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 372, in wrapper
      return f(*args, **kwargs)
    File "main.py", line 28, in main
      raise RuntimeError(args.throws)
  RuntimeError: foobar

============================================================
```

Reviewed By: cbalioglu, aivanou

Differential Revision: D31416492

fbshipit-source-id: 0aeaf6e634e23ce0ea7f6a03b12c8a9ac57246e9
2021-10-07 01:40:02 -07:00
8a974a482c [quant] Add support for quantization of Embedding{Bag} in dynamic quant APIs (#65674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65674

Before this PR user had to use the eager mode static quantization APIs to quantize Embedding/EmbeddingBag modules.
With this PR they can use either the static or dynamic quantization APIs for Embedding quantization

The only qconfig supported for embedding quantization is float_qparams_weight_only_qconfig, which is currently enforced in the from_float
method of the quantized Embedding/EmbeddingBag modules.

To combine embedding quantization with Linear dynamic quantization, users can use the qconfig_dict to specify a different qconfig for each module type.
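
A rough sketch of what that could look like with the eager-mode dynamic API (treat the exact spelling as an assumption; the commit only states that a qconfig dict can map module types to qconfigs):

```python
import torch
import torch.nn as nn
from torch.quantization import (
    default_dynamic_qconfig,
    float_qparams_weight_only_qconfig,
    quantize_dynamic,
)

model = nn.Sequential(nn.EmbeddingBag(1000, 16), nn.Linear(16, 4))
qmodel = quantize_dynamic(
    model,
    qconfig_spec={
        nn.EmbeddingBag: float_qparams_weight_only_qconfig,  # weight-only
        nn.Linear: default_dynamic_qconfig,                  # dynamic int8
    },
)
```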

The prepare/convert APIs can still be used to quantize Embeddings, with the caveat that users need to ensure inputs to Embedding ops are FP32.

Addresses Issue #65185
ghstack-source-id: 139935419

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: gchanan

Differential Revision: D31211199

fbshipit-source-id: 8c747881caee5ccbf8b93c6704b08d132049dea4
2021-10-06 23:19:38 -07:00
115526cc88 GELU Converter (#66008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66008

Added GELU converter and updated TARGET file of deeplearning/trt/fx2trt to load the plugins onto the converters

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_gelu

Reviewed By: 842974287

Differential Revision: D31284144

fbshipit-source-id: 0e938a47a99d289aefc3308aec3937c7334e9b8a
2021-10-06 22:25:43 -07:00
ac0dbd6eec Promote missing ops for delegated models (#66052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66052

`aten::__getitem__.Dict_str` and `prim::unchecked_cast` are used in the delegate API.

ghstack-source-id: 139860350

Test Plan: CI

Reviewed By: pavithranrao

Differential Revision: D31364720

fbshipit-source-id: dfca5e3ded4cdd3329c9b9d80a13f0fb1f5f2a51
2021-10-06 21:48:42 -07:00
3f30526ff2 Remove THCAllocator (#65942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65942

This one is a bit weird. The class is called `THCIpcDeleter` but it
actually has nothing IPC-specific. It just converts
`std::shared_ptr` + `void*` into a `c10::DataPtr`. Instead, moving
the `DataPtr` conversion into the actual IPC code allows 2 memory
allocations to be elided by merging 3 separate deletion contexts
into one.

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31386278

Pulled By: ngimel

fbshipit-source-id: 5722beed9dcf680f0eb6bbff30405cff47b21962
2021-10-06 19:04:43 -07:00
eeaf527feb [Pytorch Edge][tracing-based] build tracer in OSS (#64087)
Summary:
1. Introduce
```
MobileModelRunner.h
MobileModelRunner.cpp
TensorUtils.h
TensorUtils.cpp
```
in external. They are pretty much the same as internal, except for the namespace and the dependency on folly. In the next PRs, TensorUtils and MobileModelRunner are unified between external and internal.
2. Introduce
```
tracer.cpp
```
for external. The majority is the same as the internal one, with some cleanup of unnecessary dependencies. It's unified between internal and external in the next change.
3. Add an executable to build the tracer. It will be built for desktop only.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64087

ghstack-source-id: 139900300

Test Plan:
Given the model
```
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lin = nn.Linear(10, 1)
    def forward(self, x):
        return self.lin(x)

model = Net()
scripted_module = torch.jit.script(model)
example_dict = {'a' : 1, 'b' : 2}
sample_input = {
    scripted_module.forward : [(torch.zeros(1,10),)],
}

bundled_model = torch.utils.bundled_inputs.bundle_inputs(scripted_module, sample_input)
bundled_model._save_for_lite_interpreter("dummy_model_with_bundled_input.ptl")
```
External tracer
```
./build/bin/model_tracer --model_input_path "/Users/chenlai/Documents/pytorch/tracing/dummy_model_with_bundled_input.ptl" --build_yaml_path  "/Users/chenlai/Documents/pytorch/tracing/tmp.yaml"
```
and compare `tmp.yaml` with the operator list generated from
Internal tracer
```
./fbcode/caffe2/fb/model_tracer/run_model_with_bundled_inputs.sh ~/local/notebooks/prod_models/dummy_model_with_bundled_input.ptl
```
QNNPACK only:
Example yaml from internal tracer:  P460742166 [devserver]
Example yaml from external tracer: P460759099 [mac], P460742166 [devserver]

Comparison ops between internal and external on devserver:

{F666923807}

{F666924048}

Note: The operators generated on Mac and devservers are different; the one on the devserver includes two extra ops: `aten::addmm_` and `aten::slow_conv_dilated2d`. Based on the traced list, when calling `aten::_convolution`, one calls `aten::mkldnn_convolution`, and the other calls `aten::_convolution_nogroup`, causing the divergence.

Thanks to Martin for pointing out:
> mkldnn is another backend from Intel

Reviewed By: dhruvbird

Differential Revision: D30599136

fbshipit-source-id: 102f23fb652c728a9ee4379f9acc43ae300d8e8a
2021-10-06 19:01:04 -07:00
0cab25468d [Pytorch Edge][tracing-based] reorganize model tracer dependency (#63421)
Summary:
1. Move 4 files:
```
KernelDTypeTracer.h
KernelDTypeTracer.cpp
OperatorCallTracer.h
OperatorCallTracer.cpp
```
so they're visible in OSS.

2. Update the namespace to `torch::jit::mobile`
3. Add a `fb_xplat_cxx_library` `torch_model_tracer` with the source file list above.
4. update the `fb_xplat_cxx_library`  `model_tracer_lib` dependency on the new `torch_model_tracer` library

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63421

ghstack-source-id: 139900299

Reviewed By: dhruvbird

Differential Revision: D30378069

fbshipit-source-id: d56c6140e951bc13113a76d6b63767a93843c842
2021-10-06 18:59:50 -07:00
300613dc60 make FX symbolic tracing reuse buffers if they're the same (#66211)
Summary:
Currently, if the same tensor constant is reused multiple times, we'll store a tensor constant for each time we use it.

For example
```
val = torch.randn(5)
for _ in range(10):
    x = x + val
```
ends up storing 10 tensor constants.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66211

Reviewed By: jamesr66a

Differential Revision: D31437089

Pulled By: Chillee

fbshipit-source-id: 401169c8d58ce0afb7025ae11060680ef544419f
2021-10-06 18:35:38 -07:00
67970e8c9b Add CI tests for AOT Compile (#65441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65441

Adding CI test to verify a simple linear model can compile fine.
Successful run from CI logs:

```
+ test_aot_model_compiler
+ echo 'Testing AOT model compiler'
Testing AOT model compiler
+ source test/mobile/nnc/test_aot_compile.sh
+++ python -c 'import site; print(site.getsitepackages()[0])'
++ TORCH_INSTALL_DIR=/opt/conda/lib/python3.6/site-packages/torch
++ TORCH_BIN_DIR=/opt/conda/lib/python3.6/site-packages/torch/bin
+++ dirname test/mobile/nnc/test_aot_compile.sh
++ CURRENT_DIR=test/mobile/nnc
++ MODEL=aot_test_model.pt
++ COMPILED_MODEL=aot_test_model.compiled.pt
++ COMPILED_CODE=aot_test_model.compiled.ll
++ test_aot_model_compiler
++ python test/mobile/nnc/aot_test_model.py
++ exit_code=0
++ [[ 0 != 0 ]]
++ /opt/conda/lib/python3.6/site-packages/torch/bin/test_aot_model_compiler --model aot_test_model.pt --model_name=aot_test_model --model_version=v1 --input_dims=2,2,2
The compiled model was saved to aot_test_model.compiled.pt
++ success=1
++ '[' '!' -f aot_test_model.compiled.pt ']'
++ '[' '!' -f aot_test_model.compiled.ll ']'
++ '[' -f aot_test_model.compiled.ll ']'
++ rm aot_test_model.compiled.ll
++ '[' -f aot_test_model.compiled.pt ']'
++ rm aot_test_model.compiled.pt
++ rm aot_test_model.pt
++ '[' 1 = 0 ']'
+ [[ linux-xenial-py3.6-gcc5.4-default == pytorch-linux-xenial-py3* ]]
+ assert_git_not_dirty
+ [[ linux-xenial-py3.6-gcc5.4-default != *rocm* ]]
+ [[ linux-xenial-py3.6-gcc5.4-default != *xla* ]]
++ git status --porcelain
+ git_status=
+ [[ -n '' ]]
+ test_custom_script_ops
```

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D31348169

Pulled By: priyaramani

fbshipit-source-id: dd5c55859dfa07d150e5decc2dd7e56f43e7f66b
2021-10-06 18:23:19 -07:00
6c54971cd9 Open Registration for torch::deploy Builtins (#65953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65953

Previously, if people wanted to add a torch::deploy builtin, they needed to change torch::deploy internal code (interpreter_impl.cpp) to register the Python part as frozen modules and the C++ part as builtin modules. This is inconvenient and error prone. We want to add open registration support for torch::deploy builtins so that people only need to add one effective line of code in their *library code* to complete the registration.

Here is an example of registering numpy as a torch::deploy builtin:
  REGISTER_TORCH_DEPLOY_BUILTIN(numpy, numpy_frozen_modules, <list of name, PyInit function pairs>)

This diff supports open registration of frozen modules. It's the first step to achieve the plan above.
ghstack-source-id: 139888306

Test Plan: Run tests in test_deploy.cpp and test_builtin_registry.cpp

Reviewed By: suo

Differential Revision: D31321562

fbshipit-source-id: 6445bd8869f1bb7126b4c96cf06c31145f0e9445
2021-10-06 18:04:57 -07:00
213c3f45da [oss/ci] skip TestDataLoaderPersistentWorkers on ASAN (#66236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66236

it's flaky, see https://github.com/pytorch/pytorch/issues/66223

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31462056

Pulled By: suo

fbshipit-source-id: f4362a8020dc05ac8856706c0508d48be026eeb8
2021-10-06 17:56:19 -07:00
4937218611 [torch][launch] Add ability to override sys.executable for torch.distributed.run (#66179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66179

The diff adds a check for the `PYTHON_EXEC` environment variable. If the variable is set, it will override `sys.executable` for `torch.distributed.run`.
This means that if `PYTHON_EXEC` is set, user scripts executed via `torch.distributed.run` will start via the value of `os.environ["PYTHON_EXEC"]`.
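
Roughly the lookup the diff describes, as a sketch (the variable name is an assumption, not the actual launcher code):

```python
import os
import sys

# Fall back to the current interpreter unless PYTHON_EXEC overrides it.
python_exec = os.environ.get("PYTHON_EXEC", sys.executable)
```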

Test Plan: unittest

Reviewed By: kiukchung

Differential Revision: D31329003

fbshipit-source-id: b9d0167d99bbf463a6390f508324883ca4a1e439
2021-10-06 17:33:19 -07:00
e8837d741e [Vulkan] cat operator for height dimension (#66103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66103

Implemented `cat` operator for height dimension

Test Plan:
On Mac
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64

[ RUN      ] VulkanAPITest.cat_dim2_sameheight_success
[       OK ] VulkanAPITest.cat_dim2_sameheight_success (272 ms)
[ RUN      ] VulkanAPITest.cat_dim2_diffheight_success
[       OK ] VulkanAPITest.cat_dim2_diffheight_success (161 ms)
[ RUN      ] VulkanAPITest.cat_dim2_invalidinputs_exceptions
[       OK ] VulkanAPITest.cat_dim2_invalidinputs_exceptions (235 ms)
```

On Android
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"

[ RUN      ] VulkanAPITest.cat_dim2_sameheight_success
[       OK ] VulkanAPITest.cat_dim2_sameheight_success (98 ms)
[ RUN      ] VulkanAPITest.cat_dim2_diffheight_success
[       OK ] VulkanAPITest.cat_dim2_diffheight_success (105 ms)
[ RUN      ] VulkanAPITest.cat_dim2_invalidinputs_exceptions
[       OK ] VulkanAPITest.cat_dim2_invalidinputs_exceptions (101 ms)
```

Reviewed By: SS-JIA

Differential Revision: D31323141

fbshipit-source-id: 68b187e856758790cc5f7b0c263feb30a2bb467f
2021-10-06 16:12:59 -07:00
1d586e78c6 *_solve methods: implements forward AD (#65546)
Summary:
This PR adds forward AD for `*_solve` methods.
Additionally, `cholesky_solve` gets OpInfo + a bug fix when wrong leading dimensions could be passed to LAPACK,
and `lu_solve` gets forward AD with 2x`lu_solve` instead of 1x`lu_solve` + 2x`triangular_solve`.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65546

Reviewed By: dagitses

Differential Revision: D31431847

Pulled By: albanD

fbshipit-source-id: 0e343e0d9da3c3d2051fca215fad289d77275251
2021-10-06 16:04:22 -07:00
78209b93b3 Don't build shared library for AOT Compiler (#66227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66227

Building a shared library for the AOT Compiler is not necessary as it's included in libtorch. Also, having this built as a shared library was affecting Android builds, and we don't need to build the AOT Compiler for mobile builds.

Before fix:
```
(pytorch)  ~/local/pytorch master
└─ $ ANDROID_NDK=/opt/android_ndk/r20/ BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=armeabi-v7a ./scripts/build_android.sh -DBUILD_BINARY=ON
Build with ANDROID_ABI[armeabi-v7a], ANDROID_NATIVE_API_LEVEL[21]
Bash: GNU bash, version 5.0.11(1)-release (x86_64-redhat-linux-gnu)
Python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
Caffe2 path: /data/users/priyaramani/pytorch
Using Android NDK at /opt/android_ndk/r20/
.
.
FAILED: lib/libaot_compiler.so
: && /opt/android_ndk/r20/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=armv7-none-linux-androideabi21 --gcc-toolchain=/opt/android_ndk/r20/toolchains/llvm/prebuilt/linux-x86_64 --sysroot=/opt/and
roid_ndk/r20/toolchains/llvm/prebuilt/linux-x86_64/sysroot -fPIC -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -fno-addrsig -march=armv7-a -mt
humb -Wa,--noexecstack -Wformat -Werror=format-security -frtti -fexceptions  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -
DBUILD_LITE_INTERPRETER -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bound
s -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -W
no-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-typedef-redefinition -Wno-unknown-warning-option -Wno-unused-private-field -Wno-inconsistent-miss
ing-override -Wno-aligned-allocation-unavailable -Wno-c++14-extensions -Wno-constexpr-not-const -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -Wno-unused-but-set-variable -Wno-maybe-uninitialized
-fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -g0 -Oz -DNDEBUG  -Wl,--exclude-libs,libgcc.a -Wl,--exclude-libs,libatomic.a -static-libstdc++ -Wl,--build-id -Wl,--warn-shared-text
rel -Wl,--fatal-warnings -Wl,--exclude-libs,libunwind.a -Wl,--no-undefined -Qunused-arguments -Wl,-z,noexecstack  -rdynamic -shared -Wl,-soname,libaot_compiler.so -o lib/libaot_compiler.so caffe2/torch/CMakeFi
les/aot_compiler.dir/csrc/jit/mobile/nnc/aot_compiler.cpp.o  -latomic -lm && :
caffe2/torch/CMakeFiles/aot_compiler.dir/csrc/jit/mobile/nnc/aot_compiler.cpp.o:aot_compiler.cpp:function at::from_blob(void*, c10::ArrayRef<long long>, c10::TensorOptions const&): error: undefined reference t
o 'at::TensorMaker::make_tensor()'
.
.
caffe2/torch/CMakeFiles/aot_compiler.dir/csrc/jit/mobile/nnc/aot_compiler.cpp.o:aot_compiler.cpp:function torch::jit::mobile::nnc::Function::Function(): error: undefined reference to 'c10::AnyType::get()'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```

After fix:
```
(pytorch)  ~/local/pytorch master
└─ $ ANDROID_NDK=/opt/android_ndk/r20/ BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=armeabi-v7a ./scripts/build_android.sh -DBUILD_BINARY=ON
Build with ANDROID_ABI[armeabi-v7a], ANDROID_NATIVE_API_LEVEL[21]
Bash: GNU bash, version 5.0.11(1)-release (x86_64-redhat-linux-gnu)
Python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
Caffe2 path: /data/users/priyaramani/pytorch
Using Android NDK at /opt/android_ndk/r20/
.
.
-- Build files have been written to: /data/users/priyaramani/pytorch/build_android
Will install headers and libs to /data/users/priyaramani/pytorch/build_android/install for further Android project usage.
[2/3] Install the project...
-- Install configuration: "Release"
Installation completed, now you can copy the headers/libs from /data/users/priyaramani/pytorch/build_android/install to your Android project directory.
```

Test Plan: Imported from OSS

Reviewed By: ljk53, axitkhurana

Differential Revision: D31450970

Pulled By: priyaramani

fbshipit-source-id: 87e48033f1db46fef112bae1239a09a2365620d2
2021-10-06 15:57:32 -07:00
4a50b6c490 fix cosine similarity dimensionality check (#66191)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66086

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66191

Reviewed By: dagitses, malfet

Differential Revision: D31436997

Pulled By: ngimel

fbshipit-source-id: 363556eea4e1696d928ae08320d298451c286b10
2021-10-06 15:44:51 -07:00
05e1476d49 [jit] Fix list copy in MemoryDAG (#65176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65176

getElements returns a reference.
ghstack-source-id: 139745230

Test Plan:
CI

Static runtime startup for ctr_mobile_feed local net reduced from 8.35s to 7.8s

Reviewed By: malfet

Differential Revision: D30983898

fbshipit-source-id: 884bff40f12322633c0fffd45aed5b8bc7498352
2021-10-06 15:39:33 -07:00
fc4836f400 [Fix] Use full name to look for the promoted prim operator table (#66081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66081

Two fixes:

1. Since the operators are always registered with both name and overload name, the overload name needs to be included when looking for an operator.
2. Don't promote operators with alias, because the new registry does not support schema with alias.

ghstack-source-id: 139732099

Test Plan: CI

Reviewed By: pavithranrao

Differential Revision: D31382262

fbshipit-source-id: 43c6e6e0c13950a9ce8cf3a70debe0421372d053
2021-10-06 15:35:02 -07:00
7cc121dbcd slow_conv3d grad_input: Avoid dispatch in parallel region (#65757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65757

See gh-56794

Avoid dispatch inside of parallel_for by:
- Replacing Tensor slicing with TensorAccessor
- Replaces `bmm` and `mm` with direct calls to gemm.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31257878

Pulled By: ngimel

fbshipit-source-id: e6aad2d5ae7fa432bd27af2b1a8b0dcef1fc6653
2021-10-06 15:08:47 -07:00
480a1a88d6 [DDP] Log iteration in debug mode (#65770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65770

This logging info is printed out in debug mode; make it log the
iteration as well for clarity.
ghstack-source-id: 139838595

Test Plan: CI

Reviewed By: zhaojuanmao, wayi1

Differential Revision: D31222132

fbshipit-source-id: 14519aae1ba0b2a35b4b962e7d1a957c9142c8f8
2021-10-06 14:36:07 -07:00
722f1ccfb8 [DDP][Instrumentation] Profiling range for bucket copy (#65769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65769

Seeing some bottlenecks when copying the bucket to grads; this helps make it
clearer here.
ghstack-source-id: 139838597

Test Plan: Ci

Reviewed By: zhaojuanmao, wayi1

Differential Revision: D31217340

fbshipit-source-id: 762a254a3538eb5292b3a53bb5d1211057ecbdbb
2021-10-06 14:34:10 -07:00
84c5970a77 ci: Migrate slow_gradcheck to GHA (#65730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65730

This should close out the door on migrating all scheduled workflows we have for CircleCI

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31225188

Pulled By: seemethere

fbshipit-source-id: 4c49e88ec017edc30e07325dbc613ff54dd164d8
2021-10-06 14:29:14 -07:00
e2be087207 [oss][pytorch] Add quint2x4 dtype (#65545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65545

Introduce a 2-bit qtensor. The new dtype added for this is c10::quint2x4.

The underlying storage for this is still uint8_t, so we pack 4 2-bit values into a byte while quantizing.

Kernels that use this dtype should be aware of the packing format (4 2-bit values in one byte).
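
An illustrative sketch of that layout (the kernel-side packing order is an implementation detail; this only shows the "four 2-bit values per byte" idea):

```python
def pack_2bit(vals):
    """Pack four ints in [0, 3] into one byte, value i at bits 2*i..2*i+1."""
    assert len(vals) == 4 and all(0 <= v <= 3 for v in vals)
    byte = 0
    for i, v in enumerate(vals):
        byte |= v << (2 * i)
    return byte

def unpack_2bit(byte):
    return [(byte >> (2 * i)) & 0b11 for i in range(4)]

assert unpack_2bit(pack_2bit([1, 0, 3, 2])) == [1, 0, 3, 2]
```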

Test Plan: `buck test mode/dev-asan caffe2/test/:quantization -- test_qtensor`

Reviewed By: supriyar

Differential Revision: D31148141

fbshipit-source-id: 1dc1de719e097adaf93fee47c6d1b8010a3eae6c
2021-10-06 14:22:00 -07:00
252b6f2cba [PyTorch][easy] Remove dead std::set in parseAliasAnnotation (#65712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65712

No reason for this to be here.
ghstack-source-id: 139743362

Test Plan: fitsships

Reviewed By: dhruvbird

Differential Revision: D31215696

fbshipit-source-id: 238ea6633629831e54847ce82de23571cf476740
2021-10-06 14:20:31 -07:00
90db214d4b support counter-based fused rowwise adagrad (#66177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66177

As titled, with an additional change to enable the counter for SparseAdagrad.

Test Plan:
buck test //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test

Testing with canary packages

baseline: f297789852

counter run: f297789912

Reviewed By: jspark1105

Differential Revision: D30903029

fbshipit-source-id: 3ed89a7da409fd820fd0b44950407c20fa2018a5
2021-10-06 13:50:43 -07:00
6d7fab5929 [Static Runtime][easy] Clone scripts do not use aten::add (#66161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66161

`aten::add` is not guaranteed to be bit exact with the JIT interpreter. This was causing non-deterministic test failures on master.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31406764

fbshipit-source-id: d968cb1bdb8f33934682ef3712a1341a3aacf18e
2021-10-06 12:37:39 -07:00
9285981de1 Clean up unused model instantiation (#65487)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65487

Test Plan: Imported from OSS

Reviewed By: jingsh

Differential Revision: D31410880

Pulled By: b-koopman

fbshipit-source-id: 09b2d2d899a232e7334c82f00eff0f900e817853
2021-10-06 12:21:56 -07:00
8548928950 Cumsum: acc_ops (#66189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66189

Added acc_ops for cumsum and unit test

Test Plan: buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer

Reviewed By: 842974287

Differential Revision: D31355244

fbshipit-source-id: 41490d300553b0a5d52cbc4e681bdd0cf990eb42
2021-10-06 12:15:36 -07:00
623ac7eabb slow_conv3d: Avoid dispatch in parallel region (#65737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65737

See gh-56794

Avoid dispatch inside of parallel_for by:
- Replacing Tensor slicing with TensorAccessor
- Copy bias into output only once, outside of the parallel region
- Replaces `addmm_` and `baddbmm_` with direct calls to gemm.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31257874

Pulled By: ngimel

fbshipit-source-id: 20b94daa13082fb1e39eaa8144bfa4c611b61bab
2021-10-06 12:10:55 -07:00
9a0b2acd76 [quant] Remove hypothesis from qtopk (#66158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66158

qtopk used hypothesis, which created flaky tests. In addition, the generated tests were not representative and would not catch the cases that we are interested in.

This diff removes hypothesis from qtopk and merges the qtopk and qtopk_nhwc tests. We now use specific test cases.
ghstack-source-id: 139768865

Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_qtopk`

Reviewed By: jerryzh168

Differential Revision: D31401341

fbshipit-source-id: a8fb37a7221fc43c159f34e28aa4a91ed3506944
2021-10-06 11:42:34 -07:00
6d4d636d66 [GHA] Rectify trigger_action_only flag (#66209)
Summary:
No longer needed, as a PR can be opened/reopened with a specific label

Fixes https://github.com/pytorch/pytorch/issues/66110

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66209

Reviewed By: seemethere

Differential Revision: D31436292

Pulled By: malfet

fbshipit-source-id: 5b6e0875bec261862017dfe0eb3a5ec57fb8c705
2021-10-06 10:46:10 -07:00
c4ea447eb5 Use src size for memcpy in order to avoid fortify complaints (#65222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65222

When compiling against the Android SDK with `-D_FORTIFY_SOURCE=2`, the compiler will complain that the `dst` size is larger than the `src` size due to the function templating using two differently sized objects. There is a `TORCH_CHECK` to ensure we don't go through with these `memcpy`'s, but in the interest of making the compiler happy, let's switch the `memcpy` to take `sizeof(src)`.

Test Plan: CI

Reviewed By: bertmaher, lanza

Differential Revision: D30992678

fbshipit-source-id: b3e7aa992a3650e1051abad05be800b684e6332b
2021-10-06 09:05:31 -07:00
bfaaac6392 Ignore register_rds errors (#66185)
Summary:
Network communications are flaky by nature; the test should be marked as
skipped if network ops cannot be completed for some reason.

Fixes https://github.com/pytorch/pytorch/issues/66184

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66185

Reviewed By: seemethere

Differential Revision: D31423193

Pulled By: malfet

fbshipit-source-id: 96c3a123c65913f44ea78b30a03e8e7eda164afe
2021-10-06 08:42:35 -07:00
b8e1999253 [quant] Add op benchmark for GPU FakeQuantizePerChannel with float zero_points (#66183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66183

Add a GPU benchmark for fakeQuant, similar to #65241
ghstack-source-id: 139810414

Test Plan: https://pxl.cl/1QjJM

Reviewed By: b-koopman

Differential Revision: D31288158

fbshipit-source-id: 65526248b5c7b70f0bc32a86b08f50b4cbc7a83d
2021-10-06 08:07:42 -07:00
9de9733390 Add 1d to 2d conv transform during mobile optimization (#65850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65850

This step was never added
ghstack-source-id: 139753673

Test Plan: Run optimize_for_mobile on model with conv1d and see that it transforms to conv2d

Reviewed By: kimishpatel

Differential Revision: D31093503

fbshipit-source-id: 11a19f073789c01a9de80f33abbe628005996b66
2021-10-06 07:27:09 -07:00
747a5782e3 [quant][fx] Don't assume bias is a keyword argument (#61647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61647

`prepare_fx` currently assumes that bias is always a positional argument to
convolutions, and only a keyword argument to other functions. This happens to work
today due to a quirk in how `__torch_function__` is handled for python
functions but shouldn't be considered stable.

Instead, we should support `bias` for both positional and keyword forms.
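
Both call forms in question, for concreteness (a minimal illustration; `F.conv2d` accepts bias either way, and prepare_fx should now match both):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
w = torch.randn(6, 3, 3, 3)
b = torch.randn(6)

y1 = F.conv2d(x, w, b)       # bias passed positionally
y2 = F.conv2d(x, w, bias=b)  # bias passed as a keyword
assert torch.allclose(y1, y2)
```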

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31401360

Pulled By: albanD

fbshipit-source-id: 1e2f53d80e2176b870f326dc498e251e2386136e
2021-10-06 07:25:47 -07:00
ab25516054 [PyTorch] Remove unused function in import (#65865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65865

`operator_str` is not used in `import.cpp`, and it is also defined in `parse_operators.cpp`, so remove it from `import.cpp`.

Test Plan: CI passing

Reviewed By: iseeyuan

Differential Revision: D31293008

fbshipit-source-id: 1c857cbd63c57b8f79c1a068789fc8605605b642
2021-10-06 06:34:51 -07:00
a5895f85be [PyTorch Edge][type] Add type check in compatibility api (#63129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63129

1. Add an API to get `supported_types` from the runtime, exposed in C++ only.
2. Add an API to get `contained_types` from the model, exposed in both C++ and Python.
3. Add a field `contained_types_` in `type_parser.cpp` to track the contained types when parsing the Python string.
4. Expand the `is_compatible` API to check types. When checking types, it will check the contained type list from the model against the supported type list from the runtime.
5. Expand the unit test for compatibility to cover types.
6. Add a unit test in Python to check the type list.
ghstack-source-id: 139826944

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.GetContainTypes'

buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleSuccess'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'

buck test //caffe2/test:mobile
```

Reviewed By: iseeyuan

Differential Revision: D30231419

fbshipit-source-id: 8427f423ec28cc5de56411f15fd960d8595d6947
2021-10-06 02:23:44 -07:00
c75210face [PyTorch Edge][type] Move TypeParser class definition to header file (#65976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65976

Move the TypeParser class definition to a header file so it can be called from elsewhere. For example, the getContainedTypes() API in this stack can be moved to other files.
ghstack-source-id: 139826943

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D31294254

fbshipit-source-id: 1c532fd69c7f6b44ad2332055d24c95a0fac1846
2021-10-06 02:22:26 -07:00
931352c68d Make handle_torch_function_no_python_arg_parser public (#66054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66054

I need this function in functorch to support the ability of custom
jitted kernels to invoke torch_function when applicable.

Test Plan: functorch unit tests

Reviewed By: qihqi, ngimel

Differential Revision: D31416599

Pulled By: bertmaher

fbshipit-source-id: 90b57badd6a6b9d505ebfc436869b962b55c66d7
2021-10-06 00:27:10 -07:00
c0b1965f7c Back out "[vulkan] Use push constants instead of SSBOs" (#66169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66169

Original change: D30368834 (57e5ae5306)

Switching to Push Constants from Uniform Buffers caused some unforeseen memory errors when running Mac unit tests.

We'll switch back for now until we can pinpoint and resolve the issue.

Test Plan:
Build and run `vulkan_api_test`

```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

Reviewed By: beback4u

Differential Revision: D31409130

fbshipit-source-id: cab1a3330945b50522235db6738406b6037f9c68
2021-10-05 21:28:59 -07:00
8d435877d5 Fix typos at ONNX docs (#66090)
Summary:
This PR fixes small typos in the ONNX docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66090

Reviewed By: albanD

Differential Revision: D31385765

Pulled By: ezyang

fbshipit-source-id: f4879069a2acf9c8adaa81c26a6a5014634761f5
2021-10-05 21:11:47 -07:00
cbc29acca3 [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D31423202

fbshipit-source-id: 08d249e8546c0bfe6f1145c0571141b90aad03eb
2021-10-05 20:55:56 -07:00
d1058df885 fix clang-tidy error introduced by #64382 (#65977)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65977

Reviewed By: ngimel

Differential Revision: D31423174

Pulled By: malfet

fbshipit-source-id: 0ea560b9a6ddd6431f70bd3ac10ace68e26ab352
2021-10-05 20:13:13 -07:00
6cdea8239e Precomputing Transposes for frozen linear layers (#65631)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65631

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31314248

Pulled By: Gamrix

fbshipit-source-id: 85611f3ccfe7b91a183d5d12f7fb9aca3c51acb0
2021-10-05 20:08:32 -07:00
43e26d0086 [deploy] Improve error messaging for create_movable (#65955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65955

This diff makes sure to give a clear error message when a user tries to create an object from an object that lives in a different session.

Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy

Reviewed By: suo

Differential Revision: D31323045

fbshipit-source-id: e7bd6f76afeb0285847bc11881185a164f80e3f0
2021-10-05 19:49:51 -07:00
3bd26792c0 Skip test_multiple_groups on windows (#66154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66154

Skips as the test is flaky:
https://github.com/pytorch/pytorch/issues/66059
ghstack-source-id: 139763149

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31403153

fbshipit-source-id: 7f47f17cee148a708346d6d9454c44a194d13a78
2021-10-05 18:33:23 -07:00
eeabab03e7 [DataParallel] Log API Usage for tracking (#66038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66038

Will help track workflows for DP deprecation. Tested via standalone DP
script.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31356975

fbshipit-source-id: c0a3ac3a1faed794e3362f3f3a19a6fb800587a7
2021-10-05 18:30:23 -07:00
dc26f5eb65 [FX] Specifies a default value when possible for placeholders created from concrete_args (#59569)
Summary:
```python
class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, a=None, b=None):
        res = a
        if b is not None:
            res = res + b
        return res

concrete_args = {'b': torch.tensor(5)}
traced = fx.symbolic_trace(Foo(), concrete_args=concrete_args)
```

Gives the following error:

```
  File "<eval_with_key_9>", line 2
    def forward(self, a = None, b_1):
                ^
SyntaxError: non-default argument follows default argument
```

Since https://github.com/pytorch/pytorch/issues/55888, placeholders are also created for concrete arguments. But these placeholders do not have default values even when one was provided for the argument in question, causing the error above.

To solve this, I add a default value when it is available during placeholder creation for concrete arguments.

I also tried to set the default value to the value specified in concrete_args (since in many cases it will actually use this value anyway), but ran into an error because the default value is never defined:

```
def forward(self, a = None, b_1 = _tensor_constant0):
    _tensor_constant0 = self._tensor_constant0
    _tensor_constant1 = self._tensor_constant1
    add = a + _tensor_constant1;  a = _tensor_constant1 = None

NameError: name '_tensor_constant0' is not defined
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59569

Reviewed By: albanD

Differential Revision: D31385607

Pulled By: Chillee

fbshipit-source-id: 44a8ce28b5eabdb9b4c773e73a68ff0bb9c464cc
2021-10-05 17:45:09 -07:00
83bac89d64 [quant] Add fp32/fp16 zero_point support for GPU fakeQuant (#65836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65836

Add fp32/fp16 zero_point support for GPU fakeQuant, similar to D30975238 (60915eb810)
ghstack-source-id: 139779416

Test Plan:
https://www.internalfb.com/intern/testinfra/testconsole/testrun/281475183488511/

{F667112564}

Reviewed By: b-koopman

Differential Revision: D31091679

fbshipit-source-id: 68fd483e6926c7fd565703c01d8ffb337b75dca5
2021-10-05 17:40:54 -07:00
f062def486 Revert D31260343: [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core
Test Plan: revert-hammer

Differential Revision:
D31260343 (e94fea08d0)

Original commit changeset: 8bb1194188e3

fbshipit-source-id: 3d0d5377d71ed928015bcb2105801be368e38cd8
2021-10-05 17:15:50 -07:00
5e6347ca64 .circleci: Remove migrated distributed configs (#66174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66174

These configs have already been migrated, so we're going ahead and removing
them.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31413579

Pulled By: seemethere

fbshipit-source-id: 8923736d347eb8c8470884be413122c198d1bf20
2021-10-05 16:53:02 -07:00
e94fea08d0 Add hash and int128 utils for Lazy Tensor Core (#65635)
Summary:
These utils are prerequisites for Lazy Node base class.

- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary

Fixes https://github.com/pytorch/pytorch/issues/65636

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65635

Reviewed By: alanwaketan

Differential Revision: D31260343

Pulled By: wconstab

fbshipit-source-id: 8bb1194188e3e77fc42e08a14ba37faed37a9c2e
2021-10-05 16:43:55 -07:00
143c957c2d [nnc] Reduced memory usage of LLVMCodeGen object after code generation is complete (#65373)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65373

Test Plan: Imported from OSS

Reviewed By: bertmaher, hlu1

Differential Revision: D31066974

Pulled By: navahgar

fbshipit-source-id: 0dbe0d1746c50adee90fe5a7cc4a66adba3a229e
2021-10-05 16:27:43 -07:00
68555339d7 test_utils.py: Add another retry to test_download_url_to_file (#66159)
Summary:
Fixes one of the flakiness concerns mentioned in https://github.com/pytorch/pytorch/issues/65439#issuecomment-934686485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66159

Reviewed By: ngimel

Differential Revision: D31406485

Pulled By: janeyx99

fbshipit-source-id: cf7834cdab58360ecef1748075d52969de2e0778
2021-10-05 16:26:20 -07:00
d2021e5e68 ci: Migrate vulkan builds to GHA (#66044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66044

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31370889

Pulled By: seemethere

fbshipit-source-id: 399f5f0c184f7856dcddb138c357f1374706e676
2021-10-05 16:11:36 -07:00
7452b65144 Remove unused dump method from VSX vec256 methods (#66085)
Summary:
Follow up after https://github.com/pytorch/pytorch/pull/63533

Probably fixes https://github.com/pytorch/pytorch/issues/65956

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66085

Reviewed By: ngimel

Differential Revision: D31382898

Pulled By: malfet

fbshipit-source-id: f3d97b0f2c7f1207827773ae85e2739f1d54b9c7
2021-10-05 16:05:01 -07:00
6e06cb76ff [JIT] Initialize CUDA context before launching fused kernel (#65064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65064

The problem appears when nvfuser is triggered from LazyTensor.
Because LT maintains its own thread pool, the thread used for the first-time
compilation does CUDA context initialization properly, but later
cached execution may use a different thread which does not have
a proper CUDA context.

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D31269691

Pulled By: desertfire

fbshipit-source-id: 384362025c087d61e8b625ff938379df283ef8b2
2021-10-05 16:01:59 -07:00
a5e6b2b2e3 [Static Runtime] Add variadic sigrid_transforms_torch_bind (#63960)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63960

Reviewed By: hlu1

Differential Revision: D30529880

fbshipit-source-id: 1c4be2f9c0944bbe1e1c146989588c96bfd14eda
2021-10-05 16:00:36 -07:00
e7747795c9 [PyTorch Edge] Reduce dispatch table size further for a trimmed build (#66112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66112

Eliminate Metal and Vulkan Dispatch Keys.

Test Plan: Build + Sandcastle

Differential Revision: D31298307

fbshipit-source-id: 31302fc626382db7997e5058750fa85458c9cbc1
2021-10-05 15:24:07 -07:00
a3bbaf227c Revert D31227448: [pytorch][PR] fixing sorting in stride indices
Test Plan: revert-hammer

Differential Revision:
D31227448 (da0e29edd4)

Original commit changeset: 51e3cd903757

fbshipit-source-id: a752a4df70281aa0eaaeb1afdd88395b08276da8
2021-10-05 14:28:34 -07:00
89b56d630d Create CI sev template (#66163)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66163

Reviewed By: seemethere

Differential Revision: D31407988

Pulled By: suo

fbshipit-source-id: a23b6fc5410ef1f901e2a7aacc2e0c17cb04d083
2021-10-05 13:55:07 -07:00
5883523c1d Remove dtype from torch.Storage and use only torch.ByteStorage (#62030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030

Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible

Fixes https://github.com/pytorch/pytorch/issues/47442

* **THE SERIALIZATION FORMAT IS FULLY FC/BC.** We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today.
* There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate.
* As we no longer know what the dtype of a storage is, we've **removed** the size method from Storage, replacing it with nbytes. This helps catch otherwise-silent errors where the number of elements is confused with the number of bytes.
* `Storage._new_shared` takes an `nbytes` kwarg and will reject previous positional-only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments.
* It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor.
* It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling.
* The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall.
 To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. **If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage** or your serialization code will degrade to standard file-based serialization.
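
A minimal sketch of the intended post-change usage (illustrative; exact method availability depends on the release):

```
import torch

t = torch.arange(4, dtype=torch.float32)
s = t.storage()      # an untyped, byte-oriented storage
print(s.nbytes())    # 16 = 4 elements * 4 bytes; replaces the old size()

# dtype conversions now go through tensors rather than typed storages:
s64 = t.double().storage()
```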

Original pull request: https://github.com/pytorch/pytorch/pull/59671

Reviewed By: soulitzer, ngimel

Differential Revision: D29466819

Pulled By: ezyang

fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e
2021-10-05 13:50:34 -07:00
588c1787ba Update link to example pytorch/examples (#66095)
Summary:
`https://github.com/goldsborough/examples/tree/cpp/cpp` -> `https://github.com/pytorch/examples/tree/master/cpp`
As the C++ examples in https://github.com/pytorch/examples are more up to date.

Partially addresses https://github.com/pytorch/pytorch/issues/65388

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66095

Reviewed By: janeyx99

Differential Revision: D31382888

Pulled By: malfet

fbshipit-source-id: 8884c7795386249dea07cbe66783fa1dd963e07c
2021-10-05 12:48:12 -07:00
da0e29edd4 fixing sorting in stride indices (#63940)
Summary:
Updating `computeStrideProps` logic to break ties on stride_indices.

For two dimensions with identical strides, the dimension with size 1 should be considered the faster dimension. Otherwise, its stride should be the product of the existing stride and the size of the other dimension.

Note that there's still an inconsistency between eager memory_format and stride_properties in JIT; this is a design issue caused by the ambiguity of size-1 strides. One example showing this failure has been disabled in the added cpp test.
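
For illustration, a hedged sketch of the tie-break rule (not the actual `computeStrideProps` code):

```
import torch

# A size-1 dimension shares its stride with a neighbor:
x = torch.ones(3, 1, 2)
print(x.stride())  # (2, 2, 1) -- dims 0 and 1 tie with stride 2

# On equal strides, the size-1 dim counts as the faster (inner) one:
def faster_dim(sizes, strides, a, b):
    if strides[a] != strides[b]:
        return a if strides[a] < strides[b] else b
    if sizes[a] == 1:
        return a
    if sizes[b] == 1:
        return b
    return a  # otherwise keep the original order
```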

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63940

Reviewed By: albanD

Differential Revision: D31227448

Pulled By: dzhulgakov

fbshipit-source-id: 51e3cd903757bef55d3158c057f9444d0cff7d2a
2021-10-05 12:30:41 -07:00
0d020effab [quant] Fix the parts that were missing after initial migration (#66058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66058

After the initial migration from `torch.quantization` to `torch.ao.quantization`, some of the files did not change.
This happened because the migration was done in parallel, and some of the files were landed while the others were still in the original location.
This is the last fix in the AO migration phase 1, which completely enables the ao.quantization namespace.

Test Plan: `python test/test_quantization.py`

Reviewed By: vkuzo

Differential Revision: D31366066

Pulled By: z-a-f

fbshipit-source-id: bf4a74885be89d098df2d87e685795a2a64026c5
2021-10-05 11:45:37 -07:00
727576e501 [quant] Fixing the hypothesis test for topk (#66057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66057

The current test creates sets that are too slow to generate.
This will cause either "Filtering too much" or "Timeout" errors in future versions of hypothesis.
This PR preemptively fixes the issue.

Test Plan: `python test/test_quantization.py`

Reviewed By: vkuzo

Differential Revision: D31366065

Pulled By: z-a-f

fbshipit-source-id: deaab4da8ee02a5dee8943cabdd30fc53d894a34
2021-10-05 11:43:56 -07:00
92d0b7e99c [deploy] fix typo in registerModuleSource (#66107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66107

lol

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D31385631

Pulled By: suo

fbshipit-source-id: a3307e2862f7951c160776eb8edb18329c937ed1
2021-10-05 11:15:35 -07:00
458a00bacb Back out "[quant] update fused_obs_fake_quant op to accept output_fake_quant argument" (#66063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66063

Original commit changeset: bffe776216d0

Test Plan: CI

Reviewed By: vkuzo

Differential Revision: D31347042

fbshipit-source-id: f56f628dc4690187bf284a8f2fda4c6aae10c1d6
2021-10-05 11:02:54 -07:00
2b39b80971 [quantized] Replace conv_p with convolution_op in qnnpack (#65783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65783

convolution_op makes the conv_param struct redundant, since it contains all the params of conv_param and more. We don't need to pass both structs to qnnpack or hold both in the packed weights; let's just hold convolution_op.

This makes it easier to implement 3D conv since we won't have to template two structs. The conv_param struct is kept since tests rely on it to set up the convolution.
ghstack-source-id: 139479651

(Note: this ignores all push blocking failures!)

Test Plan: ci

Reviewed By: kimishpatel

Differential Revision: D30738727

fbshipit-source-id: e6d39644357b99d3b7491ae8a7066bf107eb8b9e
2021-10-05 11:01:26 -07:00
bda3230b62 slow_conv2d grad_weight: call gemm directly (#65726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65726

This PR isn't strictly necessary since grad_weight doesn't use
parallel_for. However, this does reduce the function overhead and will
make it easier to parallelize in the future.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31257877

Pulled By: ngimel

fbshipit-source-id: d8ea97cc1f43d8d9dfff355ae27c9d982838b57e
2021-10-05 10:53:22 -07:00
1db78c30c9 Fix LLVM-12 concat_split_op.h error (#66060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66060

Fixes
```
testTumHistoryAdditionalLaser (caffe2.caffe2.fb.layers.tests.tum_history_test.TestTumHistory) ... caffe2/caffe2/operators/concat_split_op.h:363:74: runtime error: applying non-zero offset 8 to null pointer
    #0 0x7f8f39d29795 in caffe2::ConcatOp<caffe2::CPUContext>::RunOnDevice() caffe2/caffe2/operators/concat_split_op.h:363
    #1 0x7f8f39c4978d in caffe2::Operator<caffe2::CPUContext>::Run(int) caffe2/caffe2/core/operator.h:987
    #2 0x7f8f381fe9c9 in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:67
    #3 0x7f8f38ee488e in caffe2::Workspace::RunNet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) caffe2/caffe2/core/workspace.cc:289
```

Test Plan: Sandcastle

Reviewed By: dzhulgakov, xush6528

Differential Revision: D31366205

fbshipit-source-id: 566aa519677c9d371189e4b1f81d595732861efc
2021-10-05 10:48:56 -07:00
9c3eb50b7b [PyTorch] Use std::move() in a couple places in function_schema_parser.cpp (#66114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66114

ghstack-source-id: 139712533

Test Plan: Build

Reviewed By: swolchok

Differential Revision: D31387502

fbshipit-source-id: e850cb7df397a7c5b31df995b23ad6e5c004ac86
2021-10-05 10:44:07 -07:00
aa80f05d2d Remove sync in Embedding caused by unique (#66091)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66091

Reviewed By: albanD

Differential Revision: D31385576

Pulled By: ngimel

fbshipit-source-id: e656d4d9c38b705c71853ca295f977d1cddc61a1
2021-10-05 09:39:42 -07:00
1932bc69e9 Move GHA to ONNX (#65975)
Summary:
- Delete CircleCI ONNX config
- Add sharded ONNX job to the list of generated workflows
- Move ONNX runtime installation from `pytorch-job-specs.yml` to `.jenkins/caffe2/test.sh`
- Limit MKLDNN to AVX2 ISA while running  Caffe2 tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65975

Reviewed By: seemethere

Differential Revision: D31327206

Pulled By: malfet

fbshipit-source-id: 15aa53e4481e846c62b4ee2db5c03047d68679a4
2021-10-05 09:31:57 -07:00
df475aa1dc Update Vulkan runner in benchmark binary to handle non-tensor inputs (#66123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66123

Some models may take in a list of tensors as inputs, thus the bundled inputs will contain `IValues` that are of the type `c10::List`. For Vulkan models, every tensor in the `IValue` list has to be converted to a vulkan tensor first, and this case is not currently handled by the Vulkan model wrapper in the benchmark binary.

This diff introduces `IValue` type checking to the input processor of the Vulkan model wrapper, and adds support for Tensor and List types.

Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-output
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models

# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"
```

Reviewed By: beback4u

Differential Revision: D31276862

fbshipit-source-id: 1d9abf958963da6ecad641202f0458402bee5ced
2021-10-05 07:59:56 -07:00
2a5116e159 [quant][fx2trt] Add quantize_per_channel in acc_ops and acc_ops_converter (#65287)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65287

Test Plan:
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py

Imported from OSS

Reviewed By: 842974287

Differential Revision: D31038882

fbshipit-source-id: cd20e132ffa85f6fb070e21cd96a9e84dd15fab5
2021-10-05 02:12:00 -07:00
d609957c95 patching graph_for (#55139)
Summary:
Allows an individual DifferentiableGraphOp to display its optimized forward graph. This improves user visibility into graph mutations performed by optimization passes, especially fusion.
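
A minimal usage sketch, assuming the usual `graph_for` debug entry point on a scripted function:

```
import torch

@torch.jit.script
def f(x):
    return x * 2 + x

x = torch.randn(4, requires_grad=True)
f(x); f(x)              # warm up so profiling and optimization kick in
print(f.graph_for(x))   # now shows optimized DifferentiableGraph bodies
```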

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55139

Reviewed By: albanD

Differential Revision: D31330909

Pulled By: dzhulgakov

fbshipit-source-id: c745b482fdc34876dc404cbe3bacd99dcf2ac724
2021-10-04 21:50:22 -07:00
ed50fa2513 [Static Runtime] Test isOptimizableContainerType and getAlwaysAliveValues (#65849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65849

Add tests for some of `StaticModule`'s exposed methods. Both of these are used by the memory planner, so it would be helpful to have some unit tests that ensure our basic invariants don't break.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31282901

fbshipit-source-id: e390329f4794e034170507e3a0de0abcfe0ab7b9
2021-10-04 20:46:07 -07:00
4c4525fa5c Compile without -Wno-unused-variable (take 2) (#66041)
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable warnings in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Do not delete `caffe2::OperatorBase::Output` calls as they have side effects

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041

Reviewed By: ngimel

Differential Revision: D31360142

Pulled By: malfet

fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
2021-10-04 20:39:39 -07:00
6b0aa2958d [FX] Support torch.layout as arg (#66048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66048

Previously, create_arg would fail if it encountered a non-`None` layout argument. Adding it to the `BaseArgumentTypes` list should be enough to fix that.
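
A small illustration (hedged; the traced function here is made up for the sketch):

```
import torch
from torch.fx import symbolic_trace

def f(x):
    # a non-None torch.layout argument now traces without error
    return torch.zeros_like(x, layout=torch.strided)

print(symbolic_trace(f).graph)
```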

Test Plan: Added unittest

Reviewed By: jamesr66a

Differential Revision: D31362662

fbshipit-source-id: 20049971e18c17e9c75e50540500c567266daa55
2021-10-04 19:58:08 -07:00
6ea4902cf4 [ao_migration] torch.quantization --> torch.ao.quantization in caffe2/torch/fx (#66096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66096

codemod -m -d caffe2/torch/fx --extensions py \
    'torch.quantization' \
    'torch.ao.quantization'

Test Plan: test_in_prod

Reviewed By: z-a-f

Differential Revision: D31294195

fbshipit-source-id: 00425844f8160749f68bdbdf0e08cb22c79099c9
2021-10-04 19:57:01 -07:00
de24faec5f Binary building wthout python fix (#66031)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66030

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66031

Reviewed By: VitalyFedyunin

Differential Revision: D31356243

Pulled By: malfet

fbshipit-source-id: d1537bc65bbba5d6497ecb8db7160a397eca81fd
2021-10-04 18:34:35 -07:00
6eb3a1c831 Run master clang-tidy on PRs (#66104)
Summary:
Make the PR clang-tidy run a strict superset of the master one.
Should prevent a situation where [clang-tidy on a PR](https://github.com/pytorch/pytorch/runs/3773346094) was clean but regressed on a [trunk commit](https://github.com/pytorch/pytorch/runs/3773406183?check_suite_focus=true)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66104

Reviewed By: seemethere

Differential Revision: D31384608

Pulled By: malfet

fbshipit-source-id: 397319be3480520d58eab11ec001ad7a9a94d41c
2021-10-04 18:27:38 -07:00
7c758759e3 [PyTorch Edge] Avoid string copying in TypeParser (#64278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64278

Use c10::string_view and const char* to copy less.
ghstack-source-id: 139468089

Test Plan:
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/132239033718036
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/132239033718036
went from mean of 293 ms to 281 ms.

Reviewed By: dhruvbird

Differential Revision: D30650712

fbshipit-source-id: abad143f2d5cc99a30e8da376c8e37716373032a
2021-10-04 16:10:38 -07:00
69da4b4381 GHA: make obvious when we are running smoke tests to user (#66011)
Summary:
This PR clarifies what runs on PRs by explicitly stating when smoke tests for Windows CUDA are run, and changes the logic so that user-defined labels override other workflow logic.

1. Move smoke tests to its own config.

2. Make sure that when a user specifies a ciflow label that is not the default, the workflow runs as if it is on trunk.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66011

Test Plan:
The default on PRs would generate this matrix (default replaced by smoke_tests):
![image](https://user-images.githubusercontent.com/31798555/135672182-64454ea3-ff43-4746-b8e4-09b0b28e9d33.png)
But when retriggered with a label, it looks like (note that there's no smoke_tests config):
![image](https://user-images.githubusercontent.com/31798555/135672601-5aa9a268-bc76-40f1-80c6-62b3fac6601d.png)

Reviewed By: VitalyFedyunin, seemethere

Differential Revision: D31355130

Pulled By: janeyx99

fbshipit-source-id: fed58ade4235b58176e1d1a24101aea0bea83aa4
2021-10-04 07:53:17 -07:00
4cdfceddd2 [Reland] Avoid saving self for softmax and log_softmax (#66018)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/65242

The last attempt of the reland automatically rebased onto stable, which did not yet have the revert commit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66018

Reviewed By: albanD

Differential Revision: D31348822

Pulled By: soulitzer

fbshipit-source-id: 881d701b404530c1352ac9245bd67264e1652b8a
2021-10-03 21:35:01 -07:00
8f5631b859 Refactor functional api vectorized jacobian to use batched grad parameter (#65566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65566

This doesn't simplify the vectorized jacobian computation, but it consolidates the logic and makes it easier to test.
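
For reference, the consolidated path is what backs `vectorize=True` in the functional API (a usage sketch):

```
import torch
from torch.autograd.functional import jacobian

def f(x):
    return (x ** 2).sum(dim=0)

J = jacobian(f, torch.randn(4, 3), vectorize=True)
print(J.shape)  # torch.Size([3, 4, 3]) -- output shape + input shape
```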

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31236257

Pulled By: soulitzer

fbshipit-source-id: 00ca0aa6519bed5f9ee2c7be4daa8872af5e92cd
2021-10-03 19:55:08 -07:00
73901b099d Add batched_grad parameter to autograd.grad (#65564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65564

- wrap the call into engine with vmap if `batched_grad` is `True`
- improves the comment on the call to engine (somewhat addressing https://github.com/pytorch/pytorch/issues/41659)
- borrows the message from functional.jacobian's vectorized argument concerning usage of the vmap feature
- adds basic test (further testing is done when we replace the usage in vectorized jacobian computation)

TODO:
 - create an issue tracking this
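
A hedged usage sketch (the PR title calls the flag `batched_grad`; current releases spell the equivalent `torch.autograd.grad` argument `is_grads_batched`):

```
import torch

x = torch.randn(3, requires_grad=True)
y = x ** 2

v = torch.eye(3)  # leading dim of grad_outputs is the batch dim
(g,) = torch.autograd.grad(y, x, grad_outputs=v, is_grads_batched=True)
print(g.shape)    # torch.Size([3, 3]) -- 3 gradients in one vmapped engine call
```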

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31236259

Pulled By: soulitzer

fbshipit-source-id: b33e6b26ea98fa9f70c44da08458fc54ba4df0f7
2021-10-03 19:55:06 -07:00
b6d5f1ee70 Allow None to pass through for vmap (#65565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65565

Does JAX allow this?

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31236258

Pulled By: soulitzer

fbshipit-source-id: 80460b355fc32ecbba8151e1f3179f076a927f9d
2021-10-03 19:53:49 -07:00
89ed9bdaee [Static Runtime] Fix bug of creating output aliases in aten::embedding_bag (#65516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65516

This change fixes a bug that Static Runtime's `aten::embedding_bag` out variant implementation creates aliases in its managed output tensors.

Managed output tensors should never alias each other, since writing to one can unintentionally overwrite another's contents. This exact problem was causing the bug at T97393697, making SR return wrong values.

This bug is detected in inline_cvr/remote_ro by a DCHECK, `verify_no_memory_overlap` (introduced by D30211705 (3fb33b38b9)), but wasn't found earlier since our testing didn't include running the model in debug mode. Fortunately this bug does not affect production, since the aliased outputs are not used there.

This change fixes the root cause from `_embedding_bag_cpu_impl_out`  by replacing alias creation with copying.

Note that this change also includes a fundamental change in Static Runtime's unit testing: `testStaticRuntime` exercises the given graph 3 times:
 1. profile run
 2. run using the profile to allocate managed tensors
 3. reuse the managed tensors -- newly added

Adding step 3 reveals this bug via a new unit test, `EmbeddingBagWithManagedOutput`.

Test Plan:
- Confirmed that the crash experienced by `StaticRuntime.EmbeddingBagWithManagedOutput` disappears with this change (crash paste: P459807248).

- Added `StaticRuntime.EmbeddingBagWithManagedOutput` to detect the same problem in the future.

Reviewed By: hlu1

Differential Revision: D31104345

fbshipit-source-id: 7bddf9cd82b400d18d8ce1bf15e29b815ef9ba8f
2021-10-03 15:10:58 -07:00
40948a935d Fix LLVM-12 UB in generate_proposals_op.cc (#66009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66009

Fixes
```
test_trace_c10_ops (jit.test_tracer.TestTracer) ... third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:374:24: runtime error: applying non-zero offset 4 to null pointer
    #0 0x7f5228f72227 in Eigen::internal::BlockImpl_dense<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false, true>::BlockImpl_dense(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:374
    #1 0x7f5228f7212c in Eigen::BlockImpl<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false, Eigen::Dense>::BlockImpl(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:166
    #2 0x7f5228f720dc in Eigen::Block<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false>::Block(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:142
    #3 0x7f5229b0e059 in Eigen::DenseBase<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> > >::FixedBlockXpr<internal::get_fixed_value<int>::value, internal::get_fixed_value<long>::value>::Type Eigen::DenseBase<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> > >::block<int, long>(long, long, int, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/../plugins/BlockMethods.h:98
    #4 0x7f5229b0c5ca in caffe2::GenerateProposalsOp<caffe2::CPUContext>::RunOnDevice() caffe2/caffe2/operators/generate_proposals_op.cc:348
```
Also cleans up some data type and const issues around the area.

Test Plan: Sandcastle

Reviewed By: xush6528

Differential Revision: D31343046

fbshipit-source-id: fd9096c8e47a0aad529c72fd313f64ca98dcb80b
2021-10-03 12:50:21 -07:00
c7748fc172 Added validation of mode parameter in AveragedModel (#65921)
Summary:
Discussion: https://github.com/pytorch/pytorch/pull/65495#issuecomment-930460469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65921

Reviewed By: albanD

Differential Revision: D31310105

Pulled By: prabhat00155

fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3
2021-10-03 08:42:28 -07:00
0fc6bd2e47 [gpu ne eval] disable adam decay unit test for gpu (#66056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66056

We keep running into this unrelated failure when landing diffs for the GPU inference project. Disable this operator's unit test on GPU, because the operator doesn't exist there:

RuntimeError: [enforce fail at operator.cc:277] op. Cannot create operator of type 'SmartDecaySparseAdam' on the device 'CUDA'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing. Operator def: input: "param" input: "mom1" input: "mom2" input: "last_seen" input: "indices" input: "grad" input: "lr" input: "iter" output: "param" output: "mom1" output: "mom2" output: "last_seen" name: "" type: "SmartDecaySparseAdam" arg { name: "beta1" f: 0 } arg { name: "beta2" f: 0.9 } arg { name: "epsilon" f: 1e-05 } device_option { device_type: 1 }

https://www.internalfb.com/intern/testinfra/diagnostics/5910974579962988.562949996565057.1633122845/

Test Plan: sandcastle

Reviewed By: jianyuh

Differential Revision: D31364731

fbshipit-source-id: 7fbd994cbe7f6ca116f5f34506a1ed7f14759bdf
2021-10-03 07:40:23 -07:00
29c0725e8a Back out "[caffe2] fix LLVM-12 nullptr-with-nonzero-offset UBSAN error" (#66055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66055

Original commit changeset: c31f179f8a7d

Reviewed By: igorsugak

Differential Revision: D31353348

fbshipit-source-id: 73d928e5c938ba604a7f9ea17a6250b57306e88f
2021-10-02 16:46:26 -07:00
7c52963350 [WIP] skip constant folding dequant node (#63991)
Summary:
This PR makes Constant Propagation ignore dequant nodes.

https://github.com/pytorch/pytorch/issues/61092

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63991

Reviewed By: pbelevich

Differential Revision: D31363993

Pulled By: Krovatkin

fbshipit-source-id: 99f7c56a4381aff2cbdf1167508414cf240e9f75
2021-10-02 15:30:43 -07:00
8a307640db selective trt import based whether we have gpu or not (#66045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66045

Att.

Reviewed By: kflu

Differential Revision: D31357388

fbshipit-source-id: 601affe067e5e4c1f1516dff4ac84fa9cdd27d5e
2021-10-02 06:12:37 -07:00
8b8012a165 [PyTorch Edge] Skip writing version during backport (#65842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65842

During backport, only parts of the model (like bytecode.pkl) need to be re-written, while the rest of the model stays the same. However, `version` is always re-written when `PyTorchStreamWriter` is destructed.

Change version to optional and add an API to allow skipping the version write when closing the writer.
ghstack-source-id: 139580386

Test Plan: buck run papaya/scripts/repro:save_load

Reviewed By: iseeyuan, tugsbayasgalan

Differential Revision: D31262904

fbshipit-source-id: 3b8a5e1aaa610ffb0fe8a616d9ad9d0987c03f23
2021-10-01 21:18:31 -07:00
7941590a51 [JIT] Selectively enable precise alias analysis for TupleConstruct (#66025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025

This change adds an option to selectively enable precise alias analysis for `prim::TupleConstruct` (introduced by D30437737 (cd458fe092)) to minimize its exposure only to `StaticRuntime` as of now.

Test Plan: Modified existing unit tests whose behavior depends on D30437737 (cd458fe092).

Reviewed By: eellison

Differential Revision: D31350285

fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6
2021-10-01 20:42:22 -07:00
e4ee5ca698 Revert D31326599: [pytorch][PR] Compile without -Wno-unused-variable
Test Plan: revert-hammer

Differential Revision:
D31326599 (a6280ab653)

Original commit changeset: 924155f1257a

fbshipit-source-id: b8ee5bc0298637443232f5ee9ec79e51ed256faf
2021-10-01 20:40:47 -07:00
5ef350d7cc Revert D31359010: [pytorch][PR] Fix clang-tidy regressions caused by #65954
Test Plan: revert-hammer

Differential Revision:
D31359010 (c269f471f4)

Original commit changeset: dce4b91a9891

fbshipit-source-id: 085417432b6748d3672b9b7141460f47d1c17a7f
2021-10-01 20:35:35 -07:00
c269f471f4 Fix clang-tidy regressions caused by #65954 (#66040)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66040

Reviewed By: ZolotukhinM

Differential Revision: D31359010

Pulled By: malfet

fbshipit-source-id: dce4b91a98913c8d8c2d8f9ebc49654265239158
2021-10-01 19:50:53 -07:00
ca76e193a3 Fix nll_backward for negative weights (#64572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64572

Fixes https://github.com/pytorch/pytorch/issues/64256
It also fixes an inconsistent treatment of the case `reduction = "mean"`
when the whole target is equal to `ignore_index`. It now returns `NaN`
in this case, consistently with what it returns when computing the mean
over an empty tensor.

We add tests for all these cases.
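
An illustrative repro of the edge case, assuming the default `ignore_index=-100`:

```
import torch
import torch.nn.functional as F

inp = torch.randn(2, 3).log_softmax(dim=1)
tgt = torch.full((2,), -100, dtype=torch.long)  # every target == ignore_index
print(F.nll_loss(inp, tgt, reduction='mean'))   # tensor(nan) -- mean over 0 elements
```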

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31116297

Pulled By: albanD

fbshipit-source-id: cc44e79205f5eeabf1efd7d32fe61e26ba701b52
2021-10-01 19:41:51 -07:00
eb3b9fe719 [XROS][ML] System specific adjustments for UTs to work. (#65245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65245

Building and running c10 and qnnpack tests on XROS.

Notable changes:
- Adding `#if defined(_XROS_)` in a few places not supported by XROS
- Changing Threadpool to an abstract class
ghstack-source-id: 139513579

Test Plan: Run c10 and qnnpack tests on XROS.

Reviewed By: veselinp, iseeyuan

Differential Revision: D30137333

fbshipit-source-id: bb6239b935187fac712834341fe5a8d3377762b1
2021-10-01 18:15:14 -07:00
363ccb257d GELU acc OP (#65957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65957

Added accelerator (acc) ops and a unit test for GELU.

Test Plan: buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer

Reviewed By: 842974287

Differential Revision: D31277083

fbshipit-source-id: f66dd05ef574db58cfa599e3575f95f1ebe82e93
2021-10-01 17:49:53 -07:00
a6280ab653 Compile without -Wno-unused-variable (#65954)
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954

Reviewed By: ngimel

Differential Revision: D31326599

Pulled By: malfet

fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
2021-10-01 17:40:47 -07:00
10f6294281 Fix shape inference dim_type for Clip, Mean, Div (#65996)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65996

Test Plan:
Facebook
```
buck build caffe2/caffe2/opt:bound_shape_inference_test && ./buck-out/gen/caffe2/caffe2/opt/bound_shape_inference_test --gtest_filter=*Clip*
```
```
buck build caffe2/caffe2/opt:bound_shape_inference_test && ./buck-out/gen/caffe2/caffe2/opt/bound_shape_inference_test --gtest_filter=*Div*
```
```
buck build caffe2/caffe2/opt:bound_shape_inference_test && ./buck-out/gen/caffe2/caffe2/opt/bound_shape_inference_test --gtest_filter=*Mean*
```

Reviewed By: yinghai

Differential Revision: D31121298

fbshipit-source-id: f366d8f4d4d0be159b62bfaafc42ca924c05e022
2021-10-01 17:34:34 -07:00
e1d963e8fc model_dump: Fix memory computation when both constants and data tensors are present (#66006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66006

Previously, this was resulting in a key collision and a crash.
ghstack-source-id: 139342089

Test Plan: Ran webdriver test locally.

Reviewed By: dhruvbird

Differential Revision: D31281092

fbshipit-source-id: f31311726c681d6d7e0504ff8e84c888af9054f0
2021-10-01 16:31:06 -07:00
23caeb3f71 model_dump: Add a helper to produce html with a single call (#66005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66005

ghstack-source-id: 139342091

Test Plan: Unit test, and used in a notebook.

Reviewed By: dhruvbird

Differential Revision: D31281091

fbshipit-source-id: 1e4d0713b9796a3d182de9e676c3b3c3b1610d6e
2021-10-01 16:29:43 -07:00
d9a95e66f0 Upload test failures to RDS (#65873)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65873

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D31296520

Pulled By: driazati

fbshipit-source-id: 0bd3fb6b62e49c7177199001fda0e7b124a22ab2
2021-10-01 16:25:51 -07:00
f85d7422bb [fx2trt]add support for torch.tile (#66016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66016

Add acc_ops.tile and converter for it.

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_tile

Reviewed By: wushirong

Differential Revision: D30587939

fbshipit-source-id: 1e2613cfca486fe54fcc0d38e5c7cdeb7d0ed4a0
2021-10-01 16:06:09 -07:00
060e41eafa Forward fix type hint for DataLoader (#66001)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66001

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D31340565

Pulled By: ejguan

fbshipit-source-id: d05ae42ebf93f61d781dc5d81ef0222e24f5acb3
2021-10-01 15:48:45 -07:00
ad889d0b5e Revert D30634700: [pytorch][PR] Fix typo in tensor docs
Test Plan: revert-hammer

Differential Revision:
D30634700 (d937473709)

Original commit changeset: e8952be20966

fbshipit-source-id: b18694e332023abcdf17ec1900b81b00d21f1014
2021-10-01 15:23:38 -07:00
7d22007902 [fx-acc] add acc_op optimization flags and decorator (#65928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65928

This diff adds a decorator for adding flags to acc_ops. These flags inform graph optimizations that the op is eligible for optimization by some general criteria (e.g. op acts elementwise, op does quantization).

This makes it simpler to expand acc_ops. The user can add an op and add flags to enable optimization without going through all graph opts and trying to determine whether the new acc_op is eligible for each graph optimization.

Even though our list of graph opts is small now, we already see that for `sink_reshape_ops` we had hardcoded 11 pointwise acc_ops; now there are 24.

Test Plan:
```
buck test mode/opt glow/fb/fx/graph_opts:test_fx_sink
```

```
Parsing buck files: finished in 0.5 sec
Downloaded 0/3 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 37.1 sec (100%) 10279/10279 jobs, 3/10279 updated
  Total time: 37.7 sec
More details at https://www.internalfb.com/intern/buck/build/e13521bb-6142-4960-8cdd-6b5e4780da96
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 16260a2a-d364-4605-9111-6f2a19317036
Trace available for this run at /tmp/tpx-20210922-124332.623880/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124720425564
    ✓ ListingSuccess: glow/fb/fx/graph_opts:test_fx_sink - main (6.038)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_no_sink_concat_below_quantize (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.036)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_sink_concat_below_quantize (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.048)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_sink_reshape_nodes (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.058)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_no_sink (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.057)
Summary
  Pass: 4
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124720425564
```

Reviewed By: jfix71

Differential Revision: D31121321

fbshipit-source-id: 6f6e3b8e2d57ea30766fa6bee34ca207cec86f0f
2021-10-01 15:19:35 -07:00
d937473709 Fix typo in tensor docs (#64160)
Summary:
Remove extra character from `torch.qfint32`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64160

Test Plan: Docs

Reviewed By: jerryzh168

Differential Revision: D30634700

Pulled By: axitkhurana

fbshipit-source-id: e8952be20966b9a3f9d62d9957ae255d5d4889bb
2021-10-01 14:57:55 -07:00
8e8695285f Re-generate workflows (#66027)
Summary:
Fix master breakage

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66027

Reviewed By: suo, malfet

Differential Revision: D31353922

Pulled By: driazati

fbshipit-source-id: cdb7f639608999b6ee72f6b1000d7ecbc02efc95
2021-10-01 14:56:51 -07:00
894d296bae Remove usage of GitHub's artifact store in linux jobs (#65875)
Summary:
The docs upload is unnecessary since the docs are hosted in S3 anyway, and the test reports are mirrored in S3, which has better upload/download speed and makes them available as soon as the upload finishes rather than once the workflow completes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65875

Reviewed By: seemethere

Differential Revision: D31296500

Pulled By: driazati

fbshipit-source-id: 8c371230d0c8c0eb785702df9ae495de85f60afa
2021-10-01 13:49:44 -07:00
6e8ffd191e Fix typo in name of LayerNormBackwardCUDAKernel (#66000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66000

Saw this in nvprof and I'm just a little too nitpicky to let it slide!
ghstack-source-id: 139547271

Test Plan: CI

Reviewed By: xiaomengy

Differential Revision: D31340262

fbshipit-source-id: ab48dc99c34a74585e66800b4bbcccc6aabbaff2
2021-10-01 12:28:59 -07:00
ffede499b2 [PyTorch][Static Runtime] Fast path for contiguous to_copy (#65499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65499

When the tensors in question are contiguous, there is no need to go through dispatch, use TensorIterator, etc.
ghstack-source-id: 139549027

Test Plan:
Ran ptvsc2_predictor_bench for ctr_mobile_feed local net following https://fb.quip.com/q8hBAFGMeaOU (but without the profile and compare_results options).

Before:

I0922 14:00:32.261942 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.18124. Iters per second: 139.252
I0922 14:01:44.865965 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.25314. Iters per second: 137.871
I0922 14:02:56.929602 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.1986. Iters per second: 138.916
I0922 14:04:05.923025 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.89211. Iters per second: 145.093
I0922 14:05:17.953056 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.19577. Iters per second: 138.971

mean: 7.144172, stddev: 0.1283

After:

I0922 13:51:55.233937 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.79709. Iters per second: 147.122
I0922 13:53:03.062682 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.77605. Iters per second: 147.579
I0922 13:54:10.230386 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.70993. Iters per second: 149.033
I0922 13:55:18.403434 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.81044. Iters per second: 146.833
I0922 13:56:26.568646 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.80965. Iters per second: 146.85

mean: 6.800632, stddev: 0.013227

Looks like about a 5.3% improvement.

Reviewed By: hlu1

Differential Revision: D31125492

fbshipit-source-id: 92ab5af242d0a84dcf865323a57b48e8374eb823
2021-10-01 12:13:33 -07:00
7b10a76e05 [PyTorch] Try removing Android strtod implementation (#65713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65713

This may not be needed anymore.
ghstack-source-id: 139114284

Test Plan: see if it builds

Reviewed By: dhruvbird

Differential Revision: D31216245

fbshipit-source-id: 29c9c013f94070c7713e46027881cb693b144d36
2021-10-01 11:43:15 -07:00
176d3c6fb4 [PyTorch] Fix many Tuple::elements() callsites (#64065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64065

It is only safe to mutate Tuple elements if you are the sole owner
of the tuple. The most efficient way to do this, then, is
`std::move(*std::move(tupleIValue).toTuple()).elements()` (the
innermost move allows `IValue::toTuple()` to avoid a refcount bump and
the outermost move allows the element vector to be moved out of the
tuple), but many callsites write simply
`tupleIValue.toTuple().elements()`, which incurs many extra refcount
bumps.

ghstack-source-id: 139468088

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D30592621

fbshipit-source-id: e8312de866de09b9ea2a62e5128cbf403ee16f09
2021-10-01 11:36:05 -07:00
f14e5e636d [fx2trt]fix slice tensor converter (#65960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65960

Fix a bug in the converter and add support for negative dim.

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_narrow

Reviewed By: wushirong

Differential Revision: D31310232

fbshipit-source-id: 62887369d830202cae6d63b41747225b12dcf754
2021-10-01 11:29:42 -07:00
21eebc9fd6 [PyTorch][easy] Use copy-and-move instead of copy-and-swap in IValue::operator= (#65826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65826

Should be marginally more efficient.
ghstack-source-id: 139315050

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D31272489

fbshipit-source-id: 7c309d67a0ec0ada35a5b62497bac374538394a9
2021-10-01 11:16:42 -07:00
592481a5cc [fx][const_fold] Refactor to use base split module to simplify, and correctly handle non-single-Tensor outputs (#65933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65933

We use `split_module` to split the input model that we want to const fold into const and non-const subgraphs. Previously we were taking the non-const graph and trying to hack it back into the same signature as the input model. However, this was complex and buggy.

Instead, refactor to just keep using the base split module that contains both const and non-const graphs. This means we:
- Inline the non-const graph into the split module
- Remove the const graph from the module and replace it with a getattr that will be run to insert that attr when we `run_folding`

Test Plan: Added test coverage to cover newly supported folding, and updated other tests for new strategy.

Reviewed By: yinghai

Differential Revision: D31293307

fbshipit-source-id: 6e283a8c7222cf07b14e30e74dffc8ae5ee8b55f
2021-10-01 10:26:29 -07:00
34682377b9 [iOS][CI] Update dev certs (#66004)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66004

Reviewed By: xta0

Differential Revision: D31340893

Pulled By: malfet

fbshipit-source-id: 3bf0be266e9686a73d62e86c5cf0bebeb0416260
2021-10-01 09:38:49 -07:00
ccf8d48f16 Revert D31317680: [pytorch][PR] Avoid saving self for softmax and log_softmax
Test Plan: revert-hammer

Differential Revision:
D31317680 (5f7cadc7aa)

Original commit changeset: b3b921e06775

fbshipit-source-id: 1bca0672383536a2c21243ceb52349c766a94344
2021-10-01 09:31:44 -07:00
21da6ae9ce suppress mypy error (#66003)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66003

Differential Revision:
D31340874

Test Plan: Imported from OSS

Reviewed By: seemethere

Pulled By: suo

fbshipit-source-id: d9ef0f40625fe5ff21f8a5e044d5a75400367dc2
2021-10-01 09:17:42 -07:00
eac218dbc6 Revert "Port sort kernel to structured kernels. (#62391)" (#65876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65876

This reverts commit 93852bb2d41d90b6ac660015d79f7474bcebb774.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31296329

Pulled By: bdhirsh

fbshipit-source-id: 85eae72f2346d69290f440f5393a7da096a96c6e
2021-10-01 07:50:28 -07:00
5f7cadc7aa Avoid saving self for softmax and log_softmax (#65242)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64000
 - updates double backward formula to compute grad wrt output instead of self
 - ~~In some of the error messages, we still refer to the dtype of the input, even though we are now checking the dtype of the output~~
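
For intuition, a sketch of why the output suffices (standard softmax calculus, not quoted from derivatives.yaml): with y = log_softmax(x) and incoming gradient g,

    dL/dx_i = g_i - exp(y_i) * sum_j g_j

and with y = softmax(x),

    dL/dx_i = y_i * (g_i - sum_j g_j * y_j)

Both expressions depend only on the saved output y and on g, never on x itself, so self need not be saved.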

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65242

Reviewed By: malfet

Differential Revision: D31317680

Pulled By: soulitzer

fbshipit-source-id: b3b921e06775cfc12e5a97a9ee8d73aec3aac7c3
2021-10-01 07:49:07 -07:00
383c0a3858 Fix internal assert failure for torch.all and torch.any with requires_grad=True (#65714)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/58547.
I added an OpInfo-based test that fails on master and passes with the
proposed changes.
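
An illustrative repro shape, per the linked issue (hedged sketch):

```
import torch

x = torch.ones(2, 3, requires_grad=True)
print(x.all(), x.any())  # used to trip an internal assert; now returns bool tensors
```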

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65714

Reviewed By: saketh-are, mruberry

Differential Revision: D31248307

Pulled By: albanD

fbshipit-source-id: 041eaa9b744c3043f78dd8ae5f457f67c311df4f
2021-10-01 07:32:44 -07:00
53c0d91db9 Make autograd codegen for differentiable outputs safer to use (#65823)
Summary:
This PR raises an error when `len(output_differentiability) != len(outputs)`

Notes in derivatives.yml tell that
> 'output_differentiability' and value a list of the same length as the number of outputs from the forward function.

but it was not enforced in codegen, leading to confusion and unexpected bugs: https://github.com/pytorch/pytorch/issues/65061#issuecomment-930271126.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65823

Reviewed By: mrshenli

Differential Revision: D31307312

Pulled By: albanD

fbshipit-source-id: caeb949e9249310dffd237e77871e6d0d784e298
2021-10-01 07:27:57 -07:00
bff8d8fd28 [nnc] Add BufHandle.store to python API (#65213)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65213

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31328502

Pulled By: bertmaher

fbshipit-source-id: 1f260f68692c3859350587afe021a500672d79f0
2021-10-01 06:59:50 -07:00
8cf047afac [nnc] Add call_with_numel interface for fast CUDA calls (#65213)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65213

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31319012

Pulled By: bertmaher

fbshipit-source-id: 93fee80f956795470f5a2ce3b33c2ea2f132036f
2021-10-01 06:58:37 -07:00
8595b6eeed Avoid UB when indexing into size-0 tensors (#65878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65878

If we attempt to compute an offset into an empty tensor we trigger UB, since
we'd be adding an offset to a nullptr, which is UB
(https://reviews.llvm.org/D67122) even if we never use the pointer.

Since indexing into an empty tensor yields an empty tensor anyway, let's just
return the underlying (null) data ptr in this case.

ghstack-source-id: 139448496

Test Plan:
r-barnes originally pointed this out to me in a failing TE fuser test:
https://www.internalfb.com/intern/testinfra/diagnostics/5910974579561425.281475022329152.1632898053/
```
buck test mode/dev //caffe2/test:jit -- --exact 'caffe2/test:jit - test_unsupported_nn_functional_pad_circular_cpu_float32 (test_jit_fuser_te.TestNNCOpInfoCPU)'
```

But it turns out it's easily triggered by anything that tries to operate on a
slice of a size-0 tensor:
```
def test_pad(self):
    F.pad(torch.ones(0, 3, 3), (1, 2), 'circular')

def test_index(self):
    input = torch.zeros(0, 3, 3)
    out = torch.zeros(0, 3, 6)
    out[..., 1:4] = input[..., 0:3]

def test_add(self):
    torch.ones(0, 2)[:, 1] + torch.ones(0, 1)
```

What's the right place for this sort of operator corner-case test? Should they be, or are they already, part of OpInfo?

Reviewed By: jamesr66a

Differential Revision: D31296914

fbshipit-source-id: 0ef52ad311dceeed985498f8d9390bc6fbaefbfc
2021-10-01 06:55:15 -07:00
fc52f1293e Improve pytorch type hints (Dataloader, trig functions)
Summary:
This is to fix Pyre errors in our applications:
* calling `tensor.cos()` etc.
* creating a data loader with a batch sampler that is `List[List[int]]`.

Test Plan: TODO: rebase the diffs and run Pyre.

Reviewed By: ejguan

Differential Revision: D31309564

fbshipit-source-id: 1c6f3070d7570260de170e2fe2153d277b246745
2021-10-01 06:53:57 -07:00
982ef8837b [Static Runtime] Fuse ListUnpack + gather_ranges_to_dense (#65116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65116

Fuse `fb::gather_ranges_to_dense` with `prim::ListUnpack`.
```
%0 : Tensor[] = fb::gather_ranges_to_dense(...)
%1: Tensor, %2: Tensor, ... = prim::ListUnpack(%0)
```
turns into:
```
%0: Tensor, %1: Tensor, ... = fb::gather_ranges_to_dense(...)
```

Reviewed By: hlu1

Differential Revision: D30973525

fbshipit-source-id: f0349baa1622b697ee2ab652376a24ec0d89e819
2021-10-01 06:49:54 -07:00
227e37dd39 pytorch quantization ao migration phase 2: caffe2/test (#65832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65832

Renames `torch.quantization` to `torch.ao.quantization` in `caffe2/test`
folder.

```
find caffe2/test/ -type f -name "*.py" -print0 | xargs -0 sed -i "s/torch\.quantization/torch.ao.quantization/g"
HG: manually revert the files testing this migration
hg revert caffe2/test/quantization/ao_migration/common.py
hg revert caffe2/test/quantization/ao_migration/test_ao_migration.py
```

Test Plan: CI

Reviewed By: z-a-f

Differential Revision: D31275754

fbshipit-source-id: 4ed54a74525634feb0f47a26d071102e19c30049
2021-10-01 06:26:30 -07:00
dac35b3592 pytorch quantization ao migration phase 2: torch/jit (#65829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65829

Renames `torch.quantization` to `torch.ao.quantization` in `torch/jit` folder.

```
find caffe2/torch/jit/ -type f -name "*.py" -print0 | xargs -0 sed -i "s/torch\.quantization/torch.ao.quantization/g"
```

Test Plan: CI

Reviewed By: z-a-f

Differential Revision: D31273365

fbshipit-source-id: 350eb116148d91b967d428b54413caee4fd68438
2021-10-01 06:22:22 -07:00
e3af4be963 pytorch quantization ao migration phase 2: caffe2/benchmark (#65833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65833

Renames `torch.quantization` to `torch.ao.quantization` in `caffe2/benchmarks`
folder.

```
find caffe2/benchmarks/ -type f -name "*.py" -print0 | xargs -0 sed -i "s/torch\.quantization/torch.ao.quantization/g"
```

Test Plan: CI

Reviewed By: z-a-f

Differential Revision: D31275963

fbshipit-source-id: 8596bf28df5c3ad2c4490ac8abb285d6517c0116
2021-10-01 06:17:36 -07:00
c1447f06a8 [special] special alias for softmax (#62251)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
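
A usage sketch, assuming the alias lands as `torch.special.softmax`:

```
import torch

x = torch.randn(5)
torch.testing.assert_close(torch.special.softmax(x, dim=0),
                           torch.softmax(x, dim=0))
```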

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62251

Reviewed By: H-Huang

Differential Revision: D31141834

Pulled By: mruberry

fbshipit-source-id: aecaf62af248e9034ef589159ce0fb325c729493
2021-10-01 03:55:32 -07:00
c27b427cd9 [sparsity] Add m-out-of-n support in the WeightNormSparsifier (#65295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65295

The m-out-of-n is implemented as follows:

1. Compute the blocks that need to be sparsified using the weight-norm criterion
2. Within each block below the threshold find the smallest absolute value elements
3. Zero out only the smallest values within each block

m-out-of-n describes a sparsification scheme where, in a block with n elements, only m of them are zeroed out.
Block sparsity, with the whole block being all zeros, is a special case of m-out-of-n: if m == n, the whole block is reset.

This echoes the implementation described in https://github.com/pytorch/pytorch/issues/59835,
and meets the requirements of NVIDIA cuSPARSELt.
To support CUDA 2:4 sparsity, one would set the sparsity_level to 1.0.
That means every block of shape 1x4 within a tensor will be sparsified with the 2-out-of-4 scheme.
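
A minimal configuration sketch; the import path follows the `torch.ao.sparsity` module this stack targets, and argument names may differ across releases (later versions expose `torch.ao.pruning`):

```
from torch.ao.sparsity import WeightNormSparsifier

# 2-out-of-4: in every 1x4 block, zero out the 2 smallest-magnitude weights.
sparsifier = WeightNormSparsifier(sparsity_level=1.0,
                                  sparse_block_shape=(1, 4),
                                  zeros_per_block=2)
```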

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31186828

Pulled By: z-a-f

fbshipit-source-id: 7bd3e2707915b90f4831859781fc6e25f716c618
2021-10-01 03:19:15 -07:00
8b1aa85388 [sparsity] Change API to take FQNs as configuration (#65296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65296

The original API described in the https://github.com/pytorch/pytorch/issues/59835
assumed that the per-layer configuration would take a module/layer
reference. However, a more useful approach is to refer to the layers
by their fully qualified names (FQN). That allows us to store the
configuration in a file without serializing the models.

We define a layer's FQN as its "path" within a model. For example,
if one can refer to a model using `model.layer0.sublayerX`, the FQN
of the sublayerX is `'layer0.sublayerX'`.
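
An illustrative config under the FQN-based API (the key names and `prepare` call shape are assumptions for this sketch):

```
config = [
    {'fqn': 'layer0.sublayerX', 'sparsity_level': 0.7},
]
# `model` and `sparsifier` assumed constructed elsewhere:
sparsifier.prepare(model, config)  # layers referenced by FQN, not module object
```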

Test Plan:
```
python test/test_ao_sparsity.py -- TestBaseSparsifier
buck test mode/opt //caffe2:test -- TestBaseSparsifier
```

Reviewed By: gchanan

Differential Revision: D31186830

Pulled By: z-a-f

fbshipit-source-id: d8d87f1c054e5c10d470e67837476a11e0a9b1d4
2021-10-01 03:17:31 -07:00
ea0de37d2e [PyTorch] Avoid string construction from const char* and speedup empty string creation if error messages are suppressed (#65939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65939

This change includes 2 separate optimizations.

1. Provide an overload of `debugString(const char*, ...)` in addition to `debugString(std::string, ...)` to avoid `std::string` construction when `STRIP_ERROR_MESSAGES` is defined and the caller passes in a `const char*`.
2. Return `std::string("", 0)` instead of `""`, since the former triggers no call to `std::basic_string`'s constructor whereas the latter does. [Godbolt Link](https://godbolt.org/z/oTExed5h8). However, I'm surprised by this, since the man page for [std::basic_string](https://en.cppreference.com/w/cpp/string/basic_string/basic_string) clearly states that the constexpr overload is only since C++20, and I am building using `-Os -std=c++17`.

Godbolt Screenshot:

{F667311023}

ghstack-source-id: 139507542

Test Plan:
CI and local build via:

```
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: swolchok

Differential Revision: D31312942

fbshipit-source-id: aa24abbfe1c16419f235d037595321982614c5ea
2021-10-01 00:17:21 -07:00
2828ce53fd Added jit log stream changing function and some refactor (#65768)
Summary:
Description:
- Only `stdout` and `stderr` are added as possible options from the Python
  API for now; file-path support can be added later.
- Put the class `JitLoggingConfig` in the cpp file, as none of its methods were being used outside of this file.

Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`
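
A quick usage sketch of the stated Python entry point (`PYTORCH_JIT_LOG_LEVEL` is the pre-existing env var that selects which pass logs):

```
import os
import torch

os.environ['PYTORCH_JIT_LOG_LEVEL'] = 'dead_code_elimination'
torch._C._jit_set_logging_stream('stdout')  # route JIT logs to stdout
```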

Testing:
- Tested python API locally.
- Unit test for the C++ API is written

Fixes https://github.com/pytorch/pytorch/issues/54182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768

Reviewed By: mrshenli

Differential Revision: D31291739

Pulled By: ZolotukhinM

fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
2021-09-30 23:25:11 -07:00
33c03cb61a [deploy][1/n] Make deploy code conform to PyTorch style. (#65861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861

First in a series. This PR changes the code in deploy.h/cpp and
interpreter_impl.h/cpp to be camel case instead of snake case. Starting
with this as it has the most impact on downstream users.

Test Plan: Imported from OSS

Reviewed By: shannonzhu

Differential Revision: D31291183

Pulled By: suo

fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934
2021-09-30 22:59:47 -07:00
765b6a90f3 [TensorExpr] Move lowerings registration from kernel.cpp to lowerings.cpp. (#65553)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65553

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148921

Pulled By: ZolotukhinM

fbshipit-source-id: 772062155043d4be9e9a25f6259b8e4a6cb762f4
2021-09-30 22:56:22 -07:00
015e0079e3 [TensorExpr] Move 'compute*' functions to operators/... (#65552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65552

This PR is mostly a verbatim move of several functions to different
files. The goal is to have more consistency in what resides where.

With this PR:
* All `compute*` functions defining how a given operator needs to be
lowered to TE IR will reside in `operators/*.{cpp,h}`.
* Auxiliary functions for these functions will reside in
`operators/misc.cpp`. `compute*` functions for ops not belonging
anywhere else can also go to that file.
* `operators/unary.*` is renamed to `operators/pointwise.*` and now
includes functions like `computeTwoOperands`.
* `kernel.*` now contains *only JIT-related* logic and implementations of
`TensorExprKernel` methods.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148923

Pulled By: ZolotukhinM

fbshipit-source-id: e36ad8e779b8d30a33b49ea4ebf6d6a7438989f4
2021-09-30 22:56:20 -07:00
3a0165da49 [TensorExpr] Port NNC lowerings to the new registry mechanism. (#65551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65551

Previously we had a big switch on Op kind to decide how to lower a given
JIT operator to NNC. This PR changes this switch to a hash table lookup.

Why? This helps us with at least two things:
1) With this approach we can easily check in advance whether we know how
to handle a given node - i.e. we can inspect the entire graph and tell
whether it's possible to compile it without actually trying to do so and
dying in the middle. This would allow us to, say, provide user-friendly
error messages in the AOT workflow. A rough sketch of the lookup idea
follows below.
2) We can switch to using the schema instead of the op kind to determine
the correct lowering. Unlike the op schema, the op kind might be
ambiguous (see e.g. #64963), and using it instead of the schema can lead
to bugs.
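
A rough Python sketch of the hash-table lookup idea (hypothetical names; the actual registry is implemented in C++):

```python
from typing import Any, Callable, Dict

LoweringFn = Callable[..., Any]
_lowerings: Dict[str, LoweringFn] = {}  # schema string -> lowering function

def register_lowering(schema: str) -> Callable[[LoweringFn], LoweringFn]:
    def deco(fn: LoweringFn) -> LoweringFn:
        _lowerings[schema] = fn
        return fn
    return deco

@register_lowering("aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor")
def lower_add(*args: Any) -> Any:
    ...  # build the TE IR for add

def can_compile(node_schemas: list) -> bool:
    # Point 1 above: inspect the whole graph up front instead of
    # failing in the middle of lowering.
    return all(s in _lowerings for s in node_schemas)
```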

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148926

Pulled By: ZolotukhinM

fbshipit-source-id: ac12684e2126c899426ef5e4cc1e3f70fa01f704
2021-09-30 22:56:18 -07:00
eee9ad0fdd [TensorExpr] Add a skeleton for a registry of NNC lowerings. (#65550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65550

This PR adds the source files and the class for the registry; subsequent
PRs actually port existing lowerings to this mechanism.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31148922

Pulled By: ZolotukhinM

fbshipit-source-id: 4c087b22ee898d5a5a18a5d2a4bb795aa2ffd655
2021-09-30 22:56:16 -07:00
d84191fcc6 [TensorExpr] Kernel: make prim::ConstantChunk handled like other ops. (#65549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65549

Previously it had special handling; with this change it follows the
same mechanism as other ops.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148924

Pulled By: ZolotukhinM

fbshipit-source-id: 572d8ae5e123e7a0e2a656154d7bd0f73c785a06
2021-09-30 22:55:00 -07:00
a6ad2b41ac [Static Runtime] Make module_ optional in StaticModule (#65882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65882

`torch::jit::Module` is refcounted. There is no need to wrap it in a `shared_ptr`.

Test Plan:
```
buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```

Reviewed By: mikeiovine

Differential Revision: D31012222

fbshipit-source-id: 74d234bd85423e5ba0e396f24899631354a2c74b
2021-09-30 22:48:49 -07:00
08df4c2b3c slow_conv2d grad_input: avoid dispatch in parallel region (#65725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65725

See gh-56794

Avoid dispatch inside of parallel_for by:
1. Replacing Tensor slicing with TensorAccessor
2. Calling `grad_input.zero_()` only once, outside of the parallel region
3. Replacing `at::mm` with a `gemm` call

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D31257876

Pulled By: ngimel

fbshipit-source-id: f2902edeccd161431c1dfb1ab3e165d039ec259d
2021-09-30 22:47:31 -07:00
6502fb89dd Make JIT Aliasing Test Less Brittle (#65493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65493

Added a last resort: use whatever ATen operator in the graph that has Tensor outputs as the operator node for checking the alias annotation.

Test Plan: python test/test_ops.py -k test_variant_consistency_jit

Reviewed By: mrshenli

Differential Revision: D31321221

Pulled By: alanwaketan

fbshipit-source-id: f4a5cbfd36bd0867d8c1bf9de9a65365ee7c35d6
2021-09-30 22:43:03 -07:00
4f5ea5983a [QPL] move metadata logging to markerEnd for model run QPL (#65451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65451

This diff moved metadata logging from marker start to marker end. This should improve perf because we can skip metadata logging when the marker is not sampled (using `isMarkerOn`).

Test Plan:
Verified metadata are logged: https://fburl.com/scuba/qpl_metrics/pytorch_employee/armjgtyw
https://fburl.com/scuba/qpl_metrics/pytorch_employee/zz36zkr1

Reviewed By: xcheng16

Differential Revision: D31105548

fbshipit-source-id: 0eafaaefecb7e230021616e397e548a2fd2b92e9
2021-09-30 22:12:40 -07:00
2481c06496 [caffe2] fix LLVM-12 nullptr-with-nonzero-offset UBSAN error (#65506)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65506

Test Plan: run a adfinder canary and verify this error is fixed.

Reviewed By: swolchok

Differential Revision: D31130083

fbshipit-source-id: c31f179f8a7de75ed6f6e7ee68b197f2970ddd3d
2021-09-30 21:47:25 -07:00
f6dfac6974 Migrate THCCachingHostAllocator to ATen (#65746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65746

This also removes the cudaHostAllocator field on THCState, since there
doesn't seem to be an API anywhere for customizing it.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31236630

Pulled By: ngimel

fbshipit-source-id: 2a8e756222ae70565e77f8e7139d60ec5be32276
2021-09-30 21:26:38 -07:00
d39790340d [ONNX] Enable export of __xor_ (#64042) (#64581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64581

* Enable xor

* Update test_pytorch_onnx_onnxruntime.py

* Update symbolic_opset9.py

* Update symbolic_opset9.py

* Update test_pytorch_onnx_onnxruntime.py

* Update symbolic_opset9.py

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919598

Pulled By: malfet

fbshipit-source-id: 044e55d0697da0050f26a6ceccd1517493d7e8a6
2021-09-30 21:09:01 -07:00
e598ba2ef3 [ONNX] Fix inplace fill_ dtype export mismatch (#64233) (#64580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64580

Append `type_as` after converting `fill_` to `full_like` when no dtype argument is given.

BowenBao <bowbao@microsoft.com>

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919599

Pulled By: malfet

fbshipit-source-id: f174977ced8f2c991b0615b65ff7c23fecf301c2
2021-09-30 21:08:59 -07:00
89cbe6229d [ONNX] Update doc and error message for indexing export (#64290) (#64579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64579

Added suggested workarounds to the indexing section of the ONNX export documentation.
Updated the indexing export warning message with a link to the documentation.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919603

Pulled By: malfet

fbshipit-source-id: 7fe65cb5aa7de4f7d93ff05011ba22f5adb27811

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-09-30 21:08:56 -07:00
d4ff344fae [ONNX] Fix remainder export (#64230) (#64578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64578

* Fix remainder export for the edge case where the input is negative. The new export relies on the true_divide export (see the identity below).
* Simplified the true_divide export. Cleaned up redundant code that is handled by the scalar type analysis pass. Removed the dependency on `onnx::Where`, thus supporting opsets 7 & 8.
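
For reference, the standard floor-mod identity the new export computes (matching `torch.remainder` semantics for negative inputs):

$$\operatorname{remainder}(a, b) = a - b \left\lfloor \frac{a}{b} \right\rfloor,$$

so the result takes the sign of $b$; the quotient $a/b$ comes from the `true_divide` export.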

Fixes #60179

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919601

Pulled By: malfet

fbshipit-source-id: 0f78621c0ac3bdb6bf4225e049ba5f470dc8ab12

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-09-30 21:08:54 -07:00
0f0ef4fe64 Add onnx test for batched_nms (#53175) (#64381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64381

* Added new ONNX test for batched_nms

* Update test according to PR in torchvision

* Update test/onnx/test_pytorch_onnx_onnxruntime.py

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919602

Pulled By: malfet

fbshipit-source-id: edfb5b9f75077429f7f242fd6ac06d962968dfba

Co-authored-by: Bowen Bao <imbowenbao@outlook.com>
2021-09-30 21:08:52 -07:00
7e15f2ddaa [ONNX] Fix gather squeeze axis in constant folding (#63588) (#64379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64379

* Fix gather squeeze axis in constant folding

* mypy

* fix indent

* address comments

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919604

Pulled By: malfet

fbshipit-source-id: 90edb054491433a0da2fe82324ac7c12f1ef062b
2021-09-30 21:08:50 -07:00
41bdfe3919 [ONNX] Fix cuda test case (#63597) (#64378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64378

* skip script test for unsupported autocast.
* Fix test case by adding missed `autocast` and `model.cuda()`.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919600

Pulled By: malfet

fbshipit-source-id: 3231fc672d97de487d6e4460626df0ba25f212ce

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-09-30 21:08:48 -07:00
2d61009f4a [ONNX] Fix input sequence for pad op (#60554) (#64377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64377

* Fix for input primitive sequence

* Test mypy

* Fix for tracing tuples

* Fix for extra inputs

* flake8

* Rebase

* Fix for tracing tuples

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919606

Pulled By: malfet

fbshipit-source-id: a718c4a12cda77b968cb636acd7aa63d7b5ba326
2021-09-30 21:08:45 -07:00
f17ee368b3 Fix empty size constant creation (#63607) (#64376)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64376

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919608

Pulled By: malfet

fbshipit-source-id: 0e789e8470ce0f130148df764ce77f6d4fd0a274
2021-09-30 21:08:43 -07:00
84190dafa8 [ONNX] Update instance_norm implementation and support training (#60538) (#64375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64375

* Update the instance_norm track_running_stats=True implementation and support the training mode
* Reference: 9baf75c86e/aten/src/ATen/native/Normalization.cpp (L532)
* Fix https://github.com/pytorch/pytorch/issues/53887

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919605

Pulled By: malfet

fbshipit-source-id: 306eb2a1122bb5d90dcb7c18260a3a2057a21c34

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-09-30 21:07:26 -07:00
3d6d4f4322 [fx2trt][quant] Add lowering support for per channel quantization in fx2trt (#64787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64787

This PR added support for lowering per-channel quantization and dequantization operators
in fx2trt. It also extends TensorMeta with extra arguments corresponding to per-channel quantized Tensors.
Initially I was thinking of adding a qparams field that could capture everything, but we currently still have lowering support
for fbgemm ops (which take scale and zero_point in the operator interface). I think we can move everything to qparams
after we deprecate lowering support for fbgemm ops in the future.

Test Plan:
Test for per channel weight:
```
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py
```

change BC compatibility test expect for TensorMeta
```
python test/test_fx.py TestFXAPIBackwardCompatibility.test_class_member_back_compat --accept
```

Imported from OSS

Reviewed By: jfix71, mrshenli, 842974287

Differential Revision: D30879848

fbshipit-source-id: 76c3804bb1d9343183ae53d9f02c1a3bf6c79e1c
2021-09-30 18:54:14 -07:00
207fefc988 Delete rogue cu102 windows builds (#65961)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65961

Reviewed By: seemethere

Differential Revision: D31325279

Pulled By: malfet

fbshipit-source-id: b8748c0040cdcfb8182eb7c59a3770b7d0681de9
2021-09-30 18:44:02 -07:00
b3da2afebe Clarified difference in behavior of empty_strided and as_strided (#64568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64568

Fix: #64389

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31299999

Pulled By: mruberry

fbshipit-source-id: dd538ffa7cc1267ab6472806f4216b170dd0faad
2021-09-30 17:27:59 -07:00
22f36353dc Revert D31137652: [pytorch][PR] Skip failing tests when LAPACK and MAGMA are not available
Test Plan: revert-hammer

Differential Revision:
D31137652 (dd354117ef)

Original commit changeset: c969f75d7cf1

fbshipit-source-id: bc4cde4eeb5d38ac940ebb471abbd8b9009b3aee
2021-09-30 16:08:57 -07:00
6285348f06 Implement n-dimensional hermitian FFTs (#63890)
Summary:
Closes https://github.com/pytorch/pytorch/issues/59127

cc mruberry peterbell10 walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63890

Reviewed By: ngimel

Differential Revision: D30761909

Pulled By: mruberry

fbshipit-source-id: 06e1e4dc65726f35c99a74f18b9fa36eb7d694a5
2021-09-30 16:02:28 -07:00
70f9f58a71 Add __module__ to torch.dtype.__dict__ (#65182)
Summary:
torch.dtype.__reduce__ returns a string, which causes Pickle to look
up the object by module and name. In order to find the right module,
Pickle looks for __module__ on the object; if it doesn't find that, it
falls back to searching sys.modules.

Previously, torch.dtype instances did not have a `__module__`
attribute, so pickling dtypes would fall back to a search of
sys.modules.

Instances of normal Python objects have a `__module__` attribute
because normal Python classes have a `__module__` key in their
`__dict__`. Imitate that by populating one in `torch.dtype`.

We set the field in `tp_dict` before calling `PyType_Ready` (instead
of afterwards) because of the doc warning against mutating a type's
dictionary once initialized:
https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_dict

fixes https://github.com/pytorch/pytorch/issues/65077
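
A quick sanity check of the new behavior (a minimal sketch):

```python
import pickle
import torch

# With __module__ set, pickle resolves the dtype by qualified name
# instead of scanning sys.modules.
assert torch.float32.__module__ == "torch"
assert pickle.loads(pickle.dumps(torch.float32)) is torch.float32
```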

 ---

I didn't add any tests because I didn't see any obvious places with similar tests for pickling or dtype objects. Let me know if I missed the right place, or should start one.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65182

Reviewed By: mrshenli

Differential Revision: D31310530

Pulled By: ezyang

fbshipit-source-id: 20cd713ce175a709d6ce47459c3891162ce29d77
2021-09-30 14:58:11 -07:00
38c77539e8 [PyTorch][Edge] Fix inefficiency in objLoaderMobile (#65710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65710

No need to incur extra refcount bumps, and no need to use a stringstream for what are presumably string keys anyway.
ghstack-source-id: 139325445

Test Plan: CI, reviewers to confirm the keys are supposed to be strings

Reviewed By: dhruvbird

Differential Revision: D31215347

fbshipit-source-id: 82be93cb2e57aefe94edf74d149115cb734112be
2021-09-30 14:53:40 -07:00
8f3983254b [MicroBench] Added a micro benchmark for prefix sum (#65790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65790

Here are the results of the benchmark:

* ATen - version that calls `at::cumsum`
* NNC - a simple prefix-sum loop implemented in NNC (not vectorized)
* Local - a C++ implementation of the simple prefix-sum loop
* LocalAVX2 - a vectorized C++ implementation of prefix-sum, only using AVX2
* LocalAVX512 - a vectorized C++ implementation of prefix-sum, using AVX512.

The vectorized implementations are from the paper "Parallel Prefix Sum with SIMD" in ADMS' 20.
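
For reference, a minimal Python analogue of the simple, non-vectorized prefix-sum loop being benchmarked (the measured variants are C++/NNC):

```python
import torch

def prefix_sum_loop(x: torch.Tensor) -> torch.Tensor:
    # Scalar dependency chain out[i] = out[i-1] + x[i]; this is the loop
    # that the AVX2/AVX512 variants from the ADMS'20 paper vectorize.
    out = torch.empty_like(x)
    acc = 0.0
    for i in range(x.numel()):
        acc += float(x[i])
        out[i] = acc
    return out

x = torch.randn(1024)
assert torch.allclose(prefix_sum_loop(x), torch.cumsum(x, dim=0), atol=1e-3)
```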

```
$ OMP_NUM_THREADS=1 ./buck-out/opt/gen/caffe2/benchmarks/cpp/tensorexpr/tensorexpr_bench --benchmark_filter=PrefixSumBench
Run on (36 X 1601 MHz CPU s)
2021-09-28 23:13:12
------------------------------------------------------------------------------------------
Benchmark                                   Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
PrefixSumBench/ATen/64                   1289 ns       1289 ns     543199 GB/s=397.069M/s
PrefixSumBench/ATen/256                  1867 ns       1867 ns     374232 GB/s=1096.8M/s
PrefixSumBench/ATen/1024                 4169 ns       4169 ns     167889 GB/s=1.9649G/s
PrefixSumBench/ATen/4096                14137 ns      14136 ns      49266 GB/s=2.31806G/s
PrefixSumBench/ATen/16384               49887 ns      49883 ns      13988 GB/s=2.6276G/s
PrefixSumBench/ATen/65536              193742 ns     193686 ns       3628 GB/s=2.7069G/s
PrefixSumBench/ATen/262144             764803 ns     764774 ns        917 GB/s=2.74219G/s
PrefixSumBench/ATen/1048576           3040653 ns    3040277 ns        231 GB/s=2.75916G/s
PrefixSumBench/Local/64                   586 ns        586 ns    1197003 GB/s=873.244M/s
PrefixSumBench/Local/256                 1077 ns       1077 ns     646265 GB/s=1.90143G/s
PrefixSumBench/Local/1024                3050 ns       3050 ns     229458 GB/s=2.68579G/s
PrefixSumBench/Local/4096               11910 ns      11910 ns      58953 GB/s=2.75132G/s
PrefixSumBench/Local/16384              43204 ns      43202 ns      16081 GB/s=3.03393G/s
PrefixSumBench/Local/65536             167966 ns     167966 ns       4154 GB/s=3.12139G/s
PrefixSumBench/Local/262144            667631 ns     667613 ns       1048 GB/s=3.14127G/s
PrefixSumBench/Local/1048576          2654785 ns    2654631 ns        264 GB/s=3.15999G/s
PrefixSumBench/NNC/64                     642 ns        642 ns    1095277 GB/s=797.442M/s
PrefixSumBench/NNC/256                   1139 ns       1138 ns     617214 GB/s=1.799G/s
PrefixSumBench/NNC/1024                  3103 ns       3103 ns     225531 GB/s=2.63979G/s
PrefixSumBench/NNC/4096                 12053 ns      12052 ns      58084 GB/s=2.71883G/s
PrefixSumBench/NNC/16384                43227 ns      43225 ns      16192 GB/s=3.03231G/s
PrefixSumBench/NNC/65536               168065 ns     168056 ns       4153 GB/s=3.11972G/s
PrefixSumBench/NNC/262144              668974 ns     668921 ns       1045 GB/s=3.13513G/s
PrefixSumBench/NNC/1048576            2657464 ns    2657341 ns        263 GB/s=3.15677G/s
PrefixSumBench/LocalAVX2/64               523 ns        523 ns    1351308 GB/s=979.537M/s
PrefixSumBench/LocalAVX2/256              755 ns        755 ns     927762 GB/s=2.71159G/s
PrefixSumBench/LocalAVX2/1024            1759 ns       1759 ns     400355 GB/s=4.65609G/s
PrefixSumBench/LocalAVX2/4096            6708 ns       6706 ns     103959 GB/s=4.88649G/s
PrefixSumBench/LocalAVX2/16384          22143 ns      22142 ns      31229 GB/s=5.91951G/s
PrefixSumBench/LocalAVX2/65536          83649 ns      83642 ns       8350 GB/s=6.26828G/s
PrefixSumBench/LocalAVX2/262144        330433 ns     330427 ns       2133 GB/s=6.34679G/s
PrefixSumBench/LocalAVX2/1048576      1302301 ns    1302179 ns        537 GB/s=6.44198G/s
PrefixSumBench/LocalAVX512/64             474 ns        474 ns    1459151 GB/s=1080.8M/s
PrefixSumBench/LocalAVX512/256            576 ns        576 ns    1217442 GB/s=3.55524G/s
PrefixSumBench/LocalAVX512/1024           994 ns        994 ns     703387 GB/s=8.24434G/s
PrefixSumBench/LocalAVX512/4096          3642 ns       3641 ns     190646 GB/s=8.99857G/s
PrefixSumBench/LocalAVX512/16384        10140 ns      10140 ns      68947 GB/s=12.9267G/s
PrefixSumBench/LocalAVX512/65536        35739 ns      35736 ns      19567 GB/s=14.6711G/s
PrefixSumBench/LocalAVX512/262144      156415 ns     156413 ns       4467 GB/s=13.4078G/s
PrefixSumBench/LocalAVX512/1048576     613952 ns     613876 ns       1144 GB/s=13.665G/s
```

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D31253849

Pulled By: navahgar

fbshipit-source-id: f33e7be787c86a09e90babddd66b16e2e0777eb4
2021-09-30 14:44:52 -07:00
24f59fa20b [ci] fix softmax bc check (#65952)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65952

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31320441

Pulled By: suo

fbshipit-source-id: ddd2ccca523d7ed31b231d924fbd6206525f16cf
2021-09-30 14:40:43 -07:00
d4d3bb91f9 Refactor OperatorSupport related code and fix TRT not supporting int64 dtype (#65848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65848

This diff includes:

* [fix]: The initialization of `OperatorSupport._support_dict` made it a class variable, so its initialization is moved into the constructor.
* Add an abstract class (more of an interface) `OperatorSupportBase`, since `OperatorSupport`'s purpose is too specific.
* [refactor]: What `TRTOperatorSupport` really does is populate an `OperatorSupport._support_dict`, so there is no reason for subclassing. It is removed and replaced with an `OperatorSupport` instantiated with a properly populated `_support_dict`.
* Add a framework for defining simple, basic op-support logic and composing it into more complex logic (see the sketch after this list):
    1. `create_op_support` wraps a function into an `OperatorSupportBase` instance
    2. `chain` combines several simple `OperatorSupportBase` instances into a more complex one
    3. `OpSupports` provides a set of pre-defined, simple `OperatorSupportBase` instances that can be composed together using `chain`.
        1. Currently the only pre-defined one is `decline_if_input_dtype(..)`, which declares a node unsupported if its args are of a user-specified dtype
* Fix `TRTOperatorSupport` so that it not only looks for registered converters, but also declines a node if its arg is int64
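
A simplified, self-contained re-implementation of the composition idea (hypothetical helper bodies; the real classes live in torch.fx):

```python
import types
import torch
from typing import Any, Callable

IsSupported = Callable[[Any], bool]  # predicate over an FX node

def create_op_support(fn: IsSupported) -> IsSupported:
    # Wrap a plain function as an operator-support predicate.
    return fn

def chain(*preds: IsSupported) -> IsSupported:
    # A node is supported only if every chained predicate accepts it.
    return lambda node: all(p(node) for p in preds)

def decline_if_input_dtype(dtype) -> IsSupported:
    def pred(node) -> bool:
        return all(getattr(a, "dtype", None) is not dtype
                   for a in getattr(node, "args", ()))
    return create_op_support(pred)

trt_support = chain(decline_if_input_dtype(torch.int64))
node = types.SimpleNamespace(args=(torch.zeros(1, dtype=torch.int64),))
assert not trt_support(node)  # int64 args are declined
```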

Test Plan: linter and CI

Reviewed By: 842974287

Differential Revision: D31275525

fbshipit-source-id: bbc02f7ccf4902a7912bb98ba5be2c2fbd53b606
2021-09-30 13:36:26 -07:00
9ae63bd87c Revert D31238123: [pytorch][PR] Avoid saving self for softmax and log_softmax
Test Plan: revert-hammer

Differential Revision:
D31238123 (fb412bdd80)

Original commit changeset: afd319d3676d

fbshipit-source-id: b7980d653a4b8322a225f1dd08c2857ecbe5bc94
2021-09-30 11:34:14 -07:00
541eb1db63 Add cuSPARSE descriptors and update CSR addmm (#60838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60838

Rewrote `addmm_out_sparse_csr_dense_cuda` implementation using new cusparse descriptors.

`addmm` now works without conversions with both 32-bit and 64-bit indices.
The dense tensors can have a row- or column-major layout. If the dense tensors are a contiguous slice of a larger tensor, the storage is used directly without temporary copies.
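
A minimal usage sketch (an illustrative call shape only; it assumes a CUDA build with cuSPARSE, and exact conversion helpers may differ by version):

```python
import torch

# Sparse CSR @ dense via addmm; with this change no index-dtype
# conversion or layout copy is needed on CUDA.
a = torch.tensor([[0., 2.], [3., 0.]], device="cuda").to_sparse_csr()
b = torch.randn(2, 3, device="cuda")
c = torch.zeros(2, 3, device="cuda")
out = torch.addmm(c, a, b)  # computes c + a @ b
```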

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30643191

Pulled By: cpuhrsch

fbshipit-source-id: 5555f5b59b288daa3a3987d322a93dada63b46c8
2021-09-30 11:32:51 -07:00
be00f0207a Update git version for CentOS base dockers (#65703)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65048

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65703

Reviewed By: albanD

Differential Revision: D31245666

Pulled By: janeyx99

fbshipit-source-id: 5431876bf19435eb3fd90a53a3ec94fd66c9210e
2021-09-30 11:26:21 -07:00
8297a16cc0 [ci] try installing libgnutls to fix cert error (#65934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65934

see https://github.com/pytorch/pytorch/issues/65931; this was a
suggested remediation on the linked issue

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D31313040

Pulled By: suo

fbshipit-source-id: a9e2b82a1e879962af768ed3049c73ab77394738
2021-09-30 11:23:17 -07:00
6a30d83596 Move ASAN to GHA (#65846)
Summary:
- Introduce `ciflow/sanitizers` label
- Modify asan pattern in `.jenkins/pytorch/build.sh`
- Produce wheel in `.jenkins/pytorch/build-asan.sh`
- Increase stack size hard limit to 82Mb in test docker containers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65846

Reviewed By: seemethere

Differential Revision: D31282654

Pulled By: malfet

fbshipit-source-id: f73e692899cc9bbe106ececc26f1fe430dfeae9d
2021-09-30 09:49:52 -07:00
cdbfb2b689 .github: Bump linux and windows gpu max available (#65923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65923

Still noticing that queues are long, particularly for Windows GPU
machines; bumping this to compensate.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31308728

Pulled By: seemethere

fbshipit-source-id: b68c3a76335960def23e1f425ba5b0a219f07e73
2021-09-30 09:38:02 -07:00
928a4bbafb [JIT] Fix compilation unit reference link in constant object upon load (#65784)
Summary:
Follow-up to https://github.com/pytorch/pytorch/pull/65442: make sure objects inserted into the graph on load do not hold an owning reference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65784

Reviewed By: suo

Differential Revision: D31251033

Pulled By: eellison

fbshipit-source-id: 59efe19ce6f70744383de4eebf0f89f79f3eb03a
2021-09-30 09:32:28 -07:00
8130157504 [DataPipe] Fixes an issue where TarArchiveReader closes stream when read into a buffer (#65877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65877

Fixes #65808

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31296041

Pulled By: NivekT

fbshipit-source-id: cdcad3a333ae9781d6063678a122a128955b0ff4
2021-09-30 08:46:32 -07:00
7f87ff183d [RFC] [Modular] Include less headers in vararg_functions.cpp (#65672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65672

`ATen/ATen.h` pulls in the full list of headers, but vararg_functions.cpp only uses two of them. Change it to include fewer headers for min_runtime.

ghstack-source-id: 139389772

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D31198293

fbshipit-source-id: 9794a2696a1b124be7fced2836c633ae899aa5c8
2021-09-30 08:35:28 -07:00
ea776fa034 Update CODEOWNERS for optim (#65773)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65773

Reviewed By: mrshenli

Differential Revision: D31269749

Pulled By: albanD

fbshipit-source-id: 1ec35d2396797b8e97a7122e2b3a9021f8fcf0a0
2021-09-30 08:30:42 -07:00
b777d790ea Convert Sampler back to lazily construction (#63646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63646

Fixes #63609

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D30451774

Pulled By: ejguan

fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
2021-09-30 07:32:06 -07:00
4666e3f192 [quant] update fused_obs_fake_quant op to accept output_fake_quant argument (#65621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65621

Add a new attribute to FusedMovingAvgObsFakeQuantize that controls whether the fake-quant operation should be applied at the output of a particular layer. The motivation is to give users additional control over the numerics of the fake_quant operators during training. It defaults to always fake-quantizing the output (True).

Note: We will still observe the tensors as before (only the fake_quant operation is controlled by this flag).

For example
```
input model
x -> fc1 -> fc2 -> non_quantizable_op -> fc3

After fake_quant
x -> fake_quant(x) -> fc1 -> fake_quant(fc1) -> fc2 -> fake_quant(fc2) -> non_quantizable_op -> fake_quant() -> fc3 -> fake_quantize(fc3)

With output_fake_quant disabled at the output of fc2 and fc3 (since their outputs are non-quantizable)
x -> fake_quant(x) -> fc1 -> fake_quant(fc1) -> fc2 -> non_quantizable_op -> fake_quant() -> fc3
```

Test Plan: ./buck-out/gen/caffe2/test/quantization_fx\#binary.par -r test_disable_output_fake_quant

Reviewed By: jerryzh168

Differential Revision: D31174526

fbshipit-source-id: bffe776216d041fb09133a6fb09bfc2c0bb46b89
2021-09-30 01:08:01 -07:00
6d4b93bd96 [quant] adding memoryless observers for embeddingbag QAT work (#65699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65699

related to: https://github.com/pytorch/pytorch/pull/65443#discussion_r715132425

The QAT and PAT (pruning-aware training) support for embedding bags needs a memoryless observer to work properly. This is necessitated by the weights changing between pruned and non-pruned states during training, which can significantly change the quantization parameters.

This PR adds a memoryless flag to the simpler observer classes (not moving average since those explicitly have memory)
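
A hypothetical sketch of what "memoryless" means here (not the actual observer code):

```python
import torch

class MemorylessMinMax(torch.nn.Module):
    # Statistics are recomputed from scratch on every forward instead of
    # being merged with previous min/max, so ranges from earlier
    # (pre-pruning) weights never linger.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.min_val, self.max_val = x.min(), x.max()
        return x
```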

In addition to the above, I altered the `reset_min_max_vals` function
of MinMaxObserver so that it preserves the device of the existing
`self.min_val` and `self.max_val`; previously the device was not
preserved, unlike at initialization (which uses factory_kwargs).

Test Plan:
python test/test_quantization.py TestObserver

(added test_memoryless_minmaxobserver, test_memoryless_per_channel_minmaxobserver, test_memoryless_histogramobserver)

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31209773

fbshipit-source-id: 44a63298e44880fbd3576f49ac568e781f3fd79a
2021-09-30 00:55:32 -07:00
de80aff72d Revert D31132861: Make JIT Aliasing Test Less Brittle
Test Plan: revert-hammer

Differential Revision:
D31132861 (9f97c66a7a)

Original commit changeset: 26fc2e6bc77b

fbshipit-source-id: 46be9168179d555be6b6a92b54b2bb84b3f834ed
2021-09-29 23:39:40 -07:00
4176afc4a0 [Static Runtime] Disable SigridTransform + ListUnpack fusion when outputs reachable from graph output (#62697)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62697

Reviewed By: hlu1

Differential Revision: D29979402

fbshipit-source-id: 913e8396a0530ce3617211112a2b1147ef2e9df9
2021-09-29 22:47:48 -07:00
edab202a30 [DatePipe] add deprecation warnings for DataPipes that will solely exist in TorchData (#65827)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65827

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31272794

Pulled By: NivekT

fbshipit-source-id: 8da8266184b4df050422904cbc5fca6d7c3d2e02
2021-09-29 22:42:22 -07:00
cd458fe092 [JIT] Make output of prim::TupleConstruct alias only with its inputs (#64879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879

This change makes the output of `prim::TupleConstruct` alias only with its inputs *when* the created tuple is directly returned from the graph.

The same treatment could be applied to any tuple newly constructed by `prim::TupleConstruct` that does not let its elements escape. However, this change focuses on the simplest and most frequently used case: tuples constructed only to be returned from a graph.

Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`

to cover the newly added code.

Reviewed By: eellison

Differential Revision: D30437737

fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
2021-09-29 21:56:31 -07:00
dd354117ef Skip failing tests when LAPACK and MAGMA are not available (#64930)
Summary:
Skip failing tests when LAPACK and MAGMA are not available for `test_linalg.py` and `test_ops.py`.
Note that there is no CI configuration without LAPACK or MAGMA. I verified locally that this now works as expected, but we have no guard against these tests failing again in this situation in the future.

<details>
  <summary> test_ops.py failures that are fixed</summary>

 ```
 FAILED test/test_ops.py::TestCommonCPU::test_out_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
 ```

</details>

<details>
  <summary> test_linalg.py failures that are fixed</summary>

```
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_dtype_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_nuclear_norm_axes_small_brute_force_old_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_complex128 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_float64 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_lowrank_cuda_float64 - RuntimeError: Calling torch.lu on a CUDA tensor requires compiling PyTorch with MAGMA. Please rebuild with MAGMA.
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
```
</details>

Fixes https://github.com/pytorch/pytorch/issues/59662

cc mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64930

Reviewed By: H-Huang

Differential Revision: D31137652

Pulled By: mruberry

fbshipit-source-id: c969f75d7cf185765211004a0878e7c8a5d3cbf7
2021-09-29 21:31:14 -07:00
2c29ec2a41 Remove "SciPioneer" from PT Distributed code owners (#65862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65862

ghstack-source-id: 139378782

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D31291340

fbshipit-source-id: 65d6a82c57dd50d8a4241e9442d73002590989d9
2021-09-29 20:52:01 -07:00
91f8755b0e Revert D31005792: [NCCL] Init dummy NCCL comms in constructor
Test Plan: revert-hammer

Differential Revision:
D31005792 (2b22a5dde2)

Original commit changeset: c2c582dee25a

fbshipit-source-id: d8e962b8aab6fda8a6c013e8577492dff9568c27
2021-09-29 20:46:38 -07:00
5349ea921b Migrate THCIntegerDivider.cuh to ATen (#65745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65745

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31257937

fbshipit-source-id: 283693525859b7a77a116df0c227653763911a42
2021-09-29 20:37:41 -07:00
3900509b7d (torchelastic) make --max_restarts explicit in the quickstart and runner docs (#65838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65838

closes https://github.com/pytorch/pytorch/pull/65675

The default `--max_restarts` for `torch.distributed.run` was changed to `0` from `3` to make things backwards compatible with `torch.distributed.launch`. Since the default `--max_restarts` used to be greater than `0`, we never documented passing `--max_restarts` explicitly in any of our example code.
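
For example (with `train.py` standing in for the user's script):

```
python -m torch.distributed.run --nnodes=1 --nproc_per_node=4 --max_restarts=3 train.py
```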

Test Plan: N/A doc change only

Reviewed By: d4l3k

Differential Revision: D31279544

fbshipit-source-id: 98b31e6a158371bc56907552c5c13958446716f9
2021-09-29 19:29:01 -07:00
c7ef620a14 [quant] Add imports to the torch/ao/quantization/__init__.py (#64911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64911

The import statements that involve `quantize.py` were not added to the module-level __init__ file. Those imports are necessary to mimic the behavior of the old import locations. Otherwise, the user would need to change their import statements to `from torch.ao.quantization.quantize import quantize` (instead of `from torch.ao.quantization import quantize`).
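
With these imports in place, both of the old spellings keep working:

```python
# Both forms resolve to the same function, as before the torch.ao move.
from torch.ao.quantization import quantize
from torch.ao.quantization.quantize import quantize
```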

Another change in this diff is that we don't use `__all__` anymore. The all dunder was never used in quantization anyway, and just creates a potential bug when using `from ... import *`.
ghstack-source-id: 139342483

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30897663

fbshipit-source-id: a7b4919a191755e3ba690a79ce3362889f416689
2021-09-29 19:08:45 -07:00
fb412bdd80 Avoid saving self for softmax and log_softmax (#65242)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64000
 - updates the double backward formula to compute the grad w.r.t. the output instead of `self` (see the identity below)
 - ~~In some of the error messages, we still refer to the dtype of the input, even though we are now checking the dtype of the output~~
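
For reference, the standard softmax backward identity that makes this possible (not taken from the diff): with $y = \operatorname{softmax}(x)$ and upstream gradient $g = \partial L / \partial y$,

$$\frac{\partial L}{\partial x_i} = y_i \Big( g_i - \sum_j g_j\, y_j \Big),$$

which depends only on the output $y$, so neither the backward nor the double backward needs to save `self`. (For `log_softmax`, the analogous identity is $\partial L/\partial x_i = g_i - e^{y_i} \sum_j g_j$.)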

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65242

Reviewed By: albanD

Differential Revision: D31238123

Pulled By: soulitzer

fbshipit-source-id: afd319d3676d9ef8d81607e0e8c2a3e6d09f68e4
2021-09-29 18:16:12 -07:00
768cfaa8f8 fix typo in _sharded_tensor (#65511)
Summary:
per title

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65511

Reviewed By: albanD

Differential Revision: D31239269

Pulled By: cbalioglu

fbshipit-source-id: 602c0bf7ef96a930606d68b15a5b3cadda9d9437
2021-09-29 18:00:47 -07:00
9f97c66a7a Make JIT Aliasing Test Less Brittle (#65493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65493

Added a last resort: use whatever ATen operator in the graph that has Tensor outputs as the operator node for checking the alias annotation.

Test Plan:
python test/test_ops.py -k test_variant_consistency_jit_linalg_tensorinv
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_normalize

Reviewed By: eellison

Differential Revision: D31132861

Pulled By: alanwaketan

fbshipit-source-id: 26fc2e6bc77be3a296967cf29a3f6ded231302fa
2021-09-29 17:11:04 -07:00
91611fe1d1 Decouple forward AD checks from backward AD in OpInfo tests and gradcheck (#65040)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64999

- Adds a `check_backward_ad` flag to gradcheck that can be used to disable checking of backward-mode AD (usage sketch after this list)
  - This is a bit bc-breaking in terms of positional args, but I prefer this ordering
- In OpInfo tests for forward ad:
  - set `check_backward_ad` False
- In test_ops treat `supports_autograd` as if it is `supports_backward_ad` (it basically already is)
  - the only modification needed is to no longer skip forward ad tests if `supports_autograd` is false
  - test_dtype, test_variant_consistency, etc behave correctly as-is
  - In a follow-up PR, we can rename it to actually be `supports_backward_ad`
- Testing
  - https://github.com/pytorch/pytorch/pull/65060
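
A minimal sketch of the new flag (both keyword arguments are gradcheck parameters):

```python
import torch
from torch.autograd import gradcheck

def f(x):
    return x.sin()

x = torch.randn(3, dtype=torch.double, requires_grad=True)

# Check forward-mode AD only; backward-mode checking is disabled.
assert gradcheck(f, (x,), check_forward_ad=True, check_backward_ad=False)
```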

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65040

Reviewed By: albanD

Differential Revision: D31238177

Pulled By: soulitzer

fbshipit-source-id: f068d4cbe7ffb094930b16cddb210583b9b7b2c4
2021-09-29 17:01:34 -07:00
5950240bdf Stop Win+CUDA-10.2 builds (#65649)
Summary:
See https://github.com/pytorch/pytorch/issues/65612 and https://github.com/pytorch/pytorch/issues/25393

Fixes https://github.com/pytorch/pytorch/issues/65648

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65649

Reviewed By: janeyx99

Differential Revision: D31189692

Pulled By: malfet

fbshipit-source-id: 6ec0548d5833f3428d882071d26c357d89b0a9ba
2021-09-29 15:41:23 -07:00
2b22a5dde2 [NCCL] Init dummy NCCL comms in constructor (#65173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65173

Initializes dummy NCCL communicators in the constructor as a basic
health check that communicators can be created prior to launching the
first collective.

After successful init, we immediately use `ncclCommAbort` to destroy these
communicators to ensure they don't interfere with regular communicator creation
during collectives.

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D31005792

fbshipit-source-id: c2c582dee25a098361ead6ef03f541e7833c606b
2021-09-29 15:36:54 -07:00
ad85b582da Remove THCDeviceTensor (#65744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65744

This is just dead code.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31257940

fbshipit-source-id: 6c02264106c2dcbadd332f24b95bc9351a04fd9e
2021-09-29 14:54:46 -07:00
20374c991b slow_conv2d_forward: avoid calling dispatcher in parallel region (#65724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65724

See gh-56794

Avoid dispatch inside of parallel_for by:
1. Replacing Tensor slicing with TensorAccessor
2. Copying the bias into the output only once, outside of the parallel region
3. Replacing `addmm_` with a direct call to gemm.

Technically this also adds a new requirement that the output always be
contiguous, but the out argument version isn't exposed or used
anywhere in the `torch.nn` API. So that should be fine.

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D31257875

Pulled By: ngimel

fbshipit-source-id: 84d2b39e7f65334bdfcc2c4719f93ee3c514ca32
2021-09-29 14:09:32 -07:00
7191dd2613 Update Module docstring for Python 3 (#65748)
Summary:
In Python 3, we can call `super()` without any arguments.

If I understand correctly, Python 2 is no longer supported by PyTorch, so we can change the documentation to be Python-3 only :)
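
For example, the docstring snippet becomes:

```python
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()  # Python 3: no arguments needed
        self.conv1 = nn.Conv2d(1, 20, 5)
```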

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65748

Reviewed By: saketh-are

Differential Revision: D31246055

Pulled By: albanD

fbshipit-source-id: 3980def1a556d4bdfa391ea61cb2a65efa20df79
2021-09-29 13:40:15 -07:00
8bf0ba546e ns for fx: add basic testing on cuda (#65593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65593

Adds test cases that the three Numeric Suite Core APIs work
when the models are on cuda.  In particular:
1. create models and move them to cuda
2. add loggers (if applicable)
3. run data through (if applicable)
4. extract results

It works without code changes because a `Logger` object is
created without any device-specific objects (they only get
added if data is passed through). It's good to have this tested.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_cuda
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_loggers_cuda
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_shadow_loggers_cuda
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D31160897

fbshipit-source-id: 8eacf164d0496baf2830491200ea721c0f32ac92
2021-09-29 13:06:30 -07:00
0dd1b74a5b Migrate THCScanUtils to ATen (#65743)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65743

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31257938

fbshipit-source-id: 273b22df41bb7f2a0ab605ec1f6322c2937e7472
2021-09-29 12:39:37 -07:00
a84feeeade [PyTorch Edge] Conditionally trim dispatch key set to save heap memory at runtime (#65732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65732

For certain on-device uses, runtime memory comes at a premium. On-device deployments won't use all the available dispatch keys, so it makes sense to keep only the on-device specific ones around for such uses to reduce runtime heap memory allocated.

This change keeps just 10 dispatch keys (the ones used on-device), guarded under the `C10_MOBILE_TRIM_DISPATCH_KEYS` macro. It tries to keep the other code paths unaffected, using `constexpr` in the `array` declaration and simple inline functions to ensure that the compiler can optimize these for server builds.

Test Plan:
Build and check mobile models end to end.

```
buck build -c "pt.enable_milan_dispatch_keys_trimming"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: ezyang

Differential Revision: D31185407

fbshipit-source-id: e954765606373dea6ee9466a851dca7684167b0b
2021-09-29 12:20:33 -07:00
7b5d676fa1 .github: Bump linux gpu max limit to 100 (#65831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65831

Was noticing scaling issues last night due to the lack of
linux.8xlarge.nvidia.gpu machines. It seems that even at max
capacity we were still about ~50 queued workflows behind; this should
close that gap.

Also, since these run the longest types of tests, they are the most
likely to overlap with scale messages being processed while available
runners are still maxed out.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31275892

Pulled By: seemethere

fbshipit-source-id: b22ceda115b70d7bdd9c4bc207b55ffab50381ef
2021-09-29 12:06:54 -07:00
c975ca4337 [Static Runtime] Simplify out variant overload implementations (#65384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65384

The following pattern appears frequently in `ops.cpp`:

```
if (!n->matches(schema_1) && !n->matches(schema_2) && ... && !n->matches(schema_n)) {
    LogAndDumpSchema(n);
    return nullptr;
}

return [](ProcessedNode* p_node) {
    if (p_node->Output(0).isNone()) {
        if (p_node->Input(i).isSomeType()) {
            // special logic for schema 1
        } else if (p_node->Input(i).isSomeOtherType()) {
            // special logic for schema 2
        } else if (...) {
            // special logic for schema3
        }
        // and so on
    } else {
        // another complicated type checking chain
    }
};
```

A much cleaner way to implement operator overloads is like this:
```
if (n->matches(schema_1)) {
    return schema_1_impl;
} else if (n->matches(schema_2)) {
    return schema_2_impl;
}
// and so on
```

This has a few advantages:
* Significantly reduces complexity of the out variant implementations, especially for ops with more than 2 overloads. One implementation corresponds to one schema. This makes the implementation more readable/maintainable.
* Adhering to this convention makes it easier to add a new overload. Just add a new `n->matches(...)` case instead of working the schema into existing complicated logic.
* Ops are marginally faster since we don't have to check types at runtime.

Note: there are a few cases where this actually made the code less concise (`aten::div`), so I left those ops untouched.

Thanks, d1jang, for pointing this out in another diff.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31072328

fbshipit-source-id: c40a4f7e6a79881e94c9ec49e9008ed75cfc8688
2021-09-29 12:02:11 -07:00
2f712c452e .github: Remove confusing on_pull_request variable (#65731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65731

It originally had a purpose, but after ciflow was introduced every PR had
on_pull_request set, so it's not really as useful as it once was.

This also removes the equally confusing only_build_on_pull_request
variable.

This change should produce no functional changes in our generated workflows

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D31225398

Pulled By: seemethere

fbshipit-source-id: 7bd8e8175794ab7d09b0632321bf52538435e858
2021-09-29 11:56:13 -07:00
6c2f235d36 common_utils.py: Add ASAN as a platform for which you can disable tests (#65791)
Summary:
Could be useful for the future.

Next steps: document it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65791

Reviewed By: suo

Differential Revision: D31254115

Pulled By: janeyx99

fbshipit-source-id: 715c18b4505f2be6328aa0be25976116d6956b25
2021-09-29 11:00:03 -07:00
911d01c1de type annotate operator_support (#65136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65136

Opportunistically add type annotations for operator_support.py.

Test Plan: run linter, CI

Reviewed By: yinghai

Differential Revision: D30928464

fbshipit-source-id: 615c75152b9938792f03cdceb2a113bda6ab28c7
2021-09-29 10:38:47 -07:00
085e2f7bdd [ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610

- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.

- In the next PR
   - Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
   - HIP_PLATFORM_HCC is deprecated, so we will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd

Reviewed By: jbschlosser

Differential Revision: D30909053

Pulled By: ezyang

fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
2021-09-29 09:55:43 -07:00
9b40eaaaab Revert D31193205: [pytorch][PR] CMake: Limit python include directories to only python libraries
Test Plan: revert-hammer

Differential Revision:
D31193205 (971c57f1d0)

Original commit changeset: 5c1b554a59d0

fbshipit-source-id: 5719b7df987ded6e7e212749a438db947656df87
2021-09-29 09:49:33 -07:00
2670cacfc2 LLVM-12 fix for tensor_new.cpp (#65785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65785

Fixes offset to nullptr at fbcode/caffe2/torch/csrc/utils/tensor_new.cpp:206

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31250995

fbshipit-source-id: 56c7761787e732180a2537a8aa4346a39e7399a8
2021-09-29 09:35:18 -07:00
09eb3e661c don't check 0 elements for cat symbolic diff (#65751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65751

Fixes symbolic script grad formula for cat to correctly handle empty tensors
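
A hedged eager-mode sketch of the empty-tensor case the grad formula must handle (the fix itself lives in the TorchScript symbolic script, not in eager autograd):

```python
import torch

a = torch.randn(0, 3, requires_grad=True)  # zero-element tensor
b = torch.randn(2, 3, requires_grad=True)

torch.cat([a, b], dim=0).sum().backward()
print(a.grad.shape, b.grad.shape)  # torch.Size([0, 3]) torch.Size([2, 3])
```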

Test Plan: Existing tests

Reviewed By: eellison

Differential Revision: D31208364

fbshipit-source-id: d676d9abcc033b56076fa946f58f3db50034502d
2021-09-29 09:34:03 -07:00
1d681c1ab2 Migrate THCThrustAllocator to ATen (#65492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65492

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31148180

Pulled By: ngimel

fbshipit-source-id: d5e4902036493517ca97c3442713b5e0e79229f9
2021-09-29 09:27:41 -07:00
971c57f1d0 CMake: Limit python include directories to only python libraries (#65654)
Summary:
`include_directories` is old-style CMake, which adds the include path to every file being compiled. This instead makes python, numpy and pybind11 into targets that only torch_python and caffe2_pybind_state are linked to, so python libraries can't be accidentally included elsewhere.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65654

Reviewed By: gchanan

Differential Revision: D31193205

Pulled By: malfet

fbshipit-source-id: 5c1b554a59d0e441a701a04ebb62f0032d38b208
2021-09-29 08:09:08 -07:00
5f7ab7be6f [Static Runtime] concat_add_mul_replacenan_clip retains axis arg (#65741)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65741

This op previously assumed `axis == 1`, causing graphs that would otherwise be valid to return incorrect results after fusing.

Reviewed By: hlu1

Differential Revision: D31234944

fbshipit-source-id: 89885a3b119357698ebd9fd429b009813260a2f4
2021-09-29 08:04:20 -07:00
f63150fd1d [PyTorch Edge] Reduce the cost of computing isIncludedInAlias() (#65735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65735

Currently, `isIncludedInAlias()` calls `getRuntimeDispatchKeySet()`, which creates a new `DispatchKeySet` object from an enumerated list of dispatch keys. `isIncludedInAlias()` then checks whether a single dispatch key is part of this set. Instead, just pass in the key one wishes to check. This is marginally faster.

ghstack-source-id: 139281528

Test Plan:
See these 2 AI Bench Runs on the Milan-FFF-11-30 device.

### Before
[AI Bench](https://www.internalfb.com/intern/aibench/details/237302972704466), [Flamegraph](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/speech_transducer_v25_perf_1632804218329.html)

### After
[AI Bench](https://www.internalfb.com/intern/aibench/details/606320012968375), [Flamegraph](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/speech_transducer_v25_perf_1632807348803.html)

Check the flamegraphs, and focus on any kernel registration code path during library initialization.

Reviewed By: swolchok

Differential Revision: D31228062

fbshipit-source-id: 7a986e3593c30ded7919cd3b564ec579dc97ab5f
2021-09-29 07:40:39 -07:00
aebde1bc2b deprecate device getter from torch.testing namespace (#63844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63844

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31141433

Pulled By: mruberry

fbshipit-source-id: a29331278ab99a19e225e2cb357458e3db4f9732
2021-09-29 02:40:52 -07:00
07d5d7b5cc move kernel launch checks from torch.testing to torch.testing._internal.check_kernel_launches (#60862)
Summary:
The fact that these functions are only used in a single test might be a good enough reason to move them to that module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60862

Reviewed By: H-Huang

Differential Revision: D31141354

Pulled By: mruberry

fbshipit-source-id: 6ce1f721b88620c5f46222ad1b942bc689f0a3e0
2021-09-29 00:39:22 -07:00
0a0564a347 Revert D31206837: [pytorch][PR] *_solve methods: implements forward AD
Test Plan: revert-hammer

Differential Revision:
D31206837 (26e31f76b0)

Original commit changeset: 040beda97442

fbshipit-source-id: f28091327357af9f54f367eda6606240924b93ac
2021-09-28 23:31:16 -07:00
f9c2dc860d make layout check optional in torch.testing.assert_close() (#65419)
Summary:
In case the inputs have a different layout, `assert_close(..., check_layout=False)` converts them to strided before comparison. This is helpful if you just want to compare the values of a sparse COO / CSR tensor against a strided reference.

This keeps BC, since the default `check_layout=True` was the old, hard-coded behavior.
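
A short sketch of the new flag in use (the parameter name comes from this PR):

```python
import torch

dense = torch.eye(3)
sparse = dense.to_sparse()

# With the default check_layout=True this raises, since the layouts
# differ (torch.sparse_coo vs. torch.strided):
# torch.testing.assert_close(sparse, dense)

# With check_layout=False both inputs are converted to strided first:
torch.testing.assert_close(sparse, dense, check_layout=False)
```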

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65419

Reviewed By: H-Huang

Differential Revision: D31133629

Pulled By: mruberry

fbshipit-source-id: ca8918af81fb0e0ba263104836a4c2eeacdfc7e6
2021-09-28 23:23:41 -07:00
8a247fb418 LLVM-12 fix for shm_mutex (#65781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65781

Fixes
```
stderr: In file included from caffe2/caffe2/contrib/shm_mutex/shm_mutex.cc:1:
caffe2/caffe2/contrib/shm_mutex/shm_mutex.h:334:28: error: anonymous non-C-compatible type given name for linkage purposes by alias declaration; add a tag name here [-Werror,-Wnon-c-typedef-for-linkage]
using TicketStruct = struct : ShmBaseHeader {
                           ^
                            TicketStruct
caffe2/caffe2/contrib/shm_mutex/shm_mutex.h:334:31: note: type is not C-compatible due to this base class
using TicketStruct = struct : ShmBaseHeader {
                              ^~~~~~~~~~~~~
caffe2/caffe2/contrib/shm_mutex/shm_mutex.h:334:7: note: type is given name 'TicketStruct' for linkage purposes by this alias declaration
using TicketStruct = struct : ShmBaseHeader {
      ^
1 error generated.
Cannot execute a rule out of process. On RE worker. Thread: Thread[main,5,main]
Command failed with exit code 1.
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31248938

fbshipit-source-id: 47342fecc72ada9397a1b7bd6fcabfccf988dd3e
2021-09-28 22:51:38 -07:00
4a7a0ea42e Skip flaky ASAN tests (#65792)
Summary:
See https://github.com/pytorch/pytorch/issues/65727

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65792

Reviewed By: janeyx99

Differential Revision: D31254490

Pulled By: malfet

fbshipit-source-id: 76714db30a5566fbab95179236ccdafab22cf551
2021-09-28 22:33:02 -07:00
d528c7f3c0 .github: Move windows back to default directory (#64962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64962

Moves windows builds / tests back to the default directory. Previously
we had moved them because checkout would sometimes fail due to file
handles still being open on the working directory.

Moving back to the default directory also has the added bonus of sccache
working again, so here's to hoping that this doesn't have any adverse
effects.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc peterjc123 mszhanyi skyline75489 nbcsm ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31250072

Pulled By: seemethere

fbshipit-source-id: a803bf0e00e1b2b0d63f78600588281622ee0652
2021-09-28 19:41:35 -07:00
ed4491be6f Fix error code checking for Windows build scripts (#57331)
Summary:
The variable `%errorlevel%` is expanded before the whole command line runs, so it is useless inside an if-block. Also, let's avoid using `%errorlevel%` because it may be set by users accidentally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57331

Reviewed By: anjali411

Differential Revision: D28140182

Pulled By: malfet

fbshipit-source-id: a3f21d65623bb25f039805c175e9f3b468bcb548
2021-09-28 19:27:07 -07:00
0d7036fdaf don't leak build time path name to runtime for frozen python modules (#65715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65715

Here is how we freeze a python module:
- we call the Python builtin compile method with the module's source code and its path. This method returns a Python code object
- we call marshal.dumps to serialize the code object to bytes.

The code_object.co_filename actually matches the path passed to the compile method. We can simply replace that with a marker
to avoid leaking the build-time path into the runtime.

This works on nested code objects as well:
```
#!/bin/env python3.8
import marshal

code_str = """
print("hello")

class MyCls:
    def __init__(self):
        pass
"""
co = compile(code_str, "<Generated by torch::deploy>", "exec")
cobytes = marshal.dumps(co)
import pdb; pdb.set_trace()
```

Checking `co`:
```
(Pdb) co.co_filename
'<Generated by torch::deploy>'
(Pdb) co.co_consts
('hello', <code object MyCls at 0x7f0e8670bbe0, file "<Generated by torch::deploy>", line 4>, 'MyCls', None)
(Pdb) co.co_consts[1].co_filename
'<Generated by torch::deploy>'
```

Test Plan:
Find the serialized frozen module for the torch.nn.modules.linear module in the generated bytecode_x.c file. Put the content into /tmp/linear.bytecode.

Run the testing script:
```
import marshal
co_bytes = bytes(eval("[{}]".format("".join(open('/tmp/linear.bytecode').readlines()).replace('\n', '').replace('\t', ''))))
co = marshal.loads(co_bytes)
print(co)

```

The output for the paste without the change:
```
<code object <module> at 0x7f39ca7f07c0, file "/data/users/shunting/fbsource/fbcode/buck-out/opt/gen/caffe2/gen_frozen_torchpython_src__srcs/torch/nn/modules/linear.py", line 1>
```

The output for the paste with the change:
```
<code object <module> at 0x7f05a765d710, file "<Generated by torch::deploy>", line 1>
````

Note that the file part is changed as expected.

Reviewed By: suo

Differential Revision: D31214555

fbshipit-source-id: 56958e0a7352f8c30a3377f83209efe7db61f0fb
2021-09-28 19:25:51 -07:00
72b27bde83 [CIFlow] Modify workflow trigger logic (#65733)
Summary:
CIFlow workflows should always run on the push event.
On pull requests, a workflow should run if its label conditions are met; if
no `ciflow/` labels are associated with the PR, the workflow is enabled by
default.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65733

Reviewed By: zhouzhuojie

Differential Revision: D31251278

Pulled By: malfet

fbshipit-source-id: 31ce745cb224df7c6fec1682ec4180513e3dadf3
2021-09-28 19:19:49 -07:00
b3c32ad32f .github: Move calculate-docker-image into build (#65789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65789

These common types of jobs can be moved into build since it's typically
a no-op. It could make debugging docker builds annoying in the future, but
dedicating an entire ephemeral node to a no-op seems like a waste to me.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D31253017

Pulled By: seemethere

fbshipit-source-id: c7b5ea35a57fb1576122df219d387c86e420fd1f
2021-09-28 19:15:24 -07:00
609384c056 [sparsity][doc] Docstring for WeightNormSparsifier (#65294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65294

This adds the docstring documentation to the WeightNormSparsifier and adds the type hints for the constructor args.
Note, this does not require testing as only the doc is changed.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31186827

Pulled By: z-a-f

fbshipit-source-id: c5010c9bba25b074c4cc6c88f251474b758f950d
2021-09-28 14:14:51 -07:00
92ee5cc2e2 [sparsity] Fix for accumulation bug in WeightNormSparsifier (#65293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65293

This fixes a bug in the WeightNormSparsifier, where the mask is being multiplied by the newly computed mask.
Because the mask elements are binary 0/1, this accumulates the mask over every iteration, eventually collapsing the mask to zero.
This bug accidentally bled through from old versions.
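
A toy illustration of the accumulation behavior described above (not the sparsifier's actual code):

```python
import torch

mask = torch.ones(8)
for _ in range(5):
    step_mask = (torch.rand(8) > 0.5).float()
    mask = mask * step_mask  # buggy: zeros from earlier steps never come back

# Each step can only clear more entries, so after enough iterations the
# mask collapses to all zeros; the fix computes the new mask from scratch
# instead of multiplying it into the old one.
print(mask)
```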

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31186829

Pulled By: z-a-f

fbshipit-source-id: 3f5b2c833148ab0bd8084e7410ce398f1252e65e
2021-09-28 14:14:49 -07:00
a90912ecc5 [sparsity] Remove the pack_param from the sparsifier state_dict (#65292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65292

That was the original design, which we decided to simplify by removing the packing in the sparsifier.
The state of the sparsifier is saved directly, and the old behavior accidentally bled through to the current version.
This change removes the `_pack_params` method, and changes the state_dict to include the state directly.
We don't have to change the load_state_dict, as it will work with either the old or the new format.

The main reason for this PR is the simplification. The original design didn't achieve anything useful by packing the sparsification parameters.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31186826

Pulled By: z-a-f

fbshipit-source-id: 4ad72a7e669f048d2f2d269269ee11b63fa169db
2021-09-28 14:12:52 -07:00
c829cb6840 Port min kernel to structured kernels. (#61450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61450

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29741713

Pulled By: bdhirsh

fbshipit-source-id: 2c107752a90fd39cfb55e08aaf3541bd484a5fc3
2021-09-28 14:03:54 -07:00
c2252b3aa6 Port max kernel to structured kernels. (#61449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61449

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29741714

Pulled By: bdhirsh

fbshipit-source-id: 6c8c17d20f578ab0af8a969d103a19ccd8d51842
2021-09-28 14:02:26 -07:00
51f1569c77 Add checks for structured in-place operations. (#65686)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65686

Fixes: #57827

This PR introduces the `check_inplace` function. It contains common checks for all
structured in-place operators (e.g. dtype, device, and sizes). The `set_output` method calls
`check_inplace` on in-place specializations of structured kernels.
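
A hedged sketch, from the Python side, of the kind of dtype check this centralizes:

```python
import torch

a = torch.zeros(3, dtype=torch.long)
b = torch.randn(3)

a.add(b)   # out-of-place: fine, the result is promoted to float
a.add_(b)  # in-place: raises, a float result can't be cast to the long output
```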

Besides that, it also:
- adds overlap assertions for both in-place and out-of-place overloads
- removes in-place-operator-specific `TORCH_CHECK`s around the code base

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31234063

Pulled By: ezyang

fbshipit-source-id: fa3b45775af7812e07a282e7cae00b68caf0fdb0
2021-09-28 13:21:26 -07:00
93852bb2d4 Port sort kernel to structured kernels. (#62391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62391

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D30903992

Pulled By: bdhirsh

fbshipit-source-id: 52687aa2483c101056825433d39d69c60b829c62
2021-09-28 13:12:35 -07:00
57529d48c4 [quant] Fix applying non-zero offset 1 to null pointer in quantized interpolation (#65570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65570

Although this is not an issue that could pop up in practice, LLVM-12 throws an error about it if not checked.

Test Plan: `buck test mode/dev //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_empty_batch (quantization.core.test_quantized_op.TestQuantizedOps)'`

Reviewed By: r-barnes

Differential Revision: D31151681

fbshipit-source-id: e039c6aa1687a61ef6774f045744dc9d768d5c80
2021-09-28 12:28:59 -07:00
4752453d27 [Structured Kernels] Port for baddbmm and bmm (#64805)
Summary:
This PR attempts to port `baddbmm` and `bmm` to structured kernels. Both are in the same PR because a lot of the code is common to the two ops, including the checks and the implementation.

Issue tracker: https://github.com/pytorch/pytorch/issues/55070

cc: ysiraichi ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64805

Reviewed By: gchanan

Differential Revision: D31134454

Pulled By: ezyang

fbshipit-source-id: 3294619834a8cc6a0407aea660c556d3a42b6261
2021-09-28 11:07:31 -07:00
278edb5626 .circleci: Only generate docker configs we need (#65728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65728

Changes the docker image generation script to only include image build
jobs for images that we actually use within CircleCI

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D31224674

Pulled By: seemethere

fbshipit-source-id: 64b14e1a4ef82d345ec7b898c4c89d9a9419e4de
2021-09-28 10:38:13 -07:00
145202c45b Define timeout in TestIndividualWorkerQueue (#65742)
Summary:
This test occasionally deadlocks while waiting for the child process to report its result.
As the test is small, the entire test should never take more than 1-2 sec, but to be on the safe side the timeout is set to 5 sec.

Somewhat mitigates https://github.com/pytorch/pytorch/issues/65727

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65742

Reviewed By: janeyx99, ejguan

Differential Revision: D31235116

Pulled By: malfet

fbshipit-source-id: 0cdd2f7295f6f9fcefee954a14352e18fae20696
2021-09-28 10:01:19 -07:00
50edc2679d onnx/test.sh: Run test/onnx in only shard 1 (#65722)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65458

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65722

Reviewed By: albanD

Differential Revision: D31223236

Pulled By: janeyx99

fbshipit-source-id: 3b648cb940a95866f465b27b8bdc74b06d258140
2021-09-28 08:45:25 -07:00
87cd658c27 Add override to virtual destructor in derived class (#65476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65476

As suggested by `-Winconsistent-missing-destructor-override`.

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D31115128

fbshipit-source-id: a4e2441c13704c0c46e3e86f7886fca76c40ca39
2021-09-28 08:37:23 -07:00
57e5ae5306 [vulkan] Use push constants instead of SSBOs (#65716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65716

Currently, we send arguments to shaders by creating and filling an SSBO (Shader Storage Buffer Object). However, we can instead use [push constants](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdPushConstants.html) to send a small amount of uniform data to shaders.

Push constants are slightly more performant than using an SSBO, and they also have the added benefit of not needing to allocate and manage memory for a buffer object, since they update the pipeline data directly.

The downside of using push constants is that there is a maximum size for a push constant block, described by `maxPushConstantsSize` in [VkPhysicalDeviceLimits](https://www.khronos.org/registry/vulkan/specs/1.1/html/vkspec.html#VkPhysicalDeviceLimits). The minimum size guaranteed by the spec is 128 bytes, which is enough for 32 `float`/`int` variables, or 8 `vec4` variables. This should be enough for our purposes.

Currently, the Convolution shaders use the largest uniform block, which only takes 22 bytes.

Test Plan:
Run `vulkan_api_test`:

```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

Reviewed By: beback4u

Differential Revision: D30368834

fbshipit-source-id: 65a42b9da1a9084ba2337b41eaab9b612583c408
2021-09-28 08:32:30 -07:00
e155e7520f MaxUnpooling: parallel_for not always backed by OMP (#65655)
Summary:
Use `c10::optional` + thread_fence instead of `#pragma omp critical` inside the max_unpooling kernels.

Using any OpenMP pragma in an `at::parallel_for` body is wrong, as parallel_for can
be implemented using native threading mechanisms such as pthreads.

`c10::optional` is a much better approach than the pair of
`has_error` and `error_index` variables. Use `std::atomic_thread_fence` to ensure the error_index value is synchronized.

It also fixes ICE reported in https://github.com/pytorch/pytorch/issues/65578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65655

Reviewed By: ngimel

Differential Revision: D31206501

Pulled By: malfet

fbshipit-source-id: 93df34530e721777b69509cd6c68f5d713fb2b2a
2021-09-28 08:13:58 -07:00
26e31f76b0 *_solve methods: implements forward AD (#65546)
Summary:
This PR adds forward AD for the `*_solve` methods.
Additionally, `cholesky_solve` gets an OpInfo plus a fix for a bug where wrong leading dimensions could be passed to LAPACK,
and `lu_solve` gets forward AD computed with 2x `lu_solve` instead of 1x `lu_solve` + 2x `triangular_solve`.
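
A hedged sketch of forward AD through `cholesky_solve` (shapes and values are illustrative):

```python
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3)
A = A @ A.transpose(-2, -1) + 3 * torch.eye(3)  # make A positive definite
L = torch.linalg.cholesky(A)
b, tangent = torch.randn(3, 2), torch.randn(3, 2)

with fwAD.dual_level():
    b_dual = fwAD.make_dual(b, tangent)  # attach the JVP direction to b
    x = torch.cholesky_solve(b_dual, L)
    primal, jvp = fwAD.unpack_dual(x)    # the solution and its forward gradient
```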

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65546

Reviewed By: gchanan

Differential Revision: D31206837

Pulled By: albanD

fbshipit-source-id: 040beda97442e7a88a9df9abc7bb18313ce55bc3
2021-09-28 06:51:32 -07:00
2ea724b1fd Added option to update parameters using state_dict in AveragedModel (#65495)
Summary:
While implementing [EMA](https://github.com/pytorch/vision/pull/4381) (which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle the state_dict(), so a custom update_parameters() needed to be defined in the [EMA class](https://github.com/pytorch/vision/pull/4406). This PR aims to handle this scenario, removing the need for that custom update_parameters() implementation.
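
A hedged sketch of the EMA-style use case that motivated this change (the `avg_fn` signature follows the documented `AveragedModel` API; the decay value is illustrative):

```python
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel

model = nn.Linear(4, 2)

# Exponential moving average instead of the default arithmetic mean.
ema = AveragedModel(
    model,
    avg_fn=lambda avg, new, num_averaged: 0.999 * avg + 0.001 * new,
)

# Inside the training loop, after each optimizer step:
ema.update_parameters(model)
```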

Discussion: https://github.com/pytorch/vision/pull/4406#pullrequestreview-753734102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65495

Reviewed By: datumbox

Differential Revision: D31176742

Pulled By: prabhat00155

fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
2021-09-28 03:34:49 -07:00
3324bae5f1 Remove THCTensor.cu and THCTensorCopy.cu copy (#65491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65491

The only user of any of this code is THCStorage_copy, so I've
migrated that to call `Tensor.copy_` directly.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31148183

Pulled By: ngimel

fbshipit-source-id: 92bab71306c84bc481c47a0615ebb811af2c2875
2021-09-27 23:21:45 -07:00
6a99053515 Added sparse-tensor copy logic to dispatcher (#65304)
Summary:
- Only ported copy for sparse tensors to the dispatcher. Everything else is the same
- Duplicated code for named-tensor handling in sparse tensor copy
	- Might change it later to handle named tensors using the dispatcher

Issue https://github.com/pytorch/pytorch/issues/61122

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65304

Reviewed By: gchanan

Differential Revision: D31176720

Pulled By: ezyang

fbshipit-source-id: 56757a3b0fb56c3d05c16dd935428a0cd91ea766
2021-09-27 20:08:27 -07:00
43d47bdcca [tensorexpr] conv2d handle optional bias (#64750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64750

conv2d's bias is optional; it will be ArgNone when processing the graph.
The bias is a prim::Constant of NoneType, so we do not know its shape at the moment of constant binding.

This change adds the bias as a constant zeros Tensor at graph-processing time. To do that, `std::vector<TensorExprKernel::ConstantDescr>& constants` and `std::vector<at::Tensor>& constant_tensors` are passed to `computeOperandValue`, since it is not part of `TensorExprKernel`.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30842101

Pulled By: IvanKobzarev

fbshipit-source-id: 88020f6934e43fe606f8eae928b7e21b7c3f15f6
2021-09-27 20:00:53 -07:00
31ea4358d8 [tensorexpr] Add Op handling for mobilenetv3 large (#64741)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64741

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30839110

Pulled By: IvanKobzarev

fbshipit-source-id: d8e89c086c713fbe816dd8c8096cd64c05dc7431
2021-09-27 20:00:51 -07:00
c28e3ffb4b [jit] Shape propagation batch_norm, dropout, quantize, hardswish (#64740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64740

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D30839111

Pulled By: IvanKobzarev

fbshipit-source-id: c8f477ee05769865c0a23127b7f8a8276f46b54e
2021-09-27 19:59:34 -07:00
46b3fc032a Migrate remainder of THCDeviceUtils.cuh to ATen (#65472)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65472

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31148181

Pulled By: ngimel

fbshipit-source-id: f777ba85b1cd8cb98b0ceb1756c558dde5862fc2
2021-09-27 19:37:06 -07:00
12137db5e3 Fix the slowdown of _object_to_tensor since 1.9 (#65721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65721

#Closes: https://github.com/pytorch/pytorch/issues/65696

The bug was introduced in https://github.com/pytorch/pytorch/pull/55861, and it has caused a 100x slowdown since 1.9.
ghstack-source-id: 139128267

Test Plan:
Performance test:
```
import time

from torch.distributed.distributed_c10d import _object_to_tensor

start = time.time()
_object_to_tensor("x" * 50_000_000)
print("Time:", time.time() - start)
```

Reviewed By: rohan-varma

Differential Revision: D31219794

fbshipit-source-id: 1abec38f9d51361c1eab6ad5efd87b589322e208
2021-09-27 19:22:10 -07:00
002ff19836 [acc_utils] Fix off by one for model info getter (#65708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65708

As titled.

Test Plan: added unit test

Reviewed By: khabinov

Differential Revision: D31209992

fbshipit-source-id: c1b4e70bd9705dcfdf3039cb8791149c8646f1d7
2021-09-27 19:01:55 -07:00
63bb7c6dba Refactor AotCompile to return a pair (#65707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65707

Refactors aotCompile to return a pair of the compiled function and the LLVM assembly, instead of updating an incoming string with the assembly code.

Testing: Gives expected results when compiled and run
```
(pytorch)  ~/local/pytorch refactor_aot
└─ $ build/bin/aot_model_compiler --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="2,2,2"
The compiled model was saved to mobilenetv3.compiled.pt
```

Test Plan: Imported from OSS

Reviewed By: qihqi

Differential Revision: D31220452

Pulled By: priyaramani

fbshipit-source-id: f957c53ba83f876a2e7dbdd4b4571a760b3b6a9a
2021-09-27 18:56:04 -07:00
e9327ed2ce Add nn.function.hardtanh in acc_tracer (#65639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65639

This op is used by mobilenet v2.

Test Plan:
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardtanh
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference -- hardtanh

Reviewed By: yinghai

Differential Revision: D31184297

fbshipit-source-id: 5a04319f6d16fb930372442616e27211107ecc67
2021-09-27 18:40:18 -07:00
6a6ee92e36 [quant] Add op benchmark for CPU FakeQuantizePerChannel with float zero_points (#65241)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65241

Test Plan: Imported from OSS

Reviewed By: jingsh

Differential Revision: D31150087

Pulled By: b-koopman

fbshipit-source-id: a00d4995841eee81305d0007c908473cc3d5a727
2021-09-27 16:01:49 -07:00
7c62b6e973 add deepcopy support to subclasses (#65584)
Summary:
Happy to get any feedback on how to make this code cleaner!

This PR:
- Fixes Tensor attribute deepcopy (BC-breaking?)
- Adds a test for Tensor attribute deepcopy
- Fixes subclass deepcopy
- Moves the subclass serialization tests into their own class so as not to interfere with other serialization test logic
- Adds a test for subclass deepcopy
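
A minimal check of the subclass behavior this fixes (`MyTensor` is a stand-in):

```python
import copy
import torch

class MyTensor(torch.Tensor):
    pass

t = torch.randn(2, 2).as_subclass(MyTensor)
t_copy = copy.deepcopy(t)

assert type(t_copy) is MyTensor  # the subclass survives deepcopy
assert torch.equal(t_copy, t) and t_copy is not t
```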

cc ezyang gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65584

Reviewed By: gchanan

Differential Revision: D31206590

Pulled By: albanD

fbshipit-source-id: 74a8f0767f4933b9c941fbea880a8fd1b893ea2f
2021-09-27 14:36:22 -07:00
f5b4e369f6 Sparse SoftMax: Remove unused variables (#65539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65539

This function doesn't directly use thrust so these are simply unused variables.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31193191

Pulled By: malfet

fbshipit-source-id: 231b6a197c9f1bd5a61e46cb858e8eedc85b2818
2021-09-27 13:51:49 -07:00
e1340d4282 [GHA] Small refactors (#65647)
Summary:
Introduce `main` method in generate_ci_workflows
Check that all `ciflow/` labels start with the same prefix
Move `ciflow_should_run` definition to common.yml.j2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65647

Reviewed By: janeyx99

Differential Revision: D31189537

Pulled By: malfet

fbshipit-source-id: 7cc47f63fb334c57f450034b931ff5bae1c0ed8b
2021-09-27 13:14:49 -07:00
fea32be964 Add HPU type for check_base_legacy_new (#65410)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65410

Reviewed By: H-Huang

Differential Revision: D31143754

Pulled By: malfet

fbshipit-source-id: 32abfbae4f7c09924c7dfa16758d64a2215ec636
2021-09-27 13:13:34 -07:00
82e0bf44c0 Apply linter suggestions to #65137 (#65459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65459

Just run linter on the change and apply all suggestions

Test Plan: N/A

Reviewed By: seemethere

Differential Revision: D31102960

fbshipit-source-id: 04e1d07935690f2ddbc64533661b3e55379d13b5
2021-09-27 13:07:40 -07:00
811601e19a Upload sccache stats (#65582)
Summary:
This adds some tracking to metrics.pytorch.org for sccache build stats per environment

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65582

Reviewed By: malfet, zhouzhuojie, janeyx99

Differential Revision: D31160761

Pulled By: driazati

fbshipit-source-id: a497918bafbe610a51c92a9139684cd3efe670d3
2021-09-27 12:55:10 -07:00
ea546e20fd [Reland] nn.functional.linear OpInfo (#65498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65498

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31171149

Pulled By: zou3519

fbshipit-source-id: badb06af08a772397b0280189385723c0175200b
2021-09-27 12:42:46 -07:00
b91375f741 upgrade windows cuda installer: cu11.1.0 to cu11.1.1 (#65669)
Summary:
Fixes pytorch/vision#4483

Please merge it with https://github.com/pytorch/builder/pull/857

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65669

Reviewed By: gchanan

Differential Revision: D31205107

Pulled By: janeyx99

fbshipit-source-id: 654f0440ad33d2517db95d64df64e14de1233ad7
2021-09-27 12:27:19 -07:00
cd2656a2e5 [package] add some docs describing how to debug dependencies (#65704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65704

As title.

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D31209866

Pulled By: suo

fbshipit-source-id: 4c8ec1d5418ea75b71c4b9a498b86f0ef5383544
2021-09-27 12:14:23 -07:00
10d0dbc6d9 Avoid storage access for HPU tensors (#65409)
Summary:
Add is_hpu() methods for the ATen tensor and device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65409

Reviewed By: wconstab, H-Huang

Differential Revision: D31134422

Pulled By: malfet

fbshipit-source-id: 181ebb67dce8e05a0723ef3c82f23e39228841ee
2021-09-27 11:54:30 -07:00
aa5d2a8d86 Remove confusing SHARD_NUMBER resetting logic (#65701)
Summary:
The SHARD_NUMBER reset was a way to differentiate whether we had just one shard vs. multiple.

We shouldn't reset SHARD_NUMBER; instead we should just pass and use NUM_TEST_SHARDS, for clarity and for ease of scaling up to more shards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65701

Reviewed By: driazati

Differential Revision: D31209306

Pulled By: janeyx99

fbshipit-source-id: 3a3504bd47e655d62aa0d9ed2f4657ca34c71c0e
2021-09-27 10:55:00 -07:00
facff2ec65 Update ProcessGroup collective C++ APIs to be non-pure virtual functions (#64943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64943

Most ProcessGroup collective APIs are pure virtual. As a result, c10d extensions need to override all of them and throw an error for any APIs they don't need. This is too verbose for users. This commit changes those collective APIs to virtual functions that throw an error by default. Note that ProcessGroup is still an abstract class, as `getBackendName` is a pure virtual function that all subclasses have to override.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: cbalioglu

Differential Revision: D30906866

Pulled By: mrshenli

fbshipit-source-id: c4df8962d60350a44d2df72fd04f9dd6eadb9fa6
2021-09-26 19:19:43 -07:00
cd80bbe5f5 Bug fixes in dataframe_wrapper (#65629)
Summary:
## Description
- Updated functions in `dataframe_wrapper.py` to return values
- Fixed bug in `set_df_wrapper` to update `global default_wrapper`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65629

Reviewed By: ejguan

Differential Revision: D31180110

Pulled By: Nayef211

fbshipit-source-id: a8046e582fd6ce982fcdc89dae4932d0edc83d6b
2021-09-25 21:09:41 -07:00
1c8949c51a [BE] Run Zero test internally (#65519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65519

Adds buck target so we can run this internally.
ghstack-source-id: 139009957

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D31072784

fbshipit-source-id: 7185cc1e6f9df3d79251eb017270471942a9d7dd
2021-09-25 13:26:50 -07:00
f70147b426 [BE] Enable ZeRO test on windows (#65385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65385

Enables the ZeRO tests to run on windows. Closes
https://github.com/pytorch/pytorch/issues/63086.

Backend == NCCL was used as a proxy to see if we were running under CUDA, but Windows GPU tests use Gloo. In this case, use Gloo on GPU.

For some reason these tests don't seem to test Gloo on GPU with ZeRO in general (they pick the NCCL backend when a GPU is available), so that behavior is kept for now.
ghstack-source-id: 139003920

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31071181

fbshipit-source-id: 45a76309ac5e882f5aa6c4b130118a68800754bb
2021-09-25 13:25:40 -07:00
4fe66d962d [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D31192084

fbshipit-source-id: 25d490783b876253ddd1ad0a70832766ebd33f51
2021-09-25 06:42:19 -07:00
146817c9d0 Add all_paths utility function (#65602)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65602

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D31163681

Pulled By: tugsbayasgalan

fbshipit-source-id: fa0b28b1d3b73efcc7671698a613e695a01cc103
2021-09-25 01:11:20 -07:00
0256c3be50 [TensorExpr] Delete dtype_ field from Let - it should use its var's dtype. (#65634)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65634

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31182697

Pulled By: ZolotukhinM

fbshipit-source-id: 572ecd74cdf2a671ee98e81f0b3e387f3d9c6202
2021-09-25 00:11:06 -07:00
399214efd6 Revert D31172530: [pytorch][PR] Enable CUPTI for kineto by default on windows
Test Plan: revert-hammer

Differential Revision:
D31172530 (6b60884f12)

Original commit changeset: 2c69ed0282c5

fbshipit-source-id: 649e040a8c44b0f536a8db397b4325309a285934
2021-09-24 19:18:15 -07:00
cda2ee9016 Add nn.function.hardswish in acc_tracer (#65590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65590

hardswish is used by the MobileNetV3 OSS model.
This diff adds hardswish support to acc_tracer.

Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardswish

Reviewed By: 842974287

Differential Revision: D30950061

fbshipit-source-id: cab57b8de5bea3a4d9d2b7d2a410d9afe787d66f
2021-09-24 17:30:39 -07:00
1de8976e85 Add quantized::convtranspose2d (#63914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63914

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D30531889

fbshipit-source-id: a65e389da2722efbc62e3fe1edf503732326350d
2021-09-24 17:07:29 -07:00
ab5eb56983 add qmul (#63913)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63913

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D30531890

fbshipit-source-id: 29d88cc61bd1e328cc7ae7a91a2f8d4819803c8d
2021-09-24 17:06:17 -07:00
ece25c453f [PyTorch] Store Argument::alias_info_ on the heap (#64824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824

See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite.
ghstack-source-id: 138958896

Test Plan: CI, profile schema parsing on startup and see much fewer page faults in createArgumentVector.

Reviewed By: suo

Differential Revision: D30860719

fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f
2021-09-24 17:00:51 -07:00
af7238f214 Rocm4.3.1 nightly (#65624)
Summary:
Depends on pytorch/builder#851.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65624

Reviewed By: zou3519

Differential Revision: D31180780

Pulled By: malfet

fbshipit-source-id: 98a51eb45985ef648108e811d2c02231ec8b3a1f
2021-09-24 16:21:01 -07:00
15724bcc03 [TensorExpr] Re-enable a float16 test. (#65632)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65632

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D31181798

Pulled By: ZolotukhinM

fbshipit-source-id: 1a57d0a878d44f8b73f3c24eef7ba707ce18fb70
2021-09-24 15:15:42 -07:00
0d3bf97fd0 TST Adds test for non-contiguous tensors (#64954)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935

This PR:

1. Adds a test for non-contiguous tensors
2. Fixes a bug in `NLLLoss` that was caught by the test.

The reason this was not caught in `common_nn` is that `CriterionTest` overrides `test_cuda` but does not call `test_nonconfig`.
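
A hedged sketch of the kind of non-contiguous input the new test exercises (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(4, 6), dim=1)
nc = log_probs[:, ::2]  # non-contiguous view of shape (4, 3)
assert not nc.is_contiguous()
target = torch.randint(0, 3, (4,))

# The non-contiguous view and a contiguous copy must give the same loss.
torch.testing.assert_close(F.nll_loss(nc, target), F.nll_loss(nc.contiguous(), target))
```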

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64954

Reviewed By: zou3519

Differential Revision: D31174149

Pulled By: jbschlosser

fbshipit-source-id: a16073e59b40ccc01c82ede016b63a8db2e810f5
2021-09-24 15:05:09 -07:00
a839cec0ad .github: GHA retry docker pull (#65103)
Summary:
This should help alleviate workflows failing due to docker pull timing out, which doesn't happen often, but did happen once in the past day.

Was also reported in https://github.com/pytorch/pytorch/issues/65439

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65103

Reviewed By: driazati

Differential Revision: D31157772

Pulled By: janeyx99

fbshipit-source-id: 7bf556f849b41eeb6dea69d73e5a8e1a40dec514
2021-09-24 14:31:43 -07:00
68e5935498 Remove fgrad_input from slow_conv2d (#64280)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64280

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30830887

Pulled By: jbschlosser

fbshipit-source-id: 5a3a79ad9d9118177672eabf872f9d9a3313ebe4
2021-09-24 14:27:39 -07:00
71d1d16acb Moving the constant parameter check to a more common file (#64251)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64251

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31161850

Pulled By: Gamrix

fbshipit-source-id: 5db3e6d52c99c1f40455c601122bb7680a287ae5
2021-09-24 13:54:27 -07:00
640a615150 [easy] [PyTorch Edge] Remove double pragma once directive in the generated code (#65620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65620

This was bothering me for a while.

ghstack-source-id: 138914860

Test Plan: Sandcastle

Reviewed By: beback4u

Differential Revision: D31162648

fbshipit-source-id: 72c47ea34d40c772bb53da721fcb36365b5dbaf3
2021-09-24 13:14:37 -07:00
57e066e188 TST Adds gradcheck and gradgradcheck to module info (#64444)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64444

Reviewed By: pbelevich

Differential Revision: D31174672

Pulled By: jbschlosser

fbshipit-source-id: 86dc3576479974fd0996f06298c09692c07e6b24
2021-09-24 13:10:29 -07:00
6b60884f12 Enable CUPTI for kineto by default on windows (#65608)
Summary:
Retry of https://github.com/pytorch/pytorch/pull/62175

See https://github.com/pytorch/pytorch/pull/62175#issuecomment-926411151 for more information.

malfet gdankel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65608

Reviewed By: zou3519

Differential Revision: D31172530

Pulled By: gdankel

fbshipit-source-id: 2c69ed0282c54fa6cdb6e604096d0370e230fd66
2021-09-24 13:00:49 -07:00
eca4f14b6c [PyTorch] Add C10_ prefix to MPARK_* macros in variant.h (#65589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65589

Without this prefix, the include guards interfere with attempts to indirectly include both c10::variant and the original mpark variant in the same translation unit.
ghstack-source-id: 138901838

Test Plan: Temporarily `#include <c10/util/variant.h>` in ivalue.h and buck build //data_preproc/preproc:preproc_adapter_utils mode/no-gpu -- this delayed D31101962 (01720d6a23) from fixing S244170

Reviewed By: bhosmer

Differential Revision: D31159414

fbshipit-source-id: 234c5ed37ca853702bcdf3263e4f185b95ac1d08
2021-09-24 12:57:26 -07:00
7f25c3e666 Update distributed.rst to show that CUDA send/recv on GPU is supported (#65601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65601

I believe this feature was supported one year ago:
https://github.com/pytorch/pytorch/pull/44921
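
A hedged sketch of GPU point-to-point send/recv with the NCCL backend (assumes one GPU per rank and the usual env:// rendezvous variables):

```python
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
rank = dist.get_rank()
t = torch.ones(4, device=f"cuda:{rank}")

if rank == 0:
    dist.send(t, dst=1)  # the CUDA tensor is sent directly over NCCL
elif rank == 1:
    dist.recv(t, src=0)
```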

#Closes: https://github.com/pytorch/pytorch/issues/65525
ghstack-source-id: 138918961

Test Plan: N/A

Reviewed By: pritamdamania87, mingzhe09088

Differential Revision: D31163535

fbshipit-source-id: 9321a0a5137a3e265e2b54bd78730ac28c7acd55
2021-09-24 12:30:10 -07:00
760aefd34d Fix nullptr addition (#65548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65548

Fixes
caffe2/test:jit - test_unsupported_nn_functional_pad_circular_cpu_float32 (test_jit_fuser_te.TestNNCOpInfoCPU)

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31148405

fbshipit-source-id: 4c8c693a45229ab4e59b0b0ec5326d3ac114dbaf
2021-09-24 11:43:22 -07:00
c3b09e977a [fx2trt] Refresh execution context across save/load for TRTModule. (#65592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65592

IExecutionContext might not be safe to serialize; therefore, the simplest way to support save/load of TRTModule is to re-populate the execution context on every load.
ghstack-source-id: 138904770

Test Plan: buck run mode/dev-nosan -c python.package_style=inplace -j 40 deeplearning/trt/fx2trt:acc2trt_test

Reviewed By: zrphercule

Differential Revision: D31070427

fbshipit-source-id: 88c58c6ce50e6dc9383d7f9419b5447cb89a4a3a
2021-09-24 11:36:57 -07:00
1682722152 keep output type after calling SubgraphRewriter (#65453)
Summary:
The JIT **SubgraphRewriter** doesn't keep the output type after overwriting the old graph. For example, in profiling mode the old graph carries the old operator's shapes, but after replacing the old operator with a newer operator via **SubgraphRewriter**, the tensor shape info is eliminated.

The motivation: I want to replace the PyTorch convolution with a customer's convolution. I first register **aten::_convolution** as a profiler node that can record the input and output shapes, and then use graph rewriting to replace it with **aten::conv2d**, whose tensor shape info is eliminated. I want to use the input sizes for some pre-processing before replacing **aten::conv2d** with the customer's convolution.

Before rewrite:
```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %x : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::_convolution(%x.1, %weight, %4, %3, %2, %3, %6, %2, %7, %6, %6, %5, %5), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%x, %z, %7) # jit_test.py:24:0
  return (%16)
```
After the rewrite using **aten::conv2d**:
```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %18 : Tensor = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7)
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0
  return (%16)
```

Expected result after replacing **aten::_convolution** with **aten::conv2d**:

```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %18 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7)
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0
  return (%16)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65453

Reviewed By: zdevito

Differential Revision: D31162489

Pulled By: ZolotukhinM

fbshipit-source-id: 0d1c1d607cb612df47c64f173d9f4c9e8b1d6c49
2021-09-24 11:07:40 -07:00
f3587f6bfa Remove THC ScalarConvert (#65471)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65471

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31148182

Pulled By: ngimel

fbshipit-source-id: bbf74e36a3d91a7be3e47199981440c68a2f645f
2021-09-24 10:29:51 -07:00
5b2a7eaa03 [codemod][fbcode/caffe2] Apply all buildifier fixes
Test Plan: Visual inspection. Sandcastle.

Reviewed By: zsol

Differential Revision: D31170304

fbshipit-source-id: ee56312b5262247bb5a2e68a66d51f6cb3a0bf82
2021-09-24 09:03:29 -07:00
b858993c97 Fix engine check for case where grad is a subclass (#65568)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65568

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31158089

Pulled By: albanD

fbshipit-source-id: 2a77df9b6340107de02a043b57a36cb7ae68df34
2021-09-24 08:41:19 -07:00
e742839f0e Fix autograd engine test in python_dispatch (#65567)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65567

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31158090

Pulled By: albanD

fbshipit-source-id: 651b78016ad978c7419343554ce7ceffd54aef1b
2021-09-24 08:39:52 -07:00
ef9e560796 [Static Runtime] Add aten::remainder out variant (#64967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64967

Out variant implementation for `aten::remainder`. Added both scalar and tensor overloads.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Remainder`

Reviewed By: d1jang

Differential Revision: D30915469

fbshipit-source-id: 9f27f18c86d66b11eac0aa4659c7062cb785b7e9
2021-09-24 07:51:39 -07:00
b003b2a9c0 [Static Runtime] Add record functions (#64698)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64698

Reviewed By: hlu1

Differential Revision: D30747191

fbshipit-source-id: 7ded6ea9bd36b5e3343d1efa9f3c92e02ff6d7f8
2021-09-24 07:20:17 -07:00
fd24e1b61f add OpInfo for torch.repeat_interleave (#65455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65455

Addresses facebookresearch/functorch#103.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31111696

Pulled By: zou3519

fbshipit-source-id: 4fa73708fa915cb21adbba9cb8fd2b8f75bcd3e0
2021-09-24 07:16:08 -07:00
d85e12a5bf add OpInfo for torch.argsort (#65454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65454

Addresses facebookresearch/functorch#103.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31111700

Pulled By: zou3519

fbshipit-source-id: ec4babd2fcdcea856ba0ee8db0fd8f42b87269f3
2021-09-24 07:14:41 -07:00
ca66698202 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31166199

fbshipit-source-id: 3fb46d64aba5e7c443b70beda77338f2ee63a99e
2021-09-24 02:57:37 -07:00
cc4db35205 [TensorExpr] Break circular dependency of shared pointers in MemDependencyChecker. (#65600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65600

Previously AccessInfo owned two maps: dependencies_ and dependents_,
which represented edges in the dependency graph. These two maps were
holding shared pointers and thus each edge immediately became a cycle,
which resulted in memory leaks. This PR makes one end of these
edges a weak pointer, thus breaking the loop.

Test Plan: buck test mode/dbgo-asan-ubsan //search/lib/query_expansion/candidate_generator/test:transliteration_expander_test -- --exact 'search/lib/query_expansion/candidate_generator/test:transliteration_expander_test - TransliterationExpander.romanizationByLocaleTest'

Reviewed By: bertmaher

Differential Revision: D31163441

Pulled By: ZolotukhinM

fbshipit-source-id: 9cef921f5c9293f1237144d1ee92e31f3e44c00a
2021-09-23 23:33:36 -07:00
01720d6a23 [JIT] constant object compilation unit ref fix (#65442)
Summary:
// A non owning pointer to a type. When a class get inserted as a constant
// into a graph, if we used a strong pointer we would have a circular reference
// from Object -> CompilationUnit and CompilationUnit -> Graph (which owns the
// Constant Object)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65442

Reviewed By: ezyang

Differential Revision: D31101962

Pulled By: eellison

fbshipit-source-id: f1c1cfbe5a8d16a832cad7ba46e2a57a98670083
2021-09-23 22:43:02 -07:00
f83250fd4e Revert logic in mobile/type_parser.cpp (#65556)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65556

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D31149080

Pulled By: ansley

fbshipit-source-id: d5986d019fc2c47fd45cc10f0397499cc1e81329
2021-09-23 22:26:02 -07:00
20143bf07f [ONNX] Deprecate use_external_data_format param from torch.onnx.export() function. (#62257) (#64382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64382

* This `use_external_data_format` parameter is used for large models that cannot be exported because of the 2GB protobuf limit.

* When `use_external_data_format` is set to True, the model is exported in ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.

* This PR marks this parameter as DEPRECATED and checks the model proto size in code instead of relying on the user; if the size is larger than 2GB, then `use_external_data_format = True` is applied automatically.
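
As a hedged sketch of the new behavior (the model and filename are illustrative, not from this PR):

```
import torch

model = torch.nn.Linear(4, 4)
x = torch.randn(1, 4)
# No use_external_data_format argument needed anymore: export decides
# internally, switching to external data files if the serialized proto
# would exceed the 2GB protobuf limit.
torch.onnx.export(model, (x,), "model.onnx")
```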

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905265

Pulled By: malfet

fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-09-23 22:20:48 -07:00
478d4cf883 [ONNX] Deprecated the example_outputs param from torch.onnx.export() function. (#62815) (#64380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64380

* `example_outputs` is used to determine the type and shape of the outputs without tracing the execution of the model, and it had to be provided when exporting a ScriptModule or ScriptFunction via the export() function.

* Since we can work out `example_outputs` internally instead of requiring it from the user, we deprecated this argument in the export() function to improve the experience of calling it.
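
A minimal sketch of the simplified call for a ScriptModule (names are illustrative):

```
import torch

scripted = torch.jit.script(torch.nn.Linear(4, 4))
x = torch.randn(1, 4)
# example_outputs no longer has to be supplied; output types and shapes
# are now worked out internally during export.
torch.onnx.export(scripted, (x,), "scripted.onnx")
```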

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905266

Pulled By: malfet

fbshipit-source-id: d00b00d7d02b365d165028288ad915678caa51f2

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-09-23 22:20:46 -07:00
9323ea2195 [ONNX] minor doc improvements and cleanup (#62514) (#64373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64373

* Fix some bad formatting and clarify things in onnx.rst.
* In `export_to_pretty_string`:
    * Add documentation for previously undocumented args.
    * Document that `f` arg is ignored and mark it deprecated.
    * Update tests to stop setting `f`.
    * Warn if `_retain_param_name` is set.
* Use double quotes for string literals in test_operators.py.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905271

Pulled By: malfet

fbshipit-source-id: 3627eeabf40b9516c4a83cfab424ce537b36e4b3
2021-09-23 22:20:44 -07:00
9965163751 [ONNX] Add supplementary tests and description for custom_opsets param from torch.onnx.export() function. (#62085) (#64372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64372

The custom_opsets arg from torch.onnx.export() does not need to be removed.

This adds some supplementary description and tests for easier understanding.
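
A hedged usage sketch of the kept argument (the domain name and version below are made up for illustration):

```
import torch

model = torch.nn.Linear(4, 4)
x = torch.randn(1, 4)
# custom_opsets maps a custom operator domain to the opset version that
# should be recorded for it in the exported model.
torch.onnx.export(model, (x,), "model.onnx",
                  custom_opsets={"my.custom.domain": 2})
```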

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905269

Pulled By: malfet

fbshipit-source-id: 489fbee0e2c1d6c5405c9bf7dfd85223ed981a44

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-09-23 22:20:42 -07:00
fb71ccf0f1 [ONNX] Remove strip_doc_string param from torch.onnx.export() function. (#61712) (#64371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64371

As of now, the "strip_doc_string" parameter is described as below:

strip_doc_string (bool, default True): do not include the field ``doc_string`` from the exported model. Otherwise the field will mention the source code locations for the model.

This is usually useless to users who just want to convert a PyTorch model to an ONNX one; the source code locations only provide benefits when the user wants to debug the export process.

To make the export() function friendlier by providing fewer parameters, we combined "strip_doc_string" into the "verbose" parameter. If a user sets verbose to True, it means they need some log information for debugging the export process, which is similar to the purpose of the strip_doc_string parameter.

But the usage of these two arguments is opposite: setting verbose to True means we want to print log information to help debugging, which means strip_doc_string should be False. This is how we replace strip_doc_string with the verbose argument in this PR.

This PR still keeps strip_doc_string in the torch.onnx.export() function for backward compatibility, while its usage has been combined with the verbose argument.
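
A minimal sketch of the combined behavior (model and filename are illustrative):

```
import torch

model = torch.nn.Linear(4, 4)
x = torch.randn(1, 4)
# verbose=True now also keeps doc_string (source code locations) in the
# exported model, i.e. the old strip_doc_string=False behavior.
torch.onnx.export(model, (x,), "model.onnx", verbose=True)
```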

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905268

Pulled By: malfet

fbshipit-source-id: 2f06eb805c01fe15ff7a1b4f6595c937ba716d60

Co-authored-by: fatcat-z <zhang-ji@outlook.com>
2021-09-23 22:20:40 -07:00
47d1ed60e1 [ONNX] Remove argument _retain_param_name from torch.onnx.export() function. (#61702) (#64370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64370

As of now, the "_retain_param_name" parameter has no description on the PyTorch docs website. According to the code, this argument determines whether we keep the original parameter names of the PyTorch model in the final ONNX graph. If this is False, those original parameter names are replaced with a series of integers starting from 1.

Since setting numbers as parameter names makes no sense to users, we removed this argument from the torch.onnx.export() function to improve the experience of calling it.

This PR still keeps it in the torch.onnx.export() function for backward compatibility, while all backend logic has been changed to behave as if _retain_param_name were set to True.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905270

Pulled By: malfet

fbshipit-source-id: ca60757ca17daaff937e9f08da42596086795f4a

Co-authored-by: fatcat-z <zhang-ji@outlook.com>
2021-09-23 22:18:52 -07:00
bc02255d5e Revert D30721329: [pytorch][PR] Enable CUPTI for kineto by default on windows.
Test Plan: revert-hammer

Differential Revision:
D30721329 (7dbc21bc2b)

Original commit changeset: aa1af47df8cc

fbshipit-source-id: 565d50841e19a45f8798a490aa3aa6b9f69ca404
2021-09-23 22:14:32 -07:00
8c7caedbb8 avoid re-allocation of view_shape for every tensor in torch.meshgrid (#62908)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62908

Reviewed By: mruberry

Differential Revision: D31064165

Pulled By: dagitses

fbshipit-source-id: 3ddc3088e70fc8ef6dcf56ceb67fd20991169af1
2021-09-23 21:41:51 -07:00
963ae25e41 Migrate THCAtomics to ATen (#65470)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65470

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31148184

Pulled By: ngimel

fbshipit-source-id: aaac3dfb5f2c6f88e9bd922b3a56d0a16a861e17
2021-09-23 19:43:34 -07:00
c73f0e457e Tensor and device is_hpu methods (#65408)
Summary:
Add is_hpu() methods for Aten tensor and device

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65408

Reviewed By: malfet

Differential Revision: D31144227

Pulled By: wconstab

fbshipit-source-id: 115f4df4b8d54e6913dd51af7b6d4cacf6dd43c5
2021-09-23 18:42:45 -07:00
d78b3909e8 Explicitly destory ProcessGroup in allgather_coalesced_async test (#65513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65513

The error in #65231 means some child threads were destructed before
being joined. I added some tracing and prints and found that, in the
failed tests, all `assertEqual` checks passed, but the `ProcessGroupGloo`
destructor wasn't called in one of the processes. This could be because
the only guarantee Python makes is that garbage collection MAY
happen before the program exits. This commit adds an explicit
`destroy_process_group()` call to alleviate the problem.
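
A minimal sketch of the teardown pattern this commit applies (the single-process gloo setup is shown only to make the snippet self-contained):

```
import torch.distributed as dist

dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)
# ... run the collectives and assertions under test ...
# Explicitly tear the group down instead of relying on the garbage
# collector to run the ProcessGroup destructor before exit.
dist.destroy_process_group()
```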

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31134174

Pulled By: mrshenli

fbshipit-source-id: 2e42fe93d3f16ce34681b591afc15a6ac0b9fab6
2021-09-23 18:35:08 -07:00
b77c979102 [quant][fx][graphmode] Make FixedQParam ops work for dtypes other than quint8 (#65484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65484

This PR makes sure we only use FixedQParamFakeQuantize for quint8 dtype and allows user
to use other dtypes for ops like sigmoid, this is useful for producing reference pattern for
these ops that can be used in other backends like TensorRT

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31120377

fbshipit-source-id: 3b529d588e2b6ff0377a89c181f6237f8f0cc2f5
2021-09-23 18:29:56 -07:00
a2e631b874 Windows GHA: Only upload artifacts if prev steps pass (#65561)
Summary:
Fixes a task in https://github.com/pytorch/pytorch/issues/65439

And removes the Upload to GitHub step as it's redundant with the S3 step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65561

Reviewed By: seemethere

Differential Revision: D31157685

Pulled By: janeyx99

fbshipit-source-id: cd23113a981eb4467fea3af14d916f6f2445a02b
2021-09-23 17:38:39 -07:00
7dbc21bc2b Enable CUPTI for kineto by default on windows. (#62175)
Summary:
It fixes nothing.

For tracking this PR, please refers to https://github.com/pytorch/kineto/issues/356

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62175

Reviewed By: ezyang

Differential Revision: D30721329

Pulled By: gdankel

fbshipit-source-id: aa1af47df8cc1b6f5ba2194447f62b902a6a9c84
2021-09-23 15:13:47 -07:00
f850d7ef2e [CoreML][OSS] Add Simulator tests (#65076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65076

ghstack-source-id: 138869950

Create a new conda environment: conda create --name coreml python=3.8
conda activate coreml
pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
pip install coremltools==5.0b5
cd pytorch
git fetch
git checkout gh/xta0/131/head
cd ios/TestApp/benchmark
mkdir ../models
python coreml_backend.py
Then test the model_coreml.ptl in the HelloWorld example

Test Plan:
1. CircleCI
2. Pytorch nightly builds

Reviewed By: hanton

Differential Revision: D30912268

fbshipit-source-id: 52b2ed1ad40e5949ee2755bca113119132dfc914
2021-09-23 14:57:01 -07:00
2a0208f4dc fixed comments referring fairscale master branch (#65531)
Summary:
replace comments referring fairscale master branch with main branch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65531

Test Plan:
buck build

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Reviewed By: H-Huang, anj-s

Differential Revision: D31132552

Pulled By: tmarkstrum

fbshipit-source-id: d3ee8920ab5cccad99f640934c21e8eee022e9b9
2021-09-23 14:37:58 -07:00
c015cbabf9 [codemod][fbcode/caffe2] Apply all buildifier fixes
Test Plan: Visual inspection. Sandcastle.

Reviewed By: zsol

Differential Revision: D31144864

fbshipit-source-id: f8e65fec69f88d03048df9edb98969d648eb6981
2021-09-23 14:03:19 -07:00
d07b2cb4ec [fx2trt] update the oss fx2trt exmaple (#65544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65544

ATT

Test Plan: CI

Reviewed By: mikekgfb

Differential Revision: D31147750

fbshipit-source-id: eacc1c9157a32d6deebbfe9ff2aaae13c434e72b
2021-09-23 13:45:22 -07:00
71704349aa [DDP] Allow await of custom buffer reduction in backward (#64515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64515

For performance reasons, we would like to ensure that we can await
user collectives as part of custom buffer reduction in parallel with other work.
As a result, this adds support for returning futures from custom buffer hooks and awaiting
those futures at the end of the backward pass.

Also added some docs to clarify how to use these APIs.
ghstack-source-id: 138793803

Test Plan: I

Reviewed By: zhaojuanmao

Differential Revision: D30757761

fbshipit-source-id: e1a2ead9ca850cb345fbee079cf0614e91bece44
2021-09-23 13:02:53 -07:00
36485d36b6 Docathon: Add docs for nn.functional.*d_max_pool (#63264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63264

Adding docs to max_pool to resolve docathon issue #60904

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31071491

Pulled By: Gamrix

fbshipit-source-id: f4f6ec319c62ff1dfaeed8bb6bb0464b9514a7e9
2021-09-23 11:59:50 -07:00
1f0f246fe2 Automated submodule update: FBGEMM (#65360)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 0108d4f552

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65360

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D31061552

fbshipit-source-id: 8bce5157a281e38cad5d5d0e9dcd123beda39735
2021-09-23 11:47:15 -07:00
65fbd2c12b [ci] do not continue through error on trunk (#65503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65503

There are two reasons for this change:
- I don't think trunk jobs should have different behavior than their PR equivalents.
- Continuing through errors makes it challenging to figure out what is
actually failing, especially given the poor UX of GitHub Actions when it
comes to reading logs

Example: https://github.com/pytorch/pytorch/runs/3680114581. Here, there
is a failure but the rendered test results tell me everything is
successful. I have no idea how to quickly tell what failed; the log is so long
and terms like "error", "failure", etc. are common enough that searching
it is very difficult.

Differential Revision:
D31130478
D31130478

Test Plan: Imported from OSS

Reviewed By: ezyang

Pulled By: suo

fbshipit-source-id: 15a80475ca4c49644c0f7b779f5c6c2ffeb946a6
2021-09-23 11:36:03 -07:00
7e772e7685 Update link to tutorial on defining NN modules (#65534)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65527. Please, see my comment in the issue: https://github.com/pytorch/pytorch/issues/65527#issuecomment-925863193. The file was renamed in ce58d5904c (diff-e5ef486bd89eb38de15752211d9437953681b8caa8f44d7c86bb820d13151df2), but the link in this repository was not updated.

It doesn't change the fact that the old link is still working, but I guess this has to be fixed in [pytorch/tutorials](https://github.com/pytorch/tutorials) instead of here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65534

Reviewed By: soulitzer

Differential Revision: D31144269

Pulled By: H-Huang

fbshipit-source-id: f70744a21113b7dc84510e2992d87f0fed793985
2021-09-23 11:26:50 -07:00
cac7c1a192 [ci] remove auto-label-rocm workflow (#65558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65558

This will temporarily be replaced by an FB-internal workflow that does
the exact same thing, pending a migration of this workflow to probot.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie, driazati

Differential Revision: D31149105

Pulled By: suo

fbshipit-source-id: 2aa122820ae3b5286774501f5ecfe052bc949dea
2021-09-23 11:15:35 -07:00
c731be8066 [BE] Use DispatchKeySet in check_base_legacy_new (#65535)
Summary:
Refactor:
```
TORCH_CHECK ( key == a ||
              key == b ||
              key == c,
              "expected key to be in ", a, " or ", b , " or ", c,
              " but got ", key);
```
into
```
TORCH_CHECK( key_set.has(key),
            "expected key to be in ", key_set,
            " but got ", key );
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65535

Reviewed By: wconstab

Differential Revision: D31144239

Pulled By: malfet

fbshipit-source-id: 68a053041a38f043e688e491889dd7ee258f3db3
2021-09-23 11:01:23 -07:00
da166d4f12 Add a timeout argument to RPC shutdown() (#65425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65425

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan:
Imported from OSS

   python3 test/distributed/rpc/test_tensorpipe_agent.py -v -k test_wait_all_workers_timeout

Reviewed By: mrshenli

Differential Revision: D31092483

Pulled By: dracifer

fbshipit-source-id: 5b5e9f20b1d6602cf8cde3772678f721dddf0d78
2021-09-23 10:42:58 -07:00
97b535dabd [PyTorch] add fastToString for infer_schema (#64823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64823

We seem to spend noticeable time in vfprintf for this, and the number of arguments is almost always small enough to do this in just a few instructions.
ghstack-source-id: 138623354

Test Plan: Profile schema parsing, saw less time in vfprintf

Reviewed By: ezyang, dhruvbird

Differential Revision: D30860716

fbshipit-source-id: 09ef085cd6f93dc1eaa78790dde918ac60e67450
2021-09-23 10:15:40 -07:00
eb949464d6 [PyTorch] Fix missing moves in SchemaParser::parseArgument (#64839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64839

Resulted in some extra shared_ptr refcount bumps.
ghstack-source-id: 138623356

Test Plan: CI

Reviewed By: smessmer

Differential Revision: D30875749

fbshipit-source-id: 531f04c453f7410ed3d4ff054217f21a250be8e9
2021-09-23 10:14:22 -07:00
14307f7a56 [Static Runtime] Added logging to dump the model graphs (#65509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65509

With this change, we can get dumps of the model graphs by setting the env variable `PYTORCH_JIT_LOG_LEVEL=">>impl"` while running the model.
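
A minimal sketch of enabling the dump from Python (the level string is taken from the summary above; it must be set before the graphs run):

```
import os

os.environ["PYTORCH_JIT_LOG_LEVEL"] = ">>impl"
# ... construct and run the static runtime model here; the graph dumps
# are written as the model executes.
```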

Test Plan: buck test mode/opt-clang //caffe2/benchmarks/static_runtime:static_runtime_cpptest

Reviewed By: mikeiovine

Differential Revision: D31125797

fbshipit-source-id: d8979a4e138047518140e0eaecb46e012891b17c
2021-09-23 10:06:13 -07:00
767a104698 [quant] change observer FQNs generated in prepare step (#65420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65420

Context: In some FB use cases we have a need to map observer stats from a train model checkpoint to an inference model. We observed that some buffer names are different because the intermediate activation tensors
are generated differently across the train and inference models. More details in https://fb.quip.com/PtGcAR0S5CQP

Currently, for each observer (activation_post_process), the FQN of the module inserted is determined based on the FQN of the input tensor it is observing.

In this change, we make the observer FQN include the FQN of the op/module it is observing (along with an “input”/“output” suffix) rather than tensor/intermediate op names.

Before
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    mods1_w = self.mods1.w
    mods1_w_activation_post_process_0 = self.mods1_w_activation_post_process_0(mods1_w);  mods1_w = None
    mods1_b = self.mods1.b
    linear = torch.nn.functional.linear(x_activation_post_process_0, mods1_w_activation_post_process_0, bias = mods1_b);  x_activation_post_process_0 = mods1_w_activation_post_process_0 = mods1_b = None
    linear_activation_post_process_0 = self.linear_activation_post_process_0(linear);  linear = None
    return linear_activation_post_process_0
```

After
```
def forward(self, x):
    mods1_input_activation_post_process_0 = self.mods1_input_activation_post_process_0(x);  x = None
    mods1_w = self.mods1.w
    mods1_w_activation_post_process_0 = self.mods1_w_activation_post_process_0(mods1_w);  mods1_w = None
    mods1_b = self.mods1.b
    linear = torch.nn.functional.linear(mods1_input_activation_post_process_0, mods1_w_activation_post_process_0, bias = mods1_b);  x_activation_post_process_0 = mods1_w_activation_post_process_0 = mods1_b = None
    mods1_output_activation_post_process_0 = self.mods1_output_activation_post_process_0(linear);  linear = None
    return mods1_output_activation_post_process_0
```

Test Plan:
python test/test_quantization.py test_observer_fqn

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31088652

fbshipit-source-id: 2f1526f578a13000b34cfd30d11f16f402fd3447
2021-09-23 09:08:10 -07:00
a012216b96 [nn] Fold : no batch dim (#64909)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64907
Reference: https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64909

Reviewed By: cpuhrsch, heitorschueroff

Differential Revision: D30991087

Pulled By: jbschlosser

fbshipit-source-id: 91a37e0b1d51472935ff2308719dfaca931513f3
2021-09-23 08:37:32 -07:00
2a4d5e4c6d [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31138547

fbshipit-source-id: ba134ae7f057c918eaefdc6310f7663e187e9749
2021-09-23 07:54:32 -07:00
9668a8a82d [DataPipe] Update Docstrings for Tar and ZipArchiveReader (#65500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65500

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31127241

Pulled By: NivekT

fbshipit-source-id: aed41aa192fe55e10ba67beda460fac70f2703c7
2021-09-23 07:20:08 -07:00
7e7be526c9 Add TORCH_SHOW_CPP_STACKTRACES to Contributing.md (#64052)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64052

Reviewed By: ezyang

Differential Revision: D31107779

Pulled By: Chillee

fbshipit-source-id: 2ad8ad40cd48e54fe711863c3c74df884a2e2de7
2021-09-22 22:53:19 -07:00
14949d2922 Add nn.function.hardsigmoid in acc_tracer (#65422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65422

hardsigmoid is used by mobile net v3 oss model.
This diff added hardsigmoid support in acc_tracer

Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardsigmoid

Reviewed By: jfix71

Differential Revision: D30950304

fbshipit-source-id: 8fe4b4c6df29c06a73850d32f59321a9311f94f5
2021-09-22 20:57:42 -07:00
5525e9a591 Lock unpickling of source ranges
Summary:
The source is shared across all threads running the torchscript
interpreter, so if several threads encounter errors at once, they will all race
to unpickle the source, leading to memory corruption.

Test Plan:
Model 217993215_0 is the problematic model; I wasn't able to repro
the crash with requests stored in Hive, but I could easily by adding my
devserver (SMC tier predictor.bertrand) as a shadow tier to the model's tier
(inference_platform.predictor_model.prod.bi.217993215_latest).  (i.e., set
shadow_tier property to predictor.bertrand=1 to proxy 1% of traffic).

With this diff, the ASAN/TSAN errors go away.

Reviewed By: suo

Differential Revision: D31044009

fbshipit-source-id: 56f9ef3880e7cf09f334db71b4256e362b4de965
2021-09-22 20:41:02 -07:00
228141f939 [pytorch] more informative error msg from fbgemm embedding spmdm call (#65186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65186

FBGEMM JIT'ed EmbeddingSpMDM kernel just returns false when there's an error, delegating detailed error handling to the caller (since each framework like PyTorch and Caffe2 wants to do error handling differently). Much of the PyTorch code was simply reporting that there was "an" error without pinpointing exactly why it happened. This diff introduces more informative error messages, following what Caffe2 was doing.

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D31008300

fbshipit-source-id: b8d069af0692dc86dc642b18a9c68f22deaffea3
2021-09-22 20:13:34 -07:00
0ca1102609 [fx2trt] fuse permute + matmul using a pass instead of hardcoding it as a leaf module (#65482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65482

Currently we hardcoded permute + bmm in a module and tagged it as a leaf module during tracing. This diff introduces a pass to fuse permute + matmul into a single node.

TODO:
Fusion transformations of this kind share a lot of similar code, such as finding the fusion pattern and replacing the original nodes with the fused node. The current fx subgraph rewriter allows us to specify patterns that we want to replace, but we would need to extend it to allow specifying constraints on nodes' kwargs.

Reviewed By: yinghai

Differential Revision: D31022055

fbshipit-source-id: 13d1f18d79b09d371897ecde840f582ccaf5713a
2021-09-22 18:43:09 -07:00
fccaa4a3c8 [fx2trt] fix transpose unittest (#65481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65481

Previously we had `acc_ops.transpose`, but after a recent diff `torch.transpose` is mapped to `acc_ops.permute`. Here we clean up the fx2trt unittest for transpose and add support for negative indices in permute.

Reviewed By: wushirong

Differential Revision: D31115280

fbshipit-source-id: 58e689e6dd14181aea5186f3bb5b8745a07d0e51
2021-09-22 18:08:55 -07:00
2f67579864 [ddp] use named_params and named_buffers explicitly (#65181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65181

This PR changes `state_dict()` during sync to `named_parameters` and `named_buffers` explicitly. The underlying motivation is that `state_dict()` doesn't necessarily equal "params + buffers" in all cases: state_dict is used mainly for checkpointing, while params/buffers are used for training, and we might have cases where params/buffers take a different form than state_dict (i.e. in state_dict we might want to save small pieces of tensors while in training we want to concat the tensors together for performance reasons).
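
A minimal sketch of the distinction, using a stock module:

```
import torch

m = torch.nn.BatchNorm1d(4)
# Training views of the tensors, as now used explicitly during sync:
params = dict(m.named_parameters())  # weight, bias
buffers = dict(m.named_buffers())    # running_mean, running_var, ...
# Checkpoint-oriented view; not guaranteed to equal "params + buffers".
checkpoint = m.state_dict()
```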
ghstack-source-id: 138701159

Test Plan: wait for ci

Reviewed By: divchenko, rohan-varma

Differential Revision: D31007085

fbshipit-source-id: 4e1c4fbc07110163fb9b09b043ef7b4b75150f18
2021-09-22 17:32:54 -07:00
0eaf081018 [JIT] canonicalize aten::rsub (#65014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65014

ghstack-source-id: 138656948

Test Plan:
```
(pytorch) [maxren@devvm3115.atn0 ~/pytorch] python3 test/test_jit.py TestPeephole
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........s......................
----------------------------------------------------------------------
Ran 31 tests in 0.393s

OK (skipped=1)
(pytorch) [maxren@devvm3115.atn0 ~/pytorch] python3 test/test_jit.py TestPeephole.test_normalized_rsub
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
.
----------------------------------------------------------------------
Ran 1 test in 0.015s

OK
```

Reviewed By: eellison

Differential Revision: D30941389

fbshipit-source-id: 03f0416d99090845c9bfb1e5fcf771d5f1d7a050
2021-09-22 17:20:46 -07:00
32f0387ee8 Bug in CosineAnnealingWarmRestarts in optim/lr_scheduler.py (#64758)
Summary:
## {emoji:1f41b} Bug
'CosineAnnealingWarmRestarts' object has no attribute 'T_cur'.
In the constructor of CosineAnnealingWarmRestarts, we call the constructor of the parent class (_LRScheduler), which in turn calls the step method of CosineAnnealingWarmRestarts.
The called method tries to update the object's attribute 'T_cur', which is not defined yet, so it raises the error.
This only happens when the last_epoch argument passed to CosineAnnealingWarmRestarts at initialization is 0 or greater.

![Bug_in_CosineAnnealingWarmRestarts](https://user-images.githubusercontent.com/77477328/132552212-70abc8b5-0357-4c35-90a9-832648bac607.png)
## To Reproduce

Steps to reproduce the behavior:

1. Pass 0 as the value for the last_epoch argument, OR
2. Pass a positive integer as the value for the last_epoch argument (see the minimal snippet after this list).
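
A minimal snippet reproducing the error, based on the description above:

```
import torch

opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
# Raises AttributeError: 'CosineAnnealingWarmRestarts' object has no
# attribute 'T_cur', because _LRScheduler.__init__ calls step() before
# self.T_cur is assigned.
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    opt, T_0=10, last_epoch=0)
```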

## Expected behavior

I only expected the 'CosineAnnealingWarmRestarts' object to be initialized.

## Environment

PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.2
Libc version: glibc-2.31
Python version: 3.8.10  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA

## Additional context
We can solve this bug by moving the line 'self.T_cur = self.last_epoch' above the 'super(CosineAnnealingWarmRestarts, self).__init__()' call, since that initializes 'self.T_cur' on the object before step() runs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64758

Reviewed By: ezyang

Differential Revision: D31113694

Pulled By: jbschlosser

fbshipit-source-id: 98c0e292291775895dc3566fda011f2d6696f721
2021-09-22 16:55:14 -07:00
b80bdcc73b Add register_module alias to nn.Module (#65174)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60397. I'm not sure how aliases are supposed to be implemented, but this is the most basic/direct way, IMO. As a side-effect, this implementation results in a "duplicate" doc entry, inheriting the one from `add_module`:

![monkey-patch](https://user-images.githubusercontent.com/7027770/133693137-8408d8e7-1f4f-436b-b176-57dda9bc3a32.png)

An alternative implementation could be:

```python
def register_module(self, name: str, module: Optional['Module']) -> None:
    r"""Alias for :func:`add_module`."""
    self.add_module(name, module)
```

which results in this documentation:

![image](https://user-images.githubusercontent.com/7027770/133693249-d969a71a-be44-489d-9633-4f38b44ab887.png)
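
A minimal usage sketch of the alias:

```
import torch

m = torch.nn.Module()
m.register_module("l1", torch.nn.Linear(1, 1))  # same as m.add_module("l1", ...)
print(m.l1)  # Linear(in_features=1, out_features=1, bias=True)
```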

Questions:
1. Should I replicate the tests? There are two for `add_module`: [test_add_module_raises_error_if_attr_exists](873255c6d9/test/test_nn.py (L1420-L1434)) and [test_add_module](873255c6d9/test/test_nn.py (L1837-L1855)).
2. This PR only adds `register_module` to `nn.Module`. There is an `add_module` in [`_RemoteModule`](https://github.com/pytorch/pytorch/blob/master/torch/distributed/nn/api/remote_module.py#L311-L312), which raises `NotSupported`, and there is another one in [`ConcreteModuleTypeBuilder`](873255c6d9/torch/_C/__init__.pyi.in (L468)), which means something else, I think. Should I do anything about them?

cc ngimel SsnL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65174

Reviewed By: soulitzer

Differential Revision: D31089717

Pulled By: jbschlosser

fbshipit-source-id: abd8d14a434fd8c7efa0bd8c242df56da33491e9
2021-09-22 16:37:28 -07:00
31584d065e [Static Runtime] Added NNC implementation for signed log1p kernel. (#65387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65387

Added a customized NNC implementation for signed log1p kernel and enabled the fusion pass that adds the fused signed log1p op.

Also, added a SR microbenchmark for this kernel which shows the performance improvement.
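
For reference, a minimal eager-mode sketch of the kernel being fused, assuming "signed log1p" means sign(x) * log1p(|x|):

```
import torch

def signed_log1p(x):
    # Compress magnitudes logarithmically while preserving the sign.
    return torch.sign(x) * torch.log1p(torch.abs(x))
```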

Without fusion:
```
--------------------------------------------------------------------------------
Benchmark                                         Time           CPU Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16                             1953 ns       1953 ns     358746
BM_signed_log1p/64                             2049 ns       2049 ns     342145
BM_signed_log1p/512                            3291 ns       3291 ns     214342
BM_signed_log1p/4096                          15559 ns      15559 ns      44420
BM_signed_log1p/32768                        101936 ns     101935 ns       6843
BM_signed_log1p/65536                        194792 ns     194789 ns       3615
```

With NNC fusion:
```
--------------------------------------------------------------------------------
Benchmark                                         Time           CPU Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16                              369 ns        369 ns    1896179
BM_signed_log1p/64                              497 ns        497 ns    1406995
BM_signed_log1p/512                            1618 ns       1618 ns     430209
BM_signed_log1p/4096                          11327 ns      11326 ns      61463
BM_signed_log1p/32768                         84099 ns      84086 ns       8325
BM_signed_log1p/65536                        166531 ns     166510 ns       4186
```

This clearly shows >15% improvement in performance of this kernel with NNC fusion.

On inline_cvr local model, there is a small improvement in terms of profiled time spent on ops:
  without fusion: `0.9%` (computed by adding the % spent on all the 4 ops involved)
  with NNC fusion: `0.55%`

Test Plan:
`buck test mode/opt-clang //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`

Also, did the accuracy test with inline_cvr as described here, https://fb.quip.com/qmdDAJzEmPtf, on the full size model (285298536_1)

```
get 57220 prediction values
get 57220 prediction values
max_error:  0  total:  0
```

Reviewed By: hlu1

Differential Revision: D30609492

fbshipit-source-id: d2e68df580569a30ee61abb0ef18d2c4c56827bd
2021-09-22 15:53:33 -07:00
1c20b98b4b [iOS][CoreML] Check backend availability at runtime. (#65315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65315

ghstack-source-id: 138703808

Test Plan:
- OSS builds and BUCK builds
- CircleCI

Reviewed By: hanton

Differential Revision: D31048011

fbshipit-source-id: 824a8e32d65de2caf25e41efef2b022ddbb63156
2021-09-22 15:38:53 -07:00
2898ef7549 Minor ScanKernels.cu cleanup (#65350)
Summary:
- Replace THCNumerics with `at::_isnan`
- Replace `contiguous` with `expect_contiguous`
- Don't use `contiguous` on output tensors. Instead skip the copy and
  just create a new empty tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65350

Reviewed By: ezyang

Differential Revision: D31103501

Pulled By: ngimel

fbshipit-source-id: 9030869e28d6c570fad074fd0502076de8e2ab09
2021-09-22 15:34:01 -07:00
5739f77775 [DDP] Refactor and remove sync_params (#64514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64514

sync_params is a misnomer since we don't actually synchronize
parameters. While removing this I realized
`self._check_and_sync_module_buffers` does almost everything we need it to, so
just refactored that and made DDP forward call into it.
ghstack-source-id: 138684982

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30751231

fbshipit-source-id: add7c684f5c6c71dad9e9597c7759849fa74f47a
2021-09-22 14:12:51 -07:00
ce5981e431 [DDP] Custom buffer reduction (#64513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64513

Proposal: https://github.com/pytorch/pytorch/issues/63041
Support custom buffer reduction in DDP via hook
ghstack-source-id: 138655663

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30751152

fbshipit-source-id: 257a9d46bb178d8812d4ea5a4d9c6140b8a1791f
2021-09-22 14:11:35 -07:00
923f06621c Fix Windows ninja builds when MAX_JOBS is specified (#65444)
Summary:
Reported by cloudhan in https://github.com/pytorch/pytorch/pull/64733#issuecomment-924545463

Fixes regression introduced by 047e68235f

cc malfet seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65444

Reviewed By: dagitses, seemethere

Differential Revision: D31103260

Pulled By: malfet

fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
2021-09-22 14:04:31 -07:00
cbc3db8274 Create test for builtin tensorrt module in torch deploy (#63819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63819

ghstack-source-id: 138521664

Test Plan:
buck test mode/dev-nosan caffe2/torch/csrc/deploy:test_deploy_gpu

buck test mode/opt-split-dwarf caffe2/torch/csrc/deploy:test_deploy_gpu

Reviewed By: wconstab

Differential Revision: D30499301

fbshipit-source-id: 0bc165b4ed5be28ebb0becc65f292cf26368692f
2021-09-22 13:42:35 -07:00
72fc53ff27 .github: Add timeout for test step (#65486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65486

Adding this after observing jobs running for 6+ hours on `pytorch/pytorch-canary`; still trying to debug why they happen there, but this should resolve jobs running forever

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: ezyang, malfet, janeyx99

Differential Revision: D31117497

Pulled By: seemethere

fbshipit-source-id: 126a10e844bdef77c2852cc5c392e5f37f130f7e
2021-09-22 13:23:41 -07:00
f24bd43375 Changing type and name of local_used_maps to reflect that it is only one map (#65380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65380

Fixing bugs that arise when running setup.py develop

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31104844

Pulled By: jaceyca

fbshipit-source-id: acfd4cf316c71177df758ca55b470f51a17f776b
2021-09-22 11:35:33 -07:00
0fe86ac6c6 Fix torch.any documentation (#65310)
Summary:
Currently, the description of torch.any would be parsed like

```
param input
the input tensor.
```

However, it should be

```
Tests if any element in input evaluates to True.
```
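
For context, the documented behavior:

```
>>> torch.any(torch.tensor([False, True]))
tensor(True)
```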

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65310

Reviewed By: ezyang

Differential Revision: D31102918

Pulled By: soulitzer

fbshipit-source-id: 678ade20ba16ad2643639fbd2420c8b36fcd8bd7
2021-09-22 11:24:20 -07:00
a0dea074b2 Remove .data from benchmarks and tensorboard (#65389)
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987 and https://github.com/pytorch/pytorch/issues/33628. Fix the following tasks:

- Remove the use of `.data` in all our internal code:
  - [x] `benchmarks/`
  - [x] `torch/utils/tensorboard/`

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23 albanD gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65389

Reviewed By: soulitzer

Differential Revision: D31093464

Pulled By: albanD

fbshipit-source-id: 3a9c8834fd544a59a1cc2b930ae538fd1d46b232
2021-09-22 11:16:59 -07:00
70a545b21e Add Tensor._make_wrapper_subclass (#65340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65340

I thought about a few possible ways of doing this.  The main hazard is
that if I create a CPU tensor that doesn't have any real storage, the
moment I actually try to access the data on the tensor I will segfault.
So I don't want to use _make_subclass on a "cpu meta tensor" because
the CPU meta tensor (with no subclass) is radioactive: printing it
will immediately cause a segfault.  So instead, I have to create
the CPU meta tensor AND subclass all in one go, and that means I need
another function for it.  One downside to doing it this way is
I need another overload for explicit strides, and in general it is
difficult to get the view relationships to all work out properly;
tracked at https://github.com/pytorch/pytorch/issues/65339
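
A hedged usage sketch of the new function (the subclass name is illustrative, and the keyword arguments shown are assumptions about the overload described above):

```
import torch

class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        # Create the storage-less tensor and the subclass in one go, as
        # described above; size/dtype/device are mirrored from `elem`.
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.size(), dtype=elem.dtype, device=elem.device)
```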

Fixes https://github.com/pytorch/pytorch/issues/62972
Fixes https://github.com/pytorch/pytorch/issues/62730

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31057231

Pulled By: ezyang

fbshipit-source-id: 73522769e093ae8a1bf0c7f7e594659bfb827b28
2021-09-22 11:10:47 -07:00
11ca641491 [docs] Add images to some activation functions (#65415)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65368. See discussion in the issue.

cc mruberry SsnL jbschlosser soulitzer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65415

Reviewed By: soulitzer

Differential Revision: D31093303

Pulled By: albanD

fbshipit-source-id: 621c74c7a2aceee95e3d3b708c7f1a1d59e59b93
2021-09-22 11:05:29 -07:00
158393e1a1 Fix autograd engine checks and update InputMetadata (#65235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65235

1. Updated the legacy type checks in `torch/csrc/autograd/engine.cpp` to individually validate the dtype, device, and layout equality for grad and tensor.
2. Removed device field from `InputMetadata` since it's already stored via storing options. Also, added `dtype()` and `layout()` methods to `InputMetadata`. To make this change, some calls had to be updated due to the change in constructor.
3. To fix https://github.com/pytorch/pytorch/issues/65016:
     a. Added a `is_tensor_subclass` field in `InputMetadata` to skip device checks for grad and tensor when the tensor has
         python key set on it (tensor subclass).

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31117318

Pulled By: anjali411

fbshipit-source-id: 825401df98695c48bf9b320be54585f6aff500bd
2021-09-22 11:01:19 -07:00
db4b68b3ac Back out "Eagerly populate python_error::what() when TORCH_SHOW_CPP_STACKTRACES=1"
Summary: Original commit changeset: 9cfda47cafb3

Test Plan: unland

Reviewed By: ezyang

Differential Revision: D31116643

fbshipit-source-id: 631eea446ed48c63ca39281d24163a2eadbe8d12
2021-09-22 10:37:27 -07:00
b3ec88f41f ugh (#65477)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65477

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D31115936

Pulled By: suo

fbshipit-source-id: fb16911a683713fdc2393bfe7150fc29c7d6814f
2021-09-22 10:15:41 -07:00
152f0236c3 Revert D31082693: Fix autograd engine checks and update InputMetadata
Test Plan: revert-hammer

Differential Revision:
D31082693 (9324d682fd)

Original commit changeset: cb551cd438c6

fbshipit-source-id: fc60f86b80fc70058984df6bccbf240d27f5843e
2021-09-22 10:00:08 -07:00
7c9a278895 fix trailing newlines (#65474)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65474

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31114952

Pulled By: suo

fbshipit-source-id: 3b8cde2098635450c3e22571a401f78e4e54e9e0
2021-09-22 09:48:34 -07:00
508845f2b5 [quant] AO migration of the torch/quantization/quantize_fx.py and torch/quantization/fx/* (#65033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65033

1. Move the file:
```
hg mv caffe2/torch/quantization/fx caffe2/torch/ao/quantization/fx
hg mv caffe2/torch/quantization/quantize_fx.py caffe2/torch/ao/quantization/quantize_fx.py
```
2. Create new files
```
touch caffe2/torch/quantization/quantize_fx.py
touch caffe2/torch/quantization/fx/__init__.py
```
3. import things in the new files
4. add tests to test/quantization/ao_migration/test_quantization_fx.py
this is because we have some fx import in quantize_fx and fx/*.py

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: vkuzo, z-a-f

Differential Revision: D30949749

fbshipit-source-id: 9e5d4d039c8a0a0820bc9040e224f0d2c26886d3
2021-09-22 09:29:15 -07:00
762c2276e1 feed model merge net lower benchmark (#65191)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65191

Test Plan:
run command:
buck run mode/opt -c python.package_style=inplace hpc/new/models/feed/benchmark:feed_lower_benchmark

example output:
Eager, BS: 2048, TFLOP/s: 253.25, Time per iter: 4.49ms, QPS: 456289.25
TensorRT, BS: 2048, TFLOP/s: 162.30, Time per iter: 7.00ms, QPS: 292426.58

Reviewed By: yinghai

Differential Revision: D31010288

fbshipit-source-id: f30b520eca9508439588bcf48476b1b1edfb09af
2021-09-22 09:21:18 -07:00
bcc6e3ab5e add python API to print all operators that have kernels registered to a particular DispatchKey (#63575)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63575

Test Plan: Imported from OSS

Reviewed By: ezyang, Chillee

Differential Revision: D30426919

Pulled By: bdhirsh

fbshipit-source-id: b0e487e48dfe02f7b9d678403f0a2b5bfe146f4e
2021-09-22 09:15:55 -07:00
9324d682fd Fix autograd engine checks and update InputMetadata (#65235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65235

1. Updated the legacy type checks in `torch/csrc/autograd/engine.cpp` to individually validate the dtype, device, and layout equality for grad and tensor.
2. Removed device field from `InputMetadata` since it's already stored via storing options. Also, added `dtype()` and `layout()` methods to `InputMetadata`. To make this change, some calls had to be updated due to the change in constructor.
3. To fix https://github.com/pytorch/pytorch/issues/65016:
     a. Added a `is_tensor_subclass` field in `InputMetadata` to skip device checks for grad and tensor when the tensor has
         python key set on it (tensor subclass).

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31082693

Pulled By: anjali411

fbshipit-source-id: cb551cd438c6ca40b0f18a4d0009e0861cf0fd4e
2021-09-22 07:49:52 -07:00
f90d9b48db test_neg_view: preseve sign of sample input (#63010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63010

This changes `test_neg_view` to call the operator with the same numeric values as the original sample input.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31082824

Pulled By: anjali411

fbshipit-source-id: 7d50f99dc0d1343247e366cbe9b0ca081bd0a9b1
2021-09-22 07:47:42 -07:00
9d17f21e46 Added PandasDataframeWrapper (#65411)
Summary:
- Added `PandasDataframeWrapper` around `pandas` functions to easily drop-and-replace `torcharrow` for Facebook internal use cases
- Updated relevant datapipe/dataframe use sites to use the new `PandasDataframeWrapper` instead of calling `pandas` functions directly

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65411

Reviewed By: VitalyFedyunin, hudeven

Differential Revision: D31087746

Pulled By: Nayef211

fbshipit-source-id: 299901f93a967a5fb8ed99d6db9b8b9203634b8f
2021-09-22 07:42:45 -07:00
3c6d9fd124 Eagerly populate python_error::what() when TORCH_SHOW_CPP_STACKTRACES=1 (#65376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65376

Let's suppose there's a bug in PyTorch and python_error gets thrown
and never gets caught.  Typically, you'll get a very useless error
message like this:

```
terminate called after throwing an instance of 'python_error'
  what():
  Aborted (core dumped)
```

Now, you'll get:

```
what():  unknown Python error (for more information, try rerunning with TORCH_SHOW_CPP_STACKTRACES=1)
```

and with TORCH_SHOW_CPP_STACKTRACES=1 you'll get:

```
what():  error message from Python object
```

If we're OK with making Python exceptions go even slower, we could
eagerly populate unconditionally.  I'm also not so happy we don't get
a Python backtrace or the Python error name, that's worth improving
(this is a minimal diff to get the discussion going.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31067632

Pulled By: ezyang

fbshipit-source-id: 9cfda47cafb349ee3d6853cdfb0f319073b87bff
2021-09-22 07:12:28 -07:00
2c7df1360a Bump torch version to 1.11 (#65435)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65435

Reviewed By: zhouzhuojie

Differential Revision: D31099045

Pulled By: malfet

fbshipit-source-id: 6ae6ca8a4b652fc51ee3138c800d067e144acbaa
2021-09-22 07:07:16 -07:00
96383ca704 Unify the output pathname of archive reader and extractor (#65424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65424

This PR is re-implementation for https://github.com/facebookexternal/torchdata/pull/93
Same PR has landed into torchdata https://github.com/facebookexternal/torchdata/pull/157

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31090447

Pulled By: ejguan

fbshipit-source-id: 45af1ad9b24310bebfd6e010f41cff398946ba65
2021-09-22 06:34:29 -07:00
e331beef20 Delete code coverage jobs from CI (#65362)
Summary:
As it does not seem useful to lots of people, see https://fb.workplace.com/groups/1144215345733672/posts/2062909540530910

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65362

Reviewed By: janeyx99, bdhirsh

Differential Revision: D31061945

Pulled By: malfet

fbshipit-source-id: 912ed92cc901a370a40448f1127c3ba43640ac43
2021-09-22 05:38:35 -07:00
127c9402d0 Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137)
Summary:
This reverts commit 03389dc851db6f3ca52f9a4455ce2090c64a223d.

Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745
Fixes the windows build failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137

Reviewed By: seemethere, dzhulgakov, heitorschueroff

Differential Revision: D30994556

Pulled By: malfet

fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d
2021-09-22 04:54:51 -07:00
feefc94573 [fx2trt] Use itensor_to_tensor_meta to track the TensorMeta info for ITensor node (#65427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65427

Previously we added an input_tensor_meta argument to the dequantize function. This is a bit hacky since it creates a dependency
on the arguments of dequantize: if there are passes that change the input, we would need to update the tensor meta as well.

Test Plan:
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py

Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31094274

fbshipit-source-id: 5e40648d3081e2363f3a70bcc9745df4a8190ad3
2021-09-22 00:02:31 -07:00
64d3c7388f [RELAND] Enable ncclAvg for reductions (#62835)
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/62303.

Reverts the revert, and restores some diffs that were mysteriously missing from the reverted revert. I think some of the diffs I pushed to the original PR raced with its import or landing, such that the original PR's merge didn't pick up all the diffs I wanted. I don't know enough about the landing process to do more than speculate wildly, but hopefully this resubmit sorts things out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62835

Reviewed By: zhouzhuojie, seemethere, janeyx99, heitorschueroff

Differential Revision: D30999982

Pulled By: malfet

fbshipit-source-id: 1f70ab4055208f1c6a80c9fc9fbe292ce68ecaa9
2021-09-21 18:09:45 -07:00
3f5f721ab3 Pass through allow-list from prepare_qat into propagate_qconfig_ to allow custom mapping and custom QAT module (#65119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65119

PyTorch Quantization: allow prepare_qat to include custom modules by passing an allow_list into prepare_qat.

When implementing a custom module and custom mapping for Quantization Aware Training (QAT), we need to add the custom module to the mappings and to the allow_list during prepare_qat. The allow_list needs to be surfaced to propagate_qconfig_.

Test Plan: relying on general unit test

Reviewed By: supriyar

Differential Revision: D30982060

fbshipit-source-id: 1114115b6a3b853238d33d72b5cbaafc60f463e0
2021-09-21 17:15:25 -07:00
158b8bdc8a Cleaning up DDP SPMD in reducer.cpp (#64113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64113

Since there is only one model replica per process, `replicas`
can be simplified from `std::vector<std::vector<at::Tensor>>` to
`std::vector<at::Tensor>` in the Reducer class.

Test Plan:
All tests are passing
`pytest test/distributed/test_c10d_gloo.py -vs`

Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30615965

fbshipit-source-id: d2ec809d99b788c200b01411333e7dbad1269b51
2021-09-21 16:13:18 -07:00
27faa7a560 [ONNX] Support torch.isfinite export (#64759)
Summary:
Pull Request resolved:  https://github.com/pytorch/pytorch/issues/64754

1. onnx::IsInf was introduced in opset 10 and onnx::IsNaN in opset 9 -> isfinite = not(or(isinf, isnan)) -> requires opset 10
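
A hedged sketch of that composition as an ONNX symbolic (not necessarily the exact code added in this PR):

```
def isfinite(g, self):
    # isfinite(x) := not(isinf(x) or isnan(x)); needs opset 10 for IsInf.
    return g.op("Not", g.op("Or", g.op("IsInf", self), g.op("IsNaN", self)))
```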

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64759

Test Plan: Imported from OSS

Reviewed By: seemethere, bdhirsh

Differential Revision: D31060760

Pulled By: malfet

fbshipit-source-id: 499ecd6cc55ea881b8a57e6a9a4fb38eaaee5242
2021-09-21 15:47:48 -07:00
5aa33770f5 .circleci: Remove Windows workflows from Circle (#64959)
Summary:
Removes Windows CI from Circle

Will go in after https://github.com/pytorch/pytorch/pull/65094

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64959

Reviewed By: soulitzer

Differential Revision: D31095374

Pulled By: janeyx99

fbshipit-source-id: b0d13a59aa8c6e2f85dbd9c343cac395c4e64475
2021-09-21 15:32:24 -07:00
a1216061c1 [DataPipe] Fix deepcopy filehandle for Mapper and in-place modification for IterableWrapper (#65220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65220

Fixes #65221

- Remove deepcopy from Mapper to support file handles
- Convert `IterableWrapper` to deepcopy iterable instance within each iterator to prevent in-place modification (different data per epoch)
- Convert `IDP` to `IterableWrapper` in test_datapipe.py
- Refine the variable names (prevent using `dp` that is module reference)

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31021886

Pulled By: ejguan

fbshipit-source-id: 72a9eee66c758e2717d591cd0942892bddedc223
2021-09-21 14:29:40 -07:00
73c4bfc30a [ONNX] Add log10 symbolic (#63418) (#64374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64374

Fixes #61332

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30919609

Pulled By: msaroufim

fbshipit-source-id: f474376bbf7b59677b10565f316384eca59dba43

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-09-21 13:30:59 -07:00
1fec9cd76b [Fixed] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980)
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.

`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.

cc nikitaved pearu cpuhrsch IvanYashchuk ezyang anjali411 dylanbespalko mruberry Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980

Reviewed By: ngimel

Differential Revision: D30994115

Pulled By: cpuhrsch

fbshipit-source-id: 4f55b99e8e25079d6273b4edf95ad6fa85aeaf24
2021-09-21 13:03:40 -07:00
8bab468943 Reduce test size for max_pool (#65336)
Summary:
Fixes OOM in slow gradcheck tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65336

Reviewed By: malfet

Differential Revision: D31059007

Pulled By: albanD

fbshipit-source-id: 2dd6967d88663558e37f8c0836ad33333c92dfb5
2021-09-21 12:57:02 -07:00
cd813f16bf Add functional api for nn.Module (#61447)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58839

After discussing with albanD he proposed this simple design.

Let's iterate over the idea here :).

Thanks.

The main idea of this PR is to use reparametrization that is reverted at the end of the functional call.
This leaves the original model unchanged. In this scenario the module is created without parameters, so the forward pass will hard-error if not all parameters are specified.

``` python
import torch
import torch.nn.utils._stateless

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(1, 1)

    def forward(self, x):
        return self.l1(x)

mod = MyModule()
print('weight before', mod.l1.weight)
x = torch.rand((1, 1))
parameters = {"l1.weight": torch.nn.Parameter(torch.tensor([[1.0]])),
              "l1.bias": torch.nn.Parameter(torch.tensor([0.0]))}
res = torch.nn.utils._stateless.functional_call(mod, parameters, x)
print('Functional call input ', x, ' and result ', res)
print('weight after', mod.l1.weight)
```
Output
```
weight before Parameter containing:
tensor([[-0.4419]], requires_grad=True)

Functional call input tensor([[0.3531]]) and result tensor([[0.3531]], grad_fn=<AddmmBackward>)

weight after Parameter containing:
tensor([[-0.4419]], requires_grad=True)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61447

Reviewed By: soulitzer

Differential Revision: D31082765

Pulled By: albanD

fbshipit-source-id: ba814d0f9162fb39c59989ca9a8efe160405ba76
2021-09-21 12:39:43 -07:00
c245632e2e Use higher timeout for TSAN tests. (#65391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65391

TSAN tests are much slower than the usual dev/opt modes, about 5-10x.

As a result, for TSAN build mode we use a much higher timeout for distributed
tests.
ghstack-source-id: 138584613

Test Plan: waitforbuildbot

Reviewed By: cbalioglu

Differential Revision: D31076575

fbshipit-source-id: 44a485f07101deac536470ceeff2a52cac4f9e0b
2021-09-21 12:08:27 -07:00
28bfdbb066 OpInfo for nn.functional.batch_norm (#63218)
Summary:
Addresses https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

* `torch.batch_norm` exists, but it takes an extra argument, `cudnn_enabled`, that is absent from the functional variant. The functional variant passes it to `torch.batch_norm` here: https://github.com/pytorch/pytorch/blob/master/torch/nn/functional.py#L2282. When an alias is registered, `test_variant_consistency_jit` fails with this error:
    ```python
    File "/home/krshrimali/Documents/Projects/Quansight/pytorch/test/test_ops.py", line 457, in _test_consistency_helper
    variant_forward = variant(cloned,
    TypeError: batch_norm() missing 1 required positional arguments: "cudnn_enabled"
    ```
    * I'm not sure of a solution to this since, AFAIK, there is no way to pass a lambda wrapper for an alias. Hence, I've skipped adding this as an alias there (the signature mismatch is sketched below).
    * On second thought, is this even an alias?
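
For reference, a small sketch of the signature mismatch (values are illustrative):

```python
import torch

x = torch.randn(4, 3)
running_mean, running_var = torch.zeros(3), torch.ones(3)

# Functional variant: no cudnn_enabled argument.
torch.nn.functional.batch_norm(x, running_mean, running_var)

# torch.batch_norm requires cudnn_enabled explicitly, so it cannot be
# registered as a drop-in alias.
torch.batch_norm(x, None, None, running_mean, running_var,
                 False, 0.1, 1e-5, torch.backends.cudnn.enabled)
```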

cc: mruberry zou3519 kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63218

Reviewed By: bdhirsh

Differential Revision: D31019785

Pulled By: zou3519

fbshipit-source-id: 2a834d05835da975289efc544a7ad7e98c99438f
2021-09-21 11:35:34 -07:00
9afdf017dc Add force_on_cpu test to win cuda10.2 on GHA (#65094)
Summary:
Part of migrating from Circle.

Once we get a successful force_on_cpu test, we can move it to trunk only.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65094

Reviewed By: seemethere

Differential Revision: D31086289

Pulled By: janeyx99

fbshipit-source-id: e1d135cc844d51f0b243b40efb49edca277d9de8
2021-09-21 11:14:15 -07:00
00b732e98b Remove orphan from cuDNN persistent note (#65160)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60009.

Since the document is properly [included](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/rnn.py#L799), [`:orphan:` doesn't need to be used in included documents](https://github.com/sphinx-doc/sphinx/issues/6787#issuecomment-549256840), and no warning is emitted in my local build when it is removed, I think it can be removed.

The artifact reported in https://github.com/pytorch/pytorch/issues/60009 can be seen in 3 pages: [torch.nn.RNN](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html#torch.nn.RNN), [torch.nn.LSTM](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#torch.nn.LSTM), and [torch.nn.GRU](https://pytorch.org/docs/stable/generated/torch.nn.GRU.html#torch.nn.GRU).

cc ezyang suo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65160

Reviewed By: bdhirsh

Differential Revision: D31020280

Pulled By: ezyang

fbshipit-source-id: 6c3541e5a856a91cf1ce1d2db4d04f5d13118ee4
2021-09-21 11:09:47 -07:00
c0eb266c02 [Static runtime] Micro-optimization pass on GetLivenessMap (#65175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65175

More efficient use of the map API, and a more efficient way to insert all pairs of inputs/outputs into the liveness map
ghstack-source-id: 138547815

Test Plan: Time to enable static runtime down from ~8.7s to ~8.4s

Reviewed By: mikeiovine

Differential Revision: D30983897

fbshipit-source-id: fa6000bfd0fa0adfcd7c5922199ee32ada8c430e
2021-09-21 10:52:08 -07:00
6d7bc34b67 Make new_empty/new_ones/new_zeros/new_full respect subclass (#65169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65169

Previously these composite functions created a new tensor
using at::empty (or some other factory function) with TensorOptions,
which doesn't preserve the Python subclass. Making new_empty a
non-composite op and then routing the others through it makes them
respect the subclass. We could also make all of these non-composite,
but the current approach reduces the number of derivatives.yaml entries
and allows tracing the fill calls.
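
A minimal sketch of the behavior this change targets, using a trivial subclass (illustrative only):

```python
import torch

class MyTensor(torch.Tensor):
    pass

x = torch.randn(3).as_subclass(MyTensor)
# With new_empty and friends routed through a non-composite op, the
# instance factory methods preserve the Python subclass:
assert type(x.new_zeros(2, 2)) is MyTensor
```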

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31003713

Pulled By: ezyang

fbshipit-source-id: 19f906f1404a6b724769c49f48d123f407a561ff
2021-09-21 10:50:48 -07:00
04a5e45aeb [PyTorch] Compare Type pointers before calling operator== in EqualNode (#65352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65352

This can be a big win when it saves the virtual call to operator==, and the cost of the check is tiny.
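
The pattern, sketched in Python for illustration (the actual change is in C++):

```python
def types_equal(a, b):
    # Cheap identity check first; only fall back to the full structural
    # comparison (Type::operator== in the C++ code) when the objects differ.
    return a is b or a == b
```
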
ghstack-source-id: 138497657

Test Plan: Profiled ptvsc2_predictor_bench startup, inclusive time spent in EqualNode::operator() dropped from 0.8% to negligible

Reviewed By: hlu1

Differential Revision: D30974969

fbshipit-source-id: 9c3af36cffe709dfce477dcc49722536470264a0
2021-09-21 10:46:24 -07:00
88232b4cee Fix ENABLE_RECORD_KERNEL_FUNCTION_DTYPE build (#65370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65370

Forgot a wrapping 'namespace at' here!  And no contbuilds to test it.
ghstack-source-id: 138565579

Test Plan:
```
buck build --show-output -c pt.disable_per_op_profiling=0 -c pt.enable_record_kernel_dtype=1 -c pt.has_backtraces=1 fbsource//xplat/caffe2/fb/model_tracer:model_tracer
```

Reviewed By: JacobSzwejbka

Differential Revision: D31065923

fbshipit-source-id: ed4563fbd8f3c29f6b10ac8999c9010bd4359c97
2021-09-21 10:42:33 -07:00
eb4fb1ed81 THCTensor cleanup (#65369)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65369

Reviewed By: bhosmer

Differential Revision: D31071406

Pulled By: ngimel

fbshipit-source-id: bbc3f2781003333641524aeb692b944fd3ad8d7a
2021-09-21 10:28:19 -07:00
600df80296 [PT/ShardedTensor]Allow zero size local shard (#65007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65007

Relax the shard size check in ShardMetadata to allow zero-size local shards.

When sharding a tensor across N ranks, some ranks may be allocated an empty shard. Since we assume SPMD, the ranks with an empty shard still need to participate in all collectives, so ShardMetadata needs to allow this.

Test Plan: Unit tests and CLI

Reviewed By: jiaqizhai, wanchaol

Differential Revision: D30926566

fbshipit-source-id: afa562c94ffa8f8d91d65ddb4c348156d871dc36
2021-09-21 09:58:54 -07:00
7f6580a868 OpInfo: nn.functional.conv2d (#65233)
Summary:
Reland : https://github.com/pytorch/pytorch/issues/63517
Reference: https://github.com/pytorch/pytorch/issues/54261

Reference: facebookresearch/functorch#78

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65233

Reviewed By: malfet

Differential Revision: D31025538

Pulled By: zou3519

fbshipit-source-id: b1cd38c22f4cb8eedd3f958e02dd7410dcbb8d8d
2021-09-21 09:26:23 -07:00
9324181d0a [JIT] Re-land "Add aten::slice optimization" (#65341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65341

The changes in D30231044 (babd449978) were removed due to a downstream issue in glow. Now that the issue has been fixed by D30849396, we can safely re-introduce the changes.

Test Plan:
`buck test //caffe2/test:jit -- TestPeephole`

Glow test:
* `buck test //glow/fb/torch_glow/tests:unfuse_glow_ops_test`
* qxy11 confirmed that the problematic glow model now loads correctly with these changes

Reviewed By: eellison

Differential Revision: D31056878

fbshipit-source-id: 049903ee04ba88885cc9d1a91427af0f1f44f681
2021-09-21 07:29:51 -07:00
9c23f6eb7d [nn] TripletMarginLoss and PairwiseDistance : no batch dim (#64882)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64882

Reviewed By: malfet

Differential Revision: D31055577

Pulled By: jbschlosser

fbshipit-source-id: 2f0a5a08619b672026b48a78bc7d83a6dccba0bf
2021-09-21 07:29:48 -07:00
d35ee431d8 correlate forward and backward op (#62553)
Summary:
Use the startThreadId+seqNumber of the forward op and the fwdThreadId+seqNumber of the backward op to correlate pairs of them.
third_party/kineto should be updated accordingly: https://github.com/pytorch/kineto/pull/372

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62553

Reviewed By: malfet

Differential Revision: D30125728

Pulled By: gdankel

fbshipit-source-id: 9877a54392ba043d0eac56ce5b7bbf244277fa7e
2021-09-21 07:28:29 -07:00
f0ada4bd54 [docs] Remove .data from some docs (#65358)
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987. Fix the following task:

- [ ] Remove the use of `.data` in all our internal code:
  - [ ] ...
  - [x] `docs/source/scripts/build_activation_images.py` and `docs/source/notes/extending.rst`

In `docs/source/scripts/build_activation_images.py`, I used `nn.init` because the snippet already assumes `nn` is available (the class inherits from `nn.Module`).
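
For example, a sketch of the substitution (the module here is illustrative, not the one in the script):

```python
import torch.nn as nn

layer = nn.Linear(4, 4)

# Instead of mutating through .data:
#   layer.weight.data.normal_(0, 1)
# use the in-place init helpers, which are natural in an nn.Module context:
nn.init.normal_(layer.weight, mean=0.0, std=1.0)
```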

cc albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65358

Reviewed By: malfet

Differential Revision: D31061790

Pulled By: albanD

fbshipit-source-id: be936c2035f0bdd49986351026fe3e932a5b4032
2021-09-21 06:32:31 -07:00
daa50f1e9f Adds keyword only args to gradcheck (#65290)
Summary:
Changes the call signature of gradcheck so that kwargs are kwargs only.

Also modifies return call from gradgradcheck, to reflect these changes.
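
A quick sketch of the effect (argument values are illustrative):

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)

gradcheck(torch.sin, (x,), eps=1e-6)   # OK: kwargs passed by keyword
# gradcheck(torch.sin, (x,), 1e-6)     # TypeError after this change
```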

Fixes https://github.com/pytorch/pytorch/issues/65165

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65290

Reviewed By: soulitzer

Differential Revision: D31061316

Pulled By: albanD

fbshipit-source-id: 3505569a33a497a8be4347bdd425bb2b8e536999
2021-09-21 06:31:07 -07:00
880098a7e3 [PyTorch Edge] Backport function for defaults args with out args, flag on (#63651)
Summary:
1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4.
2. Bump bytecode version from 6 to 7
3. Implement the backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emitting operators.
4. Add a unit test to cover the backport function
5. Update the expected result from 4 to 3 in the DefaultArgsWithOutArg unit test to cover the number of specified arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651

ghstack-source-id: 138539912

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions
```

Reviewed By: raziel, tugsbayasgalan

Differential Revision: D30454080

fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307
2021-09-20 22:50:30 -07:00
5826d207ad [JIT] Delete obsolete message: or if you absolutely have to, use c10::impl::GenericDict(c10::impl::deprecatedUntypedDict()) (#65164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65164

Looks like it was forgotten in https://github.com/pytorch/pytorch/pull/25439

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31072625

Pulled By: pbelevich

fbshipit-source-id: a5ffcfb0836f962ab6952a187ba7717c4d4a6e33
2021-09-20 22:50:28 -07:00
19a1063888 [JIT] Support device as Dict key (#65079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65079

This is required to use RPC DeviceMap aka Dict[torch.device, torch.device] in torchscript

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31072626

Pulled By: pbelevich

fbshipit-source-id: 51cfa5653db86de73b624e9157d68d1b319bfc64
2021-09-20 22:49:15 -07:00
512834b61d Reduce PyTorch warnings: Cast fix xplat/caffe2/aten/src/ATen/core/DeprecatedTypeProperties.h (#65031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65031

Test Plan:
```
buck build --show-output //caffe2/torch/fb/sparsenn:sparsenn_operators

buck test caffe2/torch/fb/sparsenn:test
```

Reviewed By: r-barnes

Differential Revision: D30948791

fbshipit-source-id: 13046e1d0ce2c24864ad38f318ca5e34b1bb9552
2021-09-20 20:29:58 -07:00
0dc98728bc Basic implementation of ShardedLinear using ShardedTensor. (#64128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128

This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:

1) Works only for ChunkShardingSpec.
2) The implementation is only meant to demonstrate functionality and is most likely
not performant at all.

The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:

1) Works only for ChunkShardingSpec.
2) Is not performant since it uses broadcast instead of scatter, because
ProcessGroupNCCL doesn't yet support scatter.

Overall user API for running a sharded linear would be something like this:

```
# SPMD programming paradigm running same code on all nodes.
fc = nn.Linear(10, 10)

# Setup sharding.
sharding_spec=ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)

# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985

Test Plan:
1) unit tests.
2) waitforbuildbot

Reviewed By: wanchaol, bowangbj

Differential Revision: D30621215

fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
2021-09-20 18:31:11 -07:00
257a18d951 Track peak memory usage (#65157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65157

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31029049

Pulled By: driazati

fbshipit-source-id: 3e87e94e4872d118ad191aef2b77b8cefe90aeb6
2021-09-20 17:25:16 -07:00
58909395ab Fix logic to determine master vs PR (#65155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65155

This was previously buggy with empty strings, which caused the hook to write on any job, not just `master`, regardless of the `only_on_master` flag.

Test Plan: see `[scribe] Skipping RDS write on PR` in the logs for `linux-xenial-cuda11.3-py3.6-gcc7`

Reviewed By: malfet

Differential Revision: D31029048

Pulled By: driazati

fbshipit-source-id: 77c4a60e443d8fc19990755a3a346576afee86d8
2021-09-20 17:25:14 -07:00
60915eb810 [quant] Add fp32/fp16 zero_point support for CPU fakeQuant (#65055)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65055

Test Plan: Imported from OSS

Reviewed By: jingsh, supriyar

Differential Revision: D30975238

Pulled By: b-koopman

fbshipit-source-id: 2000660ffe71cb85d00fdabaf8fc3ba7323f9a1e
2021-09-20 17:25:12 -07:00
ce101fed02 [PyPer] copy-free freeze_module (#65118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65118

Cloning the module can increase memory use. By freezing the module directly without cloning it first, we can avoid this memory usage increase.

Reviewed By: eellison, movefast1990

Differential Revision: D30955053

fbshipit-source-id: 2feb738eddcf66aa68c92bf695cc05b57bd990f0
2021-09-20 17:25:10 -07:00
ca649851c6 Reduce PyTorch warnings: Cast fix xplat/caffe2/c10/core/TensorOptions.h (#65030)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65030

Test Plan:
```
buck build --show-output //caffe2/torch/fb/sparsenn:sparsenn_operators

buck test caffe2/torch/fb/sparsenn:test
```

Reviewed By: r-barnes

Differential Revision: D30948721

fbshipit-source-id: 16fe42daab35709c56a4d3ccc276ea635a3510c1
2021-09-20 17:23:58 -07:00
2465a103b8 [iOS] Zero out NSError to avoid heap corruptions for the OSS builds (#65355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65355

I've been seeing heap corruptions in the CMake builds due to the NSError* not being initialized with `nil`. However, I haven't seen this issue in the BUCK builds.
ghstack-source-id: 138502708

Test Plan:
1. Test the OSS builds to make sure the heap corruption has gone.
2. Test the Buck build in the playground app
3. Circle CI

Reviewed By: hanton

Differential Revision: D31048010

fbshipit-source-id: cfd8d614f3f91f09caee4aab61237007ec080481
2021-09-20 16:31:23 -07:00
b7adb3350a Add crow_/col_indices to view types (#63176)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61103

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63176

Reviewed By: malfet, albanD

Differential Revision: D30315882

Pulled By: cpuhrsch

fbshipit-source-id: eedae5265a757ed68fd69e4f9d07070b05de4bd8
2021-09-20 14:35:58 -07:00
31f61122da Creating a helper function to generate a unique name for an attr in a module (#64970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64970

Add a helper function to create a unique name for an attr.
This can be used when we want to add a weight to a module.

Test Plan: run CI.

Reviewed By: jfix71

Differential Revision: D30921497

fbshipit-source-id: 598569d107df8b516ff12920a4bef3a42577e987
2021-09-20 14:35:56 -07:00
b45ec16310 Add support to lower acc_ops.transpose (#65036)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65036

Reviewed By: jfix71, 842974287

Differential Revision: D30934503

fbshipit-source-id: 51880d3d36492f5206f77c9d1a994d8532597b62
2021-09-20 14:35:54 -07:00
e33a1fa680 [fx] give warning instead of fatal the program when submod not found during adding get_attr (#65225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65225

Currently, when creating a get_attr node, if the attribute is in a submodule, we'll first find the submodule. If the submodule isn't in the owning module, we throw an exception.

However, if the attribute can't be found, we give a warning but still allow creating the get_attr node. To align with this behavior, we change the reaction when the submodule is not found from fatal to a warning.

Test Plan: CI

Reviewed By: jamesr66a, jfix71

Differential Revision: D31021535

fbshipit-source-id: 4c0b471448c09cc927d0f47b5bf56594f25a8863
2021-09-20 14:35:52 -07:00
8fb253757d Remove @balioglu from PyTorch Distributed code owners (#65239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65239

Due to too much noise caused by the GitHub notifications, going forward I prefer to track PRs manually.
ghstack-source-id: 138386041

Test Plan: N/A

Reviewed By: mrshenli

Differential Revision: D31027792

fbshipit-source-id: 6578e41d4ab53ad2c64a41584716f4903298cd6b
2021-09-20 14:34:37 -07:00
e3210ca184 [CUDA graphs] Beta, not prototype (#65247)
Summary:
Powers have decided this API should be listed as beta.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65247

Reviewed By: malfet

Differential Revision: D31057940

Pulled By: ngimel

fbshipit-source-id: 137b63cbd2c7409fecdc161a22135619bfc96bfa
2021-09-20 13:32:36 -07:00
b71f01f70d Fix full backward hook when grad is disabled (#65335)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59901. See discussion in the issue.

cc albanD soulitzer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65335

Reviewed By: malfet

Differential Revision: D31055865

Pulled By: albanD

fbshipit-source-id: 53605df62bc73c99d8908248087ab400b81ac495
2021-09-20 13:31:19 -07:00
2abf3594d5 Fix unassigned ciflow trigger (#65354)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65250#issuecomment-923120764

This is a limitation of GitHub Actions triggers: it's hard to introduce a condition before the workflow runs, which is why we intentionally pick the rare event ("unassigned"). For people who didn't opt into ciflow and who unassign manually, I think the fix is to run all the CI (otherwise we'd introduce a new condition for this, which isn't worth the added complexity).

The `unassigned` event payload looks like this, just to confirm that `github.event.assignee.login` points to the right location.

```
  {
    "action": "unassigned",
    "assignee": {
      "avatar_url": "https://avatars.githubusercontent.com/u/658840?v=4",
      "events_url": "https://api.github.com/users/zhouzhuojie/events{/privacy}",
      "followers_url": "https://api.github.com/users/zhouzhuojie/followers",
      "following_url": "https://api.github.com/users/zhouzhuojie/following{/other_user}",
      "gists_url": "https://api.github.com/users/zhouzhuojie/gists{/gist_id}",
      "gravatar_id": "",
      "html_url": "https://github.com/zhouzhuojie",
      "id": 658840,
      "login": "zhouzhuojie",
      "node_id": "MDQ6VXNlcjY1ODg0MA==",
      "organizations_url": "https://api.github.com/users/zhouzhuojie/orgs",
      "received_events_url": "https://api.github.com/users/zhouzhuojie/received_events",
      "repos_url": "https://api.github.com/users/zhouzhuojie/repos",
      "site_admin": false,
      "starred_url": "https://api.github.com/users/zhouzhuojie/starred{/owner}{/repo}",
      "subscriptions_url": "https://api.github.com/users/zhouzhuojie/subscriptions",
      "type": "User",
      "url": "https://api.github.com/users/zhouzhuojie"
    },
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65354

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D31060212

Pulled By: zhouzhuojie

fbshipit-source-id: ce815cc96e8a00016646d6f02f0917169fa652dc
2021-09-20 12:33:23 -07:00
378949b83c fix typo missing f string (#65226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65226

Reviewed By: malfet

Differential Revision: D31055793

Pulled By: albanD

fbshipit-source-id: fafac53e75223c4f599bd2162095aacad7b690df
2021-09-20 12:31:54 -07:00
0430d1da12 [iOS] Fix the TestApp (#65319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65319

Test Plan: Imported from OSS

Reviewed By: hanton

Differential Revision: D31049543

Pulled By: xta0

fbshipit-source-id: ff0d0baac30682c63b2a28254ee0a5d8d9b8ca6f
2021-09-20 11:28:40 -07:00
3e64c9e176 [Pipe] Add a WithDevice wrapper to specify device execution for a module. (#65190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65190

As described in https://github.com/pytorch/pytorch/issues/65093, there
could be modules which don't have any parameters/buffers. In this case, Pipe
determines that the module should be executed on CPU. However, this might result
in unnecessary GPU-to-CPU transfers when the user expected the module to be
executed on the GPU itself, keeping its inputs and outputs on the GPU.

For this use case, we introduce a `WithDevice` wrapper which can be used to
override which device a particular module should be executed on as part of the
pipeline.
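
A hypothetical usage sketch (the import path and exact call signature are assumptions based on this description):

```python
import torch
import torch.nn as nn
from torch.distributed.pipeline.sync import Pipe, WithDevice  # assumed import path

fc1 = nn.Linear(16, 16).to('cuda:0')
dropout = nn.Dropout()  # parameter-less: would otherwise be placed on CPU
fc2 = nn.Linear(16, 16).to('cuda:1')

# Pin the parameter-less module to cuda:0 to avoid a GPU->CPU->GPU round trip.
model = nn.Sequential(fc1, WithDevice(dropout, torch.device('cuda:0')), fc2)
# pipe = Pipe(model, chunks=4)  # requires an initialized RPC framework
```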

Closes: https://github.com/pytorch/pytorch/issues/65093
ghstack-source-id: 138376272

Test Plan:
1) waitforbuildbot
2) unit tests

Reviewed By: SciPioneer

Differential Revision: D31010027

fbshipit-source-id: 4c1c61d3c6feeef341e002e5f7e83dd33ff3a516
2021-09-20 11:27:27 -07:00
0a3cf8886a Torchhub: More robust assumption regarding main or master branch (#64364)
Summary:
Closes https://github.com/pytorch/pytorch/issues/63753

This PR changes the assumption regarding the default branch of a repo to the following:

> If `main` exists then use `main`, otherwise use `master`

This will make torchhub more robust w.r.t. the ongoing changes where repos use `main` instead of `master` as the development/default branch.
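
A minimal sketch of the stated rule (not the actual torchhub implementation):

```python
def default_branch(branches):
    # If main exists then use main, otherwise use master.
    return "main" if "main" in branches else "master"

assert default_branch(["main", "master"]) == "main"
assert default_branch(["master"]) == "master"
```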

cc nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64364

Reviewed By: saketh-are

Differential Revision: D30731551

Pulled By: NicolasHug

fbshipit-source-id: 7232a30e956dcccca21933a29de5eddd711aa99b
2021-09-20 10:36:13 -07:00
99e4ab5d44 [Static Runtime] Implement and enable variadic tuple unpack (#64934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64934

Add a new op `static_runtime::VarTupleUnpack` and a graph pass transforming graph sequences from:
```
%0, %1 = prim::TupleUnpack(%a)
%2, %3 = prim::TupleUnpack(%b)
```
into:
```
%0, %1, %2, %3 = static_runtime::VarTupleUnpack(%a, %b)
```

The pass is only applied to contiguous blocks of `TupleUnpack` nodes. This is the most straightforward way to guarantee correctness, and it is sufficient for the models we care about.

Test Plan: New unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- VarTupleUnpack`

Reviewed By: d1jang

Differential Revision: D30872109

fbshipit-source-id: 1ed4a7e201c532da28f703a3a50241c392a6c7e9
2021-09-20 10:36:11 -07:00
14347d0dd5 [quant][fx][graphmode] Fix a bug for sub (#65109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65109

Previously for sub we set the dtype with the qconfig since it matched a QuantizeHandler.
However, this is incorrect: the dtype for sub is decided by whether the output is quantized or not,
so we added an is_output_quantized check when deciding the dtype for the output of sub.

Follow-up: is_output_quantized now depends on is_reference, which is pretty confusing and may cause problems down the road; we should remove this dependency in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30977826

fbshipit-source-id: 551fd63bd61b43b3c3415944ff73174e3a21cc8a
2021-09-20 10:36:09 -07:00
c562ebca23 Revert "Revert D30558877: Ported std/var to ReductionOpInfo (#65262)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/63978

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65262

Reviewed By: mruberry

Differential Revision: D31037360

Pulled By: ngimel

fbshipit-source-id: 1c60f40c547229767cba3bbe7e11ca0fbbc8f95f
2021-09-20 10:36:06 -07:00
fb1e6835cc simplify torch.meshgrid's shape computation (#62905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62905

Reviewed By: mruberry

Differential Revision: D31021274

Pulled By: dagitses

fbshipit-source-id: c219389bdc543e9592f7b1c707acfbf752ee6f34
2021-09-20 10:34:45 -07:00
cf60d24028 [DataPipe] Unlimited buffer for Forker and Demultiplexer (#64994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64994

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30934362

Pulled By: ejguan

fbshipit-source-id: d3b774d7e28c0b9659e999511e5a68c3929857d4
2021-09-20 09:30:39 -07:00
88032d8943 Automated submodule update: FBGEMM (#64640)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: d1ecc7dbe2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64640

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30805660

fbshipit-source-id: 9f783862e89fe3974badd5194ef793db55e7d275
2021-09-18 16:29:30 -07:00
d8189db80f [quant][fx2trt] Generate engine graph for explicit quant/implicit quant and fp16 graph (#65289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65289

Turn on VERBOSE logging and use engine visualizer to generate the graph.

Runtime:
```
explicit quant result diff max tensor(0.0771)
implicit quant result diff max tensor(0.1909)
trt fp16 time (ms/iter) 1.0740923881530762
trt int8 time (ms/iter) 0.5288887023925781
trt implicit int8 time (ms/iter) 0.6334662437438965
PyTorch time (CUDA) (ms/iter) 4.448361396789551
PyTorch time (CPU) (ms/iter) 45.13296604156494
```

Generated Graphs:
```
explicit int8: https://www.internalfb.com/intern/graphviz/?paste=P458669571
implicit int8: https://www.internalfb.com/intern/graphviz/?paste=P458669656
fp16: https://www.internalfb.com/intern/graphviz/?paste=P458669708
```

Test Plan:
```
buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test 2>log
buck run //deeplearning/trt/fx2trt/tools:engine_layer_visualize -- --log_file log
```

Reviewed By: 842974287

Differential Revision: D30955035

fbshipit-source-id: 24949458ad9823fb026d56d78a6ee1c6874b6034
2021-09-18 13:30:37 -07:00
7f8d622d70 [Static Runtime] Add perf metrics for number of managed tensors & unmanaged values (#64992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64992

This change lets Static Runtime print out the number of managed tensors & unmanaged values as performance metrics during profile runs.

We will use and enhance these metrics to guide the effort of managing output tensors.

Test Plan:
Confirmed that a profile run prints out the added metric values on inline_cvr nets:
```
(inline_cvr/local)
...
Total number of managed tensors: 2754
Total number of unmanaged values: 3240
...
(inline_cvr/local_ro)
Total number of managed tensors: 1554
Total number of unmanaged values: 2966
...
(inline_cvr/remote_ro)
Total number of managed tensors: 1439
Total number of unmanaged values: 28
...
```

Reviewed By: hlu1

Differential Revision: D30926617

fbshipit-source-id: b86e071003ac941b9663db103eaa7c614466b4e0
2021-09-18 11:26:37 -07:00
4a128ed811 Remove incorrect stride assert in Reduce.cuh (#65227)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37583

Per discussion with ngimel, the condition asserted here may not always hold after TensorIterator's dimension coalescing and reordering. However, the reduction output should still be correct when `sub_iter.strides(0)[0]` is non-zero.

I've verified correctness empirically by:
1. Lowering the threshold ([configured here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/TensorIterator.cpp#L1127)) at which iterators are split into sub-iterators, making it easier to trigger.
2. Generating many tensors with random dimensions and randint elements which produce a non-zero `sub_iter.strides(0)[0]` in the CUDA kernel.
3. Verifying that the reduction `t.sum(dim=0)` produces the same results for those tensors on CPU and on CUDA, as sketched below.
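
A minimal sketch of that CPU-vs-CUDA consistency check (shapes and ranges are illustrative, not the exact tensors used in the verification):

```python
import torch

for _ in range(100):
    dims = torch.randint(1, 5, (3,)).tolist()   # random dimensions
    t = torch.randint(0, 100, tuple(dims))      # random integer elements
    cpu = t.sum(dim=0)
    cuda = t.cuda().sum(dim=0).cpu()
    torch.testing.assert_close(cpu, cuda)       # reductions must agree exactly
```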

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65227

Reviewed By: ngimel

Differential Revision: D31031406

Pulled By: saketh-are

fbshipit-source-id: 5cbf2001224454c74f6db42455c507365ad1f2b1
2021-09-18 10:29:13 -07:00
543185a0fd support using gradients named for outputs in derivatives (#63947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63947

Fixes #62196

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30541485

Pulled By: dagitses

fbshipit-source-id: ea1dd0edd1a51936a295631e52b85e9c022a9c87
2021-09-18 07:31:45 -07:00
926a3d2e85 clarify implementation of check_grad_usage (#64439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64439

1) remove unused fully_implemented
2) rename used_grad to uses_grad and make it a boolean
3) rename used_grads to num_grads_uses
4) add comments explaining what some of the checks mean

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30733904

Pulled By: dagitses

fbshipit-source-id: dccbbef8a4be8713215ef91aa97a34124f06a7a1
2021-09-18 07:30:30 -07:00
d3e36fade2 [quant][fx2trt] Enable comparison with implicit quant mode (#65043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65043

Currently getting the following results; will take another look at the executed graph:
```
trt fp16 time (ms/iter) 1.0483217239379883
trt int8 time (ms/iter) 0.5329632759094238
trt implicit int8 time (ms/iter) 0.6769704818725586
PyTorch time (ms/iter) 6.453146934509277
```

Test Plan:
```
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py
```

Imported from OSS

Reviewed By: 842974287

Differential Revision: D30954871

fbshipit-source-id: 8d7ff82b8f5d0b7946fbd38a7cddede7d40b28aa
2021-09-17 23:29:35 -07:00
4150b672aa [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D31039372

fbshipit-source-id: a5e54a9b1d2ef97e9bc206b9e2a82124e5a22a7a
2021-09-17 20:33:12 -07:00
6707dfeefb Remove 9.2 related macros for CONSTEXPR (#65066)
Summary:
Removes C10_HOST_CONSTEXPR_EXCEPT_CUDA92 references in the code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65066

Reviewed By: driazati

Differential Revision: D31022520

Pulled By: janeyx99

fbshipit-source-id: f02cdc6caba5b48405575242921f5845ff18f729
2021-09-17 17:31:20 -07:00
1cd9018b6f Make github.com in noproxy list (#65256)
Summary:
An attempt to solve some rate-limiting issues we saw when calling GitHub APIs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65256

Reviewed By: seemethere

Differential Revision: D31035115

Pulled By: zhouzhuojie

fbshipit-source-id: 7efd5d5af7d91805e4bf27b86847791e991b741e
2021-09-17 17:31:18 -07:00
50c29fef3e remove utils.cpp (#65184)
Summary:
Dead code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65184

Reviewed By: mruberry

Differential Revision: D31031777

Pulled By: ngimel

fbshipit-source-id: 13633888229a7af8cfd8ea7e55ea2880b2e47273
2021-09-17 17:31:15 -07:00
19471c54a6 [fx const fold] fix a case when some inputs are unused (#65223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65223

If there are unused inputs, they won't appear in `submod_1`. We need to add all the unused inputs so that the model after const fold has the same inputs as the original model.

Reviewed By: jfix71

Differential Revision: D31021217

fbshipit-source-id: b7452c90d133b747e0699936a81d3fee14af9cc9
2021-09-17 17:29:55 -07:00
992dad1855 [Profiler] Update kineto submodule (#65236)
Summary:
Update to latest kineto revision. See Kineto repo for change log.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65236

Reviewed By: malfet

Differential Revision: D31031638

Pulled By: gdankel

fbshipit-source-id: 681655b2e8e151895afa91445ced0fd57a11fa93
2021-09-17 16:26:30 -07:00
4408b755bc [fx2trt] re-enable profiler and some miscs for TRTModule (#65072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65072

We previously disabled attaching the TRT profiler to the execution context in TRTModule because https://fburl.com/mc33n880 states that `enqueue()` doesn't support profiling. That appears not to be the case, so this diff re-enables attaching the profiler.

Also added a bunch of checks for dtype and shape, and fixed saving state_dict and loading back.

Test Plan: buck run mode/opt -c python.package_style=inplace -j 40 deeplearning/trt/fx2trt:acc2trt_test

Reviewed By: yinghai

Differential Revision: D30962757

fbshipit-source-id: 9c664b0500a8169b7952f6f912239a5a05772aea
2021-09-17 16:26:28 -07:00
afa25c77f1 [package] Make it possible to re-save a PackageImporter module (#65101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65101

As title. Previously this was guarded against for implementation
simplicity, as we didn't really think there was a use case for saving a
mangled module name directly.

But people started doing stuff like:
```
exporter.save_module(my_imported_obj.__module__)
```
which implicitly passes along the mangled module name.

This PR makes it so that a given `PackageImporter` instance can always
import modules that it created, and changes `PackageExporter` to
properly demangle the resulting module name when writing the package to
the export archive.

Differential Revision: D30975712

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: d9e849bf651713890e72dccdcef74fa52d377149
2021-09-17 16:25:11 -07:00
487c771593 [FX] Fix tracing of bitwise and/or (#65196)
Summary:
Previously resulted in `AttributeError: module 'operator' has no attribute 'and'`

`and`/`or` are Python keywords, so the operator-module forms are named `operator.and_` and `operator.or_`.
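
A quick illustration:

```python
import operator

# `and` and `or` are keywords, so the bitwise functional forms carry a
# trailing underscore:
assert operator.and_(0b1100, 0b1010) == 0b1000   # same as 0b1100 & 0b1010
assert operator.or_(0b1100, 0b1010) == 0b1110    # same as 0b1100 | 0b1010
```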

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65196

Reviewed By: Chillee

Differential Revision: D31020336

Pulled By: jansel

fbshipit-source-id: 51d888151fe78c0c1197ecaf161976b219c59694
2021-09-17 14:33:02 -07:00
6596173811 Revert D30731191: [pytorch][PR] Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits
Test Plan: revert-hammer

Differential Revision:
D30731191 (f9bf144a0c)

Original commit changeset: d1ee7c2ef259

fbshipit-source-id: 5c7207f66c5354ce7b9ac2594e4f5b8307619b0c
2021-09-17 14:33:00 -07:00
3d32dec5ba [ONNX] Deprecate enable_onnx_checker argument in torch.onnx.export() (#61708) (#64369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64369

Until now, the "enable_onnx_checker" parameter was described as below:

enable_onnx_checker (bool, default True): If True the ONNX model checker will be run to ensure the exported model is a valid ONNX model.

An invalid ONNX graph is useless to users, so this check should be done for every call.

In this PR, we still write the model to an ONNX file even if it is invalid, and the exception is thrown after the ONNX file has been created. This enables users to output an invalid ONNX graph for debugging.

This PR keeps the parameter in the torch.onnx.export() function for backward compatibility, while all backend logic has been changed to behave as if enable_onnx_checker were set to True.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905267

Pulled By: malfet

fbshipit-source-id: 3ad3f68e77fcec012cc7ef674cc9a61755eebc9e

Co-authored-by: fatcat-z <zhang-ji@outlook.com>
2021-09-17 14:31:41 -07:00
ae00075ac7 [Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65123

This change re-lands D30883290 (0e11454d19). D30883290 (0e11454d19) broke the OSS build since it implicitly removed the default move constructor of `StaticRuntime`.

```
ep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:95:10: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57   return torch::jit::StaticRuntime(*smod);
Sep 15 15:39:57          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57   std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57                                  ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57       unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57       ^
Sep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:99:9: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57    auto sr = getStaticRuntime();
Sep 15 15:39:57         ^    ~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57   std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57                                  ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57       unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57       ^
Sep 15 15:39:57 2 errors generated.
```

This change fixes the issue by explicitly defining the default move constructor (courtesy of mikeiovine).

Original Summary:

This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.

`MemoryPlanner` performs an independent sub-task: statically analyzing a graph, creating a memory plan, and allocating/deallocating managed Tensors.

This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.

Test Plan: - Confirm that OSS build went well (See External Tests section).

Reviewed By: mikeiovine

Differential Revision: D30983292

fbshipit-source-id: a59f407fa1123527824157268111144a1bf58116
2021-09-17 13:32:01 -07:00
eaf85fad62 [PyTorch] Extract parseOperator() into a standalone source file (#65179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179

This follows up on https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into the build.

Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.

Reviewed By: iseeyuan

Differential Revision: D31006555

fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac
2021-09-17 13:31:59 -07:00
35084ee451 [PyTorch] Improve OperatorEntry::getKernelForDispatchKey (#64838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64838

The returned pointer, if present, could never be nullptr, so there is no reason to wrap it in an optional rather than just using the nullptr state. The repeated calls to kernels_.at() were not getting optimized away, so just use the perfectly good iterator that find() already gave us.
ghstack-source-id: 138304429

Test Plan: CI

Reviewed By: bdhirsh

Differential Revision: D30875748

fbshipit-source-id: 9cbb875715b7a582380c7402155fdbe21944dc85
2021-09-17 13:31:56 -07:00
fcaf526815 avoid moving Argument in infer_schema (#64822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64822

Turns out the suppressed lint message was trying to tell us something: we can construct our Argument in-place rather than creating a temporary and moving it into the argument vector.
ghstack-source-id: 138304423

Test Plan: CI, profile op registration and observe reduced Argument move ctor and dtor costs

Reviewed By: smessmer

Differential Revision: D30860718

fbshipit-source-id: c8da45ab7e61b5df9fa1273301896309bca108b5
2021-09-17 13:31:54 -07:00
79cbcd3e7c [PyTorch] Fix missing move in Argument ctor (#64821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64821

Not moving adds excess refcounting overhead.
ghstack-source-id: 138304432

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30860720

fbshipit-source-id: de695e5cdfb1fa314b53a8bcb291343ae4eb87a5
2021-09-17 13:31:51 -07:00
5a3475df21 [PyTorch] shrink Argument (#64820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64820

Putting boolean fields next to each other avoids wasting space for padding.
ghstack-source-id: 138304433

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30860717

fbshipit-source-id: ad45c37574a7c857958978aad42fd1333c6b29ee
2021-09-17 13:31:48 -07:00
132d65ed25 [PyTorch] Compare pointers before calling expensive Type comparison (#64784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64784

See code comment for explanation.
ghstack-source-id: 138304431

Test Plan: Reduced overhead in findSchemaDifferences while profiling registration at startup in a case where I forced duplicates to be registered (by looping in RegisterDispatchKey.cpp).

Reviewed By: dhruvbird

Differential Revision: D30854036

fbshipit-source-id: 568733c3cf449697cdeb74cf57fed0926729fa68
2021-09-17 13:31:46 -07:00
cf5c00f155 CI: Consolidate Build and Test naming for better stats collection (#65232)
Summary:
All PyTorch build steps should now be named "Build" and test steps "Test" in workflows that test PyTorch on Linux and Windows.

I left the binary stuff alone as that build is different.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65232

Reviewed By: driazati, seemethere

Differential Revision: D31024232

Pulled By: janeyx99

fbshipit-source-id: 24b1a1e2b1b25aba70b7adc41603ec8fa4ce7dd6
2021-09-17 13:30:31 -07:00
45bd0f6181 Back out "Revert D30745960: [DDP] Remove SPMD from self.modules_buffers" (#64778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64778

Original commit changeset: d3f3fb813c45
ghstack-source-id: 138326910

Test Plan: ci

Reviewed By: H-Huang

Differential Revision: D30849443

fbshipit-source-id: 15dab8a959a29d2e2fefac6ad52b8d8168eacc02
2021-09-17 12:28:36 -07:00
70f286c1e2 Back out "Revert D30745961: [DDP] Remove self.modules_params" (#64777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64777

Original commit changeset: 59f7cc50d369
ghstack-source-id: 138326909

Test Plan: ci

Reviewed By: H-Huang

Differential Revision: D30849442

fbshipit-source-id: bb87ba83935374d8a3ebbc29365df1417dd4f26f
2021-09-17 12:28:34 -07:00
61dfcbf4bc Back out "Revert D30745921: [DDP] Fix when buffers are reassigned in module" (#64776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64776

Original commit changeset: 343ead86bf1e
ghstack-source-id: 138326914

Test Plan: ci

Reviewed By: H-Huang

Differential Revision: D30849444

fbshipit-source-id: 9a72805416fe7d6c68e51bdcdb88f6e1fecb614d
2021-09-17 12:28:32 -07:00
cce5381238 [xplat][pytorch]: Disabling too many logging. (#65170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65170

Disabling excessive logging. These are per-frame log statements
that output lots of logs on the Skylight command line.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: SS-JIA

Differential Revision: D30778852

fbshipit-source-id: bcf75ec417dfe3e9ce3df92a1894352772bd663d
2021-09-17 12:28:30 -07:00
047e68235f delegate parallelism to Ninja when possible (#64733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64733

The previous implementation was wrong when CPU scheduling affinity is
set. In fact, it is still wrong if Ninja is not being used.

When there is CPU scheduling affinity set, the number of processors
available on the system likely exceeds the number of processors that
are usable by the build. We ought to use
`len(os.sched_getaffinity(0))` to determine the effective parallelism.
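
For illustration, the difference between the two counts (os.sched_getaffinity is a Linux-only API):

```python
import os

usable = len(os.sched_getaffinity(0))  # processors usable by this process
total = os.cpu_count()                 # processors present on the system
# Under taskset/cgroup affinity, usable <= total; the build should size its
# parallelism from `usable`, which is what Ninja effectively does.
```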

This change is more minimal and instead just delegates to Ninja (which
handles this correctly) when it is used.

Test Plan:
I verified this worked as correctly using Ninja on a 96-core machine
with 24 cores available for scheduling by checking:
 * the cmake command did not specify "-j"
 * the number of top-level jobs in top/pstree never exceeded 26 (24 +
   2)

And I verified we get the legacy behavior by specifying USE_NINJA=0 on
the build.

Reviewed By: jbschlosser, driazati

Differential Revision: D30968796

Pulled By: dagitses

fbshipit-source-id: 29547dd378fea793957bcc2f7d52d5def1ecace2
2021-09-17 12:28:28 -07:00
b936a10074 add test for number of jobs when building (#65162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65162

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30998006

Pulled By: dagitses

fbshipit-source-id: 8b8d45668acf0e6c0f16df0f705a1af8c6d4f22d
2021-09-17 12:28:25 -07:00
1ee66a5278 Remove CUDA 9.2 references conditionals and workarounds (#65070)
Summary:
Title says it all

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65070

Reviewed By: malfet

Differential Revision: D30966464

Pulled By: janeyx99

fbshipit-source-id: e454906fd5d7d321d390939ba5d237e1d9b150f8
2021-09-17 12:28:23 -07:00
51e12f0071 fix torch.distributed.elastic event docs (#64974)
Summary:
the example code wasn't working for me.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64974

Reviewed By: kiukchung, cbalioglu

Differential Revision: D30926481

Pulled By: edward-io

fbshipit-source-id: f5e32cc2b948b6ee30d84a8247856f39fc786f67
2021-09-17 12:27:09 -07:00
bbe25af0df [nnc] Updated inlining to handle cases when producer indices are constants after eval (#65044)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65044

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30954655

Pulled By: navahgar

fbshipit-source-id: dfaedb5af710b2625ceec3a443a6c4e34158ab16
2021-09-17 11:28:48 -07:00
03fc636d5c [nnc] Updated inliner to remove assertions and exception (#64719)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30828583

Pulled By: navahgar

fbshipit-source-id: 9826a59085a210e44d101a843ff2cae440dfd633
2021-09-17 11:28:46 -07:00
340531f2e0 [ONNX] Do not use numpy in ONNX opsets (#65188)
Summary:
Replace `torch.tensor([numpy.arange(a, b, c)])` with `torch.arange(a, b, c).unsqueeze(0)`
Replace `tuple(numpy.add(a, b))` with `tuple(x + y for (x, y) in zip(a, b))`

As `numpy` is an optional dependency, it shouldn't be used in PyTorch core by default
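
A quick check that the replacements are equivalent (values are illustrative):

```python
import torch

a, b, c = 0, 10, 2
# torch.tensor([numpy.arange(a, b, c)])  ->
t = torch.arange(a, b, c).unsqueeze(0)   # shape (1, 5), no numpy needed

u, v = (1, 2), (3, 4)
# tuple(numpy.add(u, v))  ->
assert tuple(x + y for x, y in zip(u, v)) == (4, 6)
```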

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65188

Reviewed By: mruberry

Differential Revision: D31009490

Pulled By: malfet

fbshipit-source-id: 528e48f055bf9ac1de1fd7e94c0be41915df9a0b
2021-09-17 11:28:44 -07:00
7ced25eee3 [CoreML][OSS] Include Core ML in iOS/MacOS nightlies (#65075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65075

Need to drop one line at - https://github.com/pytorch/builder/blob/master/conda/pytorch-nightly/meta.yaml#L65
ghstack-source-id: 138324213

Test Plan:
- Check the iOS nightly builds
  - `pod install LibTorch-Lite-Nightly`

Reviewed By: hanton

Differential Revision: D30912269

fbshipit-source-id: b07679b75ecf89beae2975c37cf17d2449a3304f
2021-09-17 11:27:20 -07:00
f9c0a39ad9 add a test case for const fold (#65224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65224

Add a test case for the fix D30996277 (8c38d141df).

Test Plan: buck test mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=v100,a100 -c fbcode.enable_gpu_sections=true -j 40 caffe2/test:fx_const_fold -- test_const_fold_module_attr

Reviewed By: jfix71

Differential Revision: D31000386

fbshipit-source-id: f444361839decc583bf93ac946cfe2049376719e
2021-09-17 10:32:07 -07:00
3c003aa6ae [PyTorchEdge] promote prim ops by using ops table for mobile runtime (#64816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64816

## Context:
Promoting prim ops:
Certain prim ops are more frequent than others (like tupleIndex, raiseException, ...). These ops are frequent enough that they were chosen to be promoted to first-class instructions. Promoting them requires multiple steps and support from the TS team, as it changes how the bytecode is serialized and deserialized. So, to prevent multiple bytecode version bumps and to provide stability while these changes happen, an interim iterative process is proposed that uses a table to look up a "promoted" op's function. This allows us to rapidly update the op list and test on production models without having to change the bytecode. In case of failure, we can quickly revert this change.

## Observation
The ops were chosen based on notebook N1135657, which examines the most frequent ops.

## Fix
An interim solution: a static table which, given a prim op name, returns the function to be applied to the stack. This lets us check in `function.cpp` to get the "promoted" op. As a fallback, the "promoted" op still resides in `register_prim_ops.cpp` so that the prim op's function is never missed.

ghstack-source-id: 138261338

Test Plan:
```
[pavithran@67109.od ~/fbsource/fbcode (eddab7da6)]$ buck test caffe2/test/cpp/jit:jit -- BackendTest.TestComposite
Building: finished in 5.4 sec (100%) 7284/7284 jobs, 0/7284 updated
  Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/480191aa-a1ba-42ca-99e9-ee4bf2b06d65
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 867382eb-327f-43d7-a45c-875b7f484b15
Trace available for this run at /tmp/tpx-20210914-100224.283682/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (12.159)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.797)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestComposite (0.779)
Summary
  Pass: 2
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
```

{F663491347}

Reviewed By: iseeyuan

Differential Revision: D30819926

fbshipit-source-id: 4cbe05d5761bdc9d62ef08e18172dcf64cb49526
2021-09-17 10:32:05 -07:00
ecfc784e67 Revert D30993855: [pytorch][PR] OpInfo: nn.functional.conv2d
Test Plan: revert-hammer

Differential Revision:
D30993855 (873255c6d9)

Original commit changeset: 7402f99addb4

fbshipit-source-id: b0539daa195dc6a3739bce5c264cb2177b7721ff
2021-09-17 10:32:02 -07:00
18fa58c4e9 [CoreML][OSS] Integrate with CMake (#64523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64523

- Build Pytorch with CoreML delegate - ` USE_PYTORCH_METAL=ON python setup.py install --cmake`
- Build iOS static libs - `IOS_PLATFORM=SIMULATOR USE_COREML_DELEGATE=1  ./scripts/build_ios.sh`
ghstack-source-id: 138324216

Test Plan:
- Test the HelloWorld example

{F657778559}

Reviewed By: iseeyuan

Differential Revision: D30594041

fbshipit-source-id: 8cece0b2d4b3ef82d3ef4da8c1054919148beb16
2021-09-17 10:32:00 -07:00
c1415a0a72 [Reland] [Model Averaging] Simplify PostLocalSGD Optimizer API (#65197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65197

1. The constructor accepts a local optimizer instance instead of the local optimizer's constructor inputs and class type.
2. The parameters are read from the local optimizer's param_groups instead of from a separate input.

Proposal: https://github.com/pytorch/pytorch/issues/59699
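
A minimal usage sketch of the simplified API (assumes an initialized process group; the model and averager settings below are illustrative):

```python
import torch
import torch.distributed.algorithms.model_averaging.averagers as averagers
from torch.distributed.optim import PostLocalSGDOptimizer

model = torch.nn.Linear(4, 4)  # placeholder model
local_optim = torch.optim.SGD(model.parameters(), lr=0.1)

# Pass the constructed optimizer instance; its param_groups supply the params.
opt = PostLocalSGDOptimizer(
    optim=local_optim,
    averager=averagers.PeriodicModelAverager(period=4, warmup_steps=100),
)
```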
ghstack-source-id: 138307226

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D31007439

fbshipit-source-id: bbb0526e6763ef76775b85088571506b3942c722
2021-09-17 10:31:58 -07:00
752a820230 Bf16 matmul (#64619)
Summary:
Re-create PR to fix https://github.com/pytorch/pytorch/pull/61891.

Drop the support for addbmm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64619

Reviewed By: jbschlosser

Differential Revision: D30902995

Pulled By: VitalyFedyunin

fbshipit-source-id: dc318d73adff8f6974c9752d0d097e69276f8206
2021-09-17 10:31:56 -07:00
f9bf144a0c Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits (#64362)
Summary:
This PR adds more detailed error messages to torchhub if the commit hash validation goes wrong, providing suggestions to the users on how to resolve the issue.

It also documents why such validation is important.

EDIT: it also avoids validating things when we know they aren't a commit, since there's no risk in that case
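
A hedged sketch of the "skip validation when the ref clearly isn't a commit" idea (the helper name and heuristic are illustrative, not the exact torchhub code):

```python
import re

def looks_like_commit_hash(ref: str) -> bool:
    # A full SHA-1 is 40 hex characters; a tag like "v0.10.0" or a branch
    # name is not, so validation (and its GitHub API calls) can be skipped.
    return bool(re.fullmatch(r"[0-9a-f]{40}", ref))

assert looks_like_commit_hash("a" * 40)
assert not looks_like_commit_hash("v0.10.0")
```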

CC malfet mthrok

cc nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64362

Reviewed By: gchanan, malfet

Differential Revision: D30731191

Pulled By: NicolasHug

fbshipit-source-id: d1ee7c2ef2591dd7a5291977af1635ada2552d1b
2021-09-17 10:30:39 -07:00
0559cb37cd [FX] Ensure BC coverage for all of torch.fx.passes (#65081)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65081

Test Plan: Imported from OSS

Reviewed By: jbschlosser, khabinov

Differential Revision: D30967428

Pulled By: jamesr66a

fbshipit-source-id: 2ff83da728dc469f086cf504e71b43396db612d8
2021-09-17 09:32:43 -07:00
cf7409e184 [FX] Move graph_manipulation and param_fetch out of experimental and into passes (#65183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65183

ghstack-source-id: 138309655

Test Plan: waitforsadcastle

Reviewed By: protonu

Differential Revision: D31007630

fbshipit-source-id: 77d14b284737aabbe2b9e6394177a0c2e40aafba
2021-09-17 09:32:40 -07:00
6aa04b6843 [fx2trt] make gpu trace better (#65168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65168

Add record_function to TRTModule and EngineHolder so each part appears in the GPU trace.
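
For context, `torch.autograd.profiler.record_function` scopes are what make a region show up as a named block in the trace; a generic sketch (not the actual TRTModule code):

```python
import torch
from torch.autograd.profiler import record_function

class WrappedModule(torch.nn.Module):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, x):
        # Everything inside this scope appears as one named block in traces.
        with record_function("WrappedModule.forward"):
            return self.inner(x)
```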

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D30997968

fbshipit-source-id: b90662f20a8c0d321846c222f3e8c8eb7e010eba
2021-09-17 09:32:37 -07:00
a8d7b885c5 [CoreML][iOS/MacOS] Add the CoreML executor (#64522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64522

The `PTMCoreMLExecutor` serves as a bridge between the delegate APIs and Core ML runtime.
ghstack-source-id: 138324217

allow-large-files

Test Plan:
iOS:
Run the CoreML tests in the playground app

MacOS:

```
buck test pp-macos

PASS     633ms  1 Passed   0 Skipped   0 Failed   CoreMLTests
```

{F657776101}

Reviewed By: raziel, iseeyuan

Differential Revision: D30594042

fbshipit-source-id: a42a5307a24c2f364333829f3a84f7b9a51e1b3e
2021-09-17 09:32:34 -07:00
aafeea3a6c Allow extra unused arguments in symbolic shape function (#65095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65095

The reason I didn't do this initially was that I was worried that matching one schema to another schema with an extra argument might change semantics, e.g. Add(Tensor, Tensor) vs. Add(Tensor, Tensor, Tensor) might be different. However, we don't actually need to worry about this because, unlike in symbolic_script.cpp, the graph schema isn't used for node matching.
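
A sketch of what this permits (the `broadcast` helper below is hypothetical and simplified): a symbolic shape function can carry an extra, unused trailing argument and still be matched.

```python
from typing import List

def broadcast(a: List[int], b: List[int]) -> List[int]:
    # Hypothetical helper, simplified: assumes equal rank and no size-1 rules.
    return [max(x, y) for x, y in zip(a, b)]

# `unused` has no counterpart in the matched op schema; with this change the
# extra argument no longer prevents the match.
def add_shape(self: List[int], other: List[int], unused: float = 1.0) -> List[int]:
    return broadcast(self, other)
```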

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30972081

Pulled By: eellison

fbshipit-source-id: d4089e8feafc330df2ca158866fe779a7da0b073
2021-09-17 09:31:02 -07:00
6eafe7f15e Actually deprecate __torch_function__ as plain methods (#64843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64843

Fix for https://github.com/pytorch/pytorch/issues/63767

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991425

Pulled By: albanD

fbshipit-source-id: 1214143b8aea87e6ff406c7fc13096bd15d1a768
2021-09-17 08:32:53 -07:00
1ed9c33d08 Update fx proxy to use classmethod for __torch_function__ (#64842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64842

Change `__torch_function__` to follow the best-practice guideline of using classmethods.
I am not sure how to handle the case where multiple tracer objects are given as input, but given that before we were getting an arbitrary tracer based on the `self` that was arbitrarily chosen by the `__torch_function__` caller, the new implementation is no worse.
Let me know what you think!
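
For reference, a sketch of the classmethod form this change moves toward (the logging body is illustrative):

```python
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Classmethod form: dispatch no longer hinges on an arbitrary `self`
        # picked by the caller.
        if kwargs is None:
            kwargs = {}
        print(f"called {func.__name__}")
        return super().__torch_function__(func, types, args, kwargs)
```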

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991423

Pulled By: albanD

fbshipit-source-id: d28940df230b543952b278a0eb2d61cf7ae123ce
2021-09-17 08:32:51 -07:00
473e55d5b2 Use classmethods for overrides (#64841)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64841

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991424

Pulled By: albanD

fbshipit-source-id: 551e2119768f3a4292713f3bfa83930f5506adbd
2021-09-17 08:32:49 -07:00
a95fabfecb Fix port allocation race condition for elastic test (#65149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65149

Fixes #64789

There is a race condition between when the free port is acquired and when it is used to create the store; in that window another process may have taken the port. Since this test only checks that the timeout is triggered for TCPStore, we can bind to any port on TCPStore creation.

This only affects the test on the server (since that is where the port is used), but I changed both tests for clarity.
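
A sketch of the race and the fix, with plain sockets standing in for TCPStore:

```python
import socket

def get_free_port():
    with socket.socket() as s:
        s.bind(("localhost", 0))
        return s.getsockname()[1]

# Racy: the port is free at this instant...
port = get_free_port()
# ...but another process may grab it before the store binds to it.

# Fix: bind to port 0 directly and let the OS pick an unused port atomically.
server = socket.socket()
server.bind(("localhost", 0))
print(server.getsockname()[1])
```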

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30993166

Pulled By: H-Huang

fbshipit-source-id: eac4f28d641ac87c4ebee89df83f90955144f2f1
2021-09-17 08:32:47 -07:00
f101070587 Small improvements to compare_models_torch binary (#65171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65171

Add the model comparison binary to BUCK, and also add some quality of life features such as controlling the input range.

Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-ou
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models

# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"

```

Reviewed By: beback4u

Differential Revision: D30371322

fbshipit-source-id: 5e520aaf119c90985a1d5a135f76e4057148333b
2021-09-17 08:32:45 -07:00
9601deb1b3 Disable autograd fallback tests on Windows (#65147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65147

I think they trigger an MSVC bug per https://github.com/pytorch/pytorch/issues/48763
ghstack-source-id: 138247203

Test Plan: breakpointed https://www.internalfb.com/intern/sandcastle/job/9007199738584981/ and ssh'ed into the host and ran `buck build arvr/mode/win/opt //xplat/caffe2:autograd_libtorch_test_ovrsource` in `/cygdrive/d/ovrsource-null-hg`

Reviewed By: soulitzer

Differential Revision: D30992685

fbshipit-source-id: 06c6fb2c18d55490f89fc91ee5b7a4c5a7faf1c6
2021-09-17 08:32:43 -07:00
aaffcfe9cd implement "xy" indexing for torch.meshgrid (#62724)
Summary:
This is step 4/7 of https://github.com/pytorch/pytorch/issues/50276. This allows the use of `"xy"` indexing but doesn't change any defaults.
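
A quick example of the new mode:

```python
import torch

x = torch.arange(3)
y = torch.arange(5)
gx, gy = torch.meshgrid(x, y, indexing="xy")
print(gx.shape)  # torch.Size([5, 3]); the default "ij" would give (3, 5)
```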

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62724

Reviewed By: heitorschueroff

Differential Revision: D30995290

Pulled By: dagitses

fbshipit-source-id: 08a6a6144b20bc019f68bc3c52e3bbf967976d8f
2021-09-17 08:31:17 -07:00
d37c02be08 Allow parametrization to be nested (#65167)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65163
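
A hedged sketch of the kind of nesting this enables (module names are illustrative): a parametrization module whose own parameter is itself parametrized.

```python
import torch
import torch.nn.utils.parametrize as parametrize

class Positive(torch.nn.Module):
    def forward(self, X):
        return torch.abs(X)

class Scale(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.factor = torch.nn.Parameter(torch.tensor(2.0))

    def forward(self, X):
        return self.factor * X

scale = Scale()
# Parametrize the parametrization's own parameter...
parametrize.register_parametrization(scale, "factor", Positive())

layer = torch.nn.Linear(3, 3)
# ...then register the already-parametrized module as a parametrization.
parametrize.register_parametrization(layer, "weight", scale)
print(layer.weight.shape)  # torch.Size([3, 3])
```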

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65167

Reviewed By: jbschlosser

Differential Revision: D31002318

Pulled By: albanD

fbshipit-source-id: b1f1c6c9efa9e83af9789ed13efc133f777f418e
2021-09-17 07:29:01 -07:00
9157a2889f Pass GITHUB_TOKEN to linux CI jobs and avoid skipping torchhub tests (#64807)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64760

This should hopefully put the torchhub tests back.

This also avoids skipping the torchhub tests: currently the tests are skipped if they fail, which pretty much defeats the purpose of having a test in the first place since we're never notified when they do fail.

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64807

Reviewed By: seemethere

Differential Revision: D30994585

Pulled By: NicolasHug

fbshipit-source-id: 561782c22462b5cfec99cca153eb59623db5660a
2021-09-17 03:30:56 -07:00
7dc3858deb [CoreML][fbcode] Add the preprocess python APIs (#64521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64521

Add the preprocess part for the CoreML delegate; see `example.py` for usage.
ghstack-source-id: 138324214

Test Plan:
```
(base) [taox@devvm2780.vll0 ~/fbsource/fbcode/caffe2/fb]  buck run coreml:example -- --model="/home/taox/mobilenetv2/mobilenetv2.pt" --out="/home/taox/mobilenetv2/mobilenetv2_coreml.pt"
Parsing buck files: finished in 0.5 sec
Downloaded 0/1 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 10.6 sec (100%) 12611/57623 jobs, 1/57623 updated
  Total time: 11.1 sec
Converting Frontend ==> MIL Ops: 100%|██████████████████████████████████████████▉| 382/383 [00:00<00:00, 692.58 ops/s]
Running MIL optimization passes: 100%|███████████████████████████████████████████| 18/18 [00:00<00:00, 45.55 passes/s]
Translating MIL ==> MLModel Ops: 100%|███████████████████████████████████████████| 704/704 [00:01<00:00, 468.56 ops/s]
input {
  name: "input_0"
  type {
    multiArrayType {
      shape: 1
      shape: 3
      shape: 224
      shape: 224
      dataType: FLOAT32
    }
  }
}
output {
  name: "645"
  type {
    multiArrayType {
      dataType: FLOAT32
    }
  }
}
metadata {
  userDefined {
    key: "com.github.apple.coremltools.source"
    value: "torch==1.10.0a0+fb"
  }
  userDefined {
    key: "com.github.apple.coremltools.version"
    value: "4.1"
  }
}

{'inputs': '[["input_0", "0", "[1, 3, 224, 224]"]]', 'outputs': '[["645", "0", "[1, 1000]"]]', 'config': '{"spec_ver": "4", "backend": "cpu", "allow_low_precision": "True"}', 'metadata': '{"coremltool_ver": "4.1", "torch_ver": "torch==1.10.0a0+fb"}'}
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0826 13:27:12.690302 2477051 backend_detail.cpp:376] Warning: Backend [coreml] is not available. Execution of this Module is still possible by saving and loading on a device where the backend is available. (function codegen_backend_module)
graph(%self.1 : torch.jit.LoweredModule.coreml.__torch__.torchvision.models.mobilenetv2.MobileNetV2,
      %x.1 : Tensor):
  %51 : str = prim::Constant[value="Exception: Backend is not available."]()
  %50 : str = prim::Constant[value="AssertionError: "]()
  %14 : str = prim::Constant[value="forward"]() # <string>:5:62
  %48 : Tensor = prim::Uninitialized()
  %44 : Tensor = prim::Uninitialized()
  %typed_inputs.1 : Any[] = prim::ListConstruct(%x.1)
  %__backend.3 : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
  %8 : bool = prim::CallMethod[name="is_available"](%__backend.3) # <string>:4:19
  %49 : Tensor = prim::If(%8) # <string>:4:16
    block0():
      %__backend : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
      %__handles : Dict(str, Any) = prim::GetAttr[name="__handles"](%self.1)
      %15 : Any = aten::__getitem__(%__handles, %14) # <string>:5:47
      %17 : Any[] = prim::CallMethod[name="execute"](%__backend, %15, %typed_inputs.1) # <string>:5:24
      %18 : Any = prim::ListUnpack(%17)
      %20 : bool = prim::isinstance[types=[Tensor]](%18)
      %39 : Tensor = prim::If(%20) # <string>:6:18
        block0():
          %22 : Tensor = prim::unchecked_cast(%18)
          -> (%22)
        block1():
           = prim::RaiseException(%50) # <string>:6:18
          -> (%44)
      -> (%39)
    block1():
       = prim::RaiseException(%51) # <string>:9:18
      -> (%48)
  return (%49)

```

Reviewed By: raziel

Differential Revision: D30585154

fbshipit-source-id: 66c7d2e931be6eaa3c43a0ee131ea8046452449d
2021-09-17 00:25:14 -07:00
8241193d76 [Static Runtime] Introduce static_runtime::dict_unpack (#64771)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64771

Test Plan:
- Added `StaticRuntime.RemoveImmutableInputDictLookupsWithImmutableInputDict`
- Added `StaticRuntime.RemoveImmutableInputDictLookupsWithMutableInputDict`
- TBD: Perf impact measurement

Reviewed By: mikeiovine

Differential Revision: D30685083

fbshipit-source-id: 050a92ef3b3ed0fdc0ab7a13a4b5dbfede9342a9
2021-09-16 23:25:13 -07:00
e6c39a521b [ONNX] Update submodule to 1.10.1 (#63716) (#64576)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/64576 [ONNX] Update submodule to 1.10.1 (https://github.com/pytorch/pytorch/issues/63716)**

* [ONNX] Update IR version to 7

* [ONNX] update submodule to 1.10.1

* Disable some tests in caffe2 that fail b/c caffe2 doesn't support the
  new ops.
* Update Bazel file.

* Update expect files for new ONNX IR version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64576

Reviewed By: jansel

Differential Revision: D31006896

Pulled By: msaroufim

fbshipit-source-id: f3bf97709f23a5a2cd49c708e7363231f2c1961a
2021-09-16 22:29:54 -07:00
9117eed6ed [FX] Add torch.ops.profiler._record_function_{enter,exit} as stateful ops for DCE (#65180)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65180

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31007115

Pulled By: jamesr66a

fbshipit-source-id: 823b15db712a382a4f2a4fd409983d47bc067150
2021-09-16 21:31:54 -07:00
02dec91212 [quant] AO migration of the torch/quantization/utils.py (phase 1) (#64919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64919

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates the quantization utilities.
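
The backward-compatibility pattern is to keep the old module as a thin re-export of the new location; a sketch (exact shim contents assumed, not verified):

```python
# torch/quantization/utils.py (BC shim)
from torch.ao.quantization.utils import *  # noqa: F401,F403
```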
ghstack-source-id: 138303325

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: jerryzh168

Differential Revision: D30899082

fbshipit-source-id: 85eb38c419e417147e71758b682cd095308dd0c9
2021-09-16 21:30:18 -07:00
64641eaee6 [acc_utils] Add print_model_info (#65045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65045

This is a useful tool for printing out all of the ops found in a model after acc_tracer. It assumes the provided model has no `call_module` or `call_method`, which is generally reasonable for a model that has been successfully traced by the acc_tracer.

Test Plan:
Tested locally. Sample output:
```
Model Info:
> placeholder: 1184
> get_attr: 655
> output: 2
> torch.fx.experimental.fx_acc.acc_ops.add: 2
> torch.fx.experimental.fx_acc.acc_ops.cat: 23
> torch.fx.experimental.fx_acc.acc_ops.embedding_bag: 576
> torch.fx.experimental.fx_acc.acc_ops.layer_norm: 15
> torch.fx.experimental.fx_acc.acc_ops.linear: 27
> torch.fx.experimental.fx_acc.acc_ops.matmul: 3
> torch.fx.experimental.fx_acc.acc_ops.mul: 17
> torch.fx.experimental.fx_acc.acc_ops.permute: 2
> torch.fx.experimental.fx_acc.acc_ops.reshape: 419
> torch.fx.experimental.fx_acc.acc_ops.sigmoid: 16
> torch.fx.experimental.fx_acc.acc_ops.slice_tensor: 630
> torch.fx.experimental.fx_acc.acc_ops.sum: 4
> torch.fx.experimental.fx_acc.acc_ops.tanh: 315
```

Reviewed By: 842974287

Differential Revision: D30954829

fbshipit-source-id: 5c4f0770667b72859b74099d9f4575284fc48bd2
2021-09-16 20:29:22 -07:00
8c38d141df Add back the owning_module fix (#65159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65159

This was a legit fix originally introduced in D30905949 (446d95a7f6). But we hesitated and removed it for some reason. Putting it back.

Reviewed By: 842974287

Differential Revision: D30996277

fbshipit-source-id: 3f5eede11dba2072e7cd5ae6ca7ac81d55fb75fa
2021-09-16 19:29:56 -07:00
c886406ce0 Add dropout shape inference as no-op in acc_tracer (#65113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65113

Register dropout as no-op in acc_tracer & Add shape inference for no-op

Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference --- test_unary_15_dropout_no_op
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_dropout

Reviewed By: jfix71

Differential Revision: D30880679

fbshipit-source-id: 592fe50e17137c94c12727658191dedf08daf8cf
2021-09-16 18:26:55 -07:00
6f120ada50 Pin SciPy to 1.6.2 on Windows (#65017)
Summary:
Re-enable previously disabled test_distributions

Note: conda does not have SciPy-1.6.3, only 1.6.2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65017

Reviewed By: seemethere

Differential Revision: D31003199

Pulled By: malfet

fbshipit-source-id: 96b9d2a833f703008bb1f4df9361db8ec6f8ccc6
2021-09-16 18:25:43 -07:00
0a5149019f Added logging for the Reducer's non-member functions. (#65023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65023

Added an optional logging parameter for non-member functions `compute_bucket_assignment_by_size` and `verify_replica0_across_processes`. If a logger is provided then `TORCH_CHECK` assertions are replaced with a wrapper that logs the error to the DDP reducer's logger before calling `TORCH_CHECK`. If a logger is not provided `TORCH_CHECK` is still called.

Modified python-side calls to `_compute_bucket_assignment_by_size` and `_verify_model_across_ranks` to include a logger whenever possible. A notable exception is when these non-member functions are called in DDP's constructor: we cannot pass in a logger there, as it may not have been initialized yet.

We also added 4 new tests: `test_compute_bucket_assignment_by_size_sparse_error_{with, without}_logger` which tests the `_compute_bucket_assignment_by_size` function to ensure that sparse tensors are rejected and the errors are logged.  `test_verify_model_across_rank_{with, without}_logger` calls `_verify_model_across_ranks` to ensure that ill-formed models (different ranks have different number of parameters compared to rank 0) are rejected and the errors are logged. The test `test_ddp_model_diff_across_ranks` remains unchanged - while it does construct a ill-formed DDP instance which triggers the error in `_verify_model_across_ranks`, we cannot check the logger because this error appears in the constructor.

Lastly, did some cleanup of the `test_ddp_model_diff_across_ranks` function to make the logic of choosing which context manager and error message to use more clean.

Test Plan:
**Build commands**
`buck build mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn --keep-going`

`buck build mode/dev-nosan //caffe2/test/distributed:distributed_gloo_spawn --keep-going`

**Test commands**
Test for `_compute_bucket_assignment_by_size` (Python)/ `compute_bucket_assignment_by_size` (C++)
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_compute_bucket_assignment_by_size_sparse_error_{with, without}_logger`

Test for `_verify_model_across_ranks` (Python)/`verify_replicas0_across_process` (C++)
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_verify_model_across_ranks_{with, without}_logger`

Test that constructs an ill-formed DDP instance. Only did cleanup of this function.
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_ddp_model_diff_across_ranks`

Reviewed By: rohan-varma

Differential Revision: D30924790

fbshipit-source-id: dae6fa82485a204a6a4b022f2d073417d07ebb2f
2021-09-16 16:39:39 -07:00
873255c6d9 OpInfo: nn.functional.conv2d (#63517)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Reference: https://github.com/facebookresearch/functorch/issues/78

Mostly inspired from https://github.com/pytorch/pytorch/issues/62882

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63517

Reviewed By: heitorschueroff

Differential Revision: D30993855

Pulled By: zou3519

fbshipit-source-id: 7402f99addb4ef8f19c2ce1a09ed9006e737cc7e
2021-09-16 14:27:36 -07:00
4c4c03124b Remove old references to 9.2 in documentation (#65059)
Summary:
Removes references in .rst and README.md and comments in the Dockerfile

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65059

Reviewed By: malfet

Differential Revision: D30961110

Pulled By: janeyx99

fbshipit-source-id: 702a9a81bf08125ec4ac38bc656fc2c128c30018
2021-09-16 13:24:05 -07:00
4c15f8e8b4 Provide function interface for remove_duplicate_output_args (#65134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65134

So that its implementation can be abstracted and replaced

Test Plan: Run linter, CI

Reviewed By: 842974287

Differential Revision: D30966916

fbshipit-source-id: 92ec78c7410d0be14faecb0ba1eafdc74bab5a5d
2021-09-16 13:17:37 -07:00
f9c341fdf2 Add type annotation for TRTInterpreter.run (#65135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65135

Opportunistically adding type annotation as I work through fx2trt code base.

Test Plan: run linter and CI

Reviewed By: houseroad, 842974287

Differential Revision: D30903185

fbshipit-source-id: 3f700b57f4433f2d312c1ff2e6b99948e3c8845c
2021-09-16 13:16:06 -07:00
8a094e3270 [quant] ao migration for quantization mappings and fuser method mappings hg mv (#64985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64985

moving quantization_mappings.py and fuser_method_mappings.py to the ao folder while retaining backwards compatibility

also added dict test

ghstack-source-id: 138215312

Test Plan:
buck test mode/dev //caffe2/test:quantization

https://www.internalfb.com/intern/testinfra/testrun/7036874471986444

buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization

https://www.internalfb.com/intern/testinfra/testrun/5348024625792701

Reviewed By: z-a-f

Differential Revision: D30982551

fbshipit-source-id: 00f53bd44009d6012a7de852000aad6885131edb
2021-09-16 12:59:20 -07:00
9af6fe991c Remove CUDA 9.2 and older references from our cmake (#65065)
Summary:
Removes old CUDA references in our cuda.cmake

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65065

Reviewed By: malfet

Differential Revision: D30992673

Pulled By: janeyx99

fbshipit-source-id: 85b524089ed57e5acbc71720267cf05e24a8c20a
2021-09-16 12:54:49 -07:00
67570a60ba Disable ParallelTBB (#65092)
Summary:
As ParallelTBB's `at::get_thread_num` is not compatible with the general model used by OpenMP and ParallelNative (where it is a contiguous thread index within a parallel loop), see https://github.com/pytorch/pytorch/issues/64571#issuecomment-914691883

More examples of similar regressions: https://github.com/pytorch/pytorch/runs/3612142217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65092

Reviewed By: zhouzhuojie

Differential Revision: D30995936

Pulled By: malfet

fbshipit-source-id: db145b6a850d794f2c954f59f30249b291473e36
2021-09-16 12:38:45 -07:00
96cb05b49a Introduce tensorRT as builtin module for torch::deploy. (#63818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63818

ghstack-source-id: 138156957

Test Plan: next diff

Reviewed By: wconstab

Differential Revision: D30499309

fbshipit-source-id: 4ab1bc9896243c0c1503afb18fbfb196fc37404e
2021-09-16 11:27:51 -07:00
8eb21488fd [JIT] Improve BatchMM mutability handling (#65097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65097

Previously, BatchMM would skip any block containing any mutable
operators. Now it will avoid batching any operation whose inputs or
outputs are ever mutated. Specifically: consider a tree of ADD, T,
and MM nodes rooted at an ADD node.  If any input or output to any
node in the tree is ever mutated, then the entire tree will be ignored
by BatchMM.
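
A sketch of what the new check distinguishes:

```python
import torch

@torch.jit.script
def batchable(x, y, w1, w2):
    # An ADD-rooted tree of mm results: eligible for batching.
    return x.mm(w1) + y.mm(w2)

@torch.jit.script
def not_batchable(x, y, w1, w2):
    out = x.mm(w1) + y.mm(w2)
    w1.add_(1.0)  # an input of the tree is mutated, so the tree is skipped
    return out
```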

Test Plan: python test/test_jit.py TestBatchMM

Reviewed By: eellison

Differential Revision: D30973515

Pulled By: davidberard98

fbshipit-source-id: 9d836faa1ef0c9e3fefe0ffc0bd265f275471f48
2021-09-16 10:46:14 -07:00
f309f8fbd4 [quant] ao migration of observer and qconfig (#64982)
Summary:
(Had to recreate this diff so it wasn't dependent on the stack)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64982

migration of qconfig.py and observer.py to torch/ao/quantization using new test format
ghstack-source-id: 138215256

Test Plan:
buck test mode/opt //caffe2/test:quantization

https://www.internalfb.com/intern/testinfra/testconsole/testrun/8444249354294701/

buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization

https://www.internalfb.com/intern/testinfra/testrun/3940649742829796

Reviewed By: z-a-f

Differential Revision: D30982534

fbshipit-source-id: 48d08969b1984311ceb036eac0877c811cd6add9
2021-09-16 10:33:16 -07:00
97e86cf319 [Fix] Raise error when empty index tensor is passed (gather) (#65006)
Summary:
See https://github.com/pytorch/pytorch/pull/63312#issuecomment-919330081 for context.

cc: ezyang ysiraichi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65006

Reviewed By: mruberry

Differential Revision: D30937730

Pulled By: ezyang

fbshipit-source-id: a8f77b1f40d07e7e3bef6caaafa119685f297638
2021-09-16 10:14:26 -07:00
874f9bd509 [FX] Gate FXGraphDrawer on whether pydot is installed (#65088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65088
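
The usual gating pattern, as a sketch (not necessarily the exact code in `torch.fx.passes.graph_drawer`):

```python
try:
    import pydot
    HAS_PYDOT = True
except ImportError:
    HAS_PYDOT = False

if not HAS_PYDOT:
    # Importing the module still works; instantiating the drawer raises a
    # clear error instead of an ImportError at import time.
    class FxGraphDrawerStub:
        def __init__(self, *args, **kwargs):
            raise RuntimeError("FXGraphDrawer requires pydot to be installed")
```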

Test Plan: Imported from OSS

Reviewed By: khabinov

Differential Revision: D30967951

Pulled By: jamesr66a

fbshipit-source-id: dba2f13a47889b3d4187de925b4fe74ee90b7f79
2021-09-16 10:04:33 -07:00
2c57bbf521 add support for indexing to meshgrid (#62722)
Summary:
This is step 3/7 of https://github.com/pytorch/pytorch/issues/50276. It only adds support for the argument but doesn't implement new indexing modes yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62722

Test Plan:
Verified this is not FC breaking by adding logging to both meshgrid
overloads and then calling meshgrid twice:

`meshgrid(*tensors)`
  and
`meshgrid(*tensors, indexing='ij')`

This confirmed that the former signature triggered the original native
function and the latter signature triggered the new native function.

Reviewed By: H-Huang

Differential Revision: D30394313

Pulled By: dagitses

fbshipit-source-id: e265cb114d8caae414ee2305dc463b34fdb57fa6
2021-09-16 09:59:49 -07:00
67bd2a31b5 [Reland] Add python mode (#64360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64360

This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.

Example usage:
```
with enable_python_mode(LoggingTensor):
    z = torch.empty([])
    assert isinstance(z, LoggingTensor)
```

There are quite a few changes that were made to support this.

First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.

Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.

To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.

Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.

There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.

Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.

Test Plan: - new tests

Reviewed By: ezyang

Differential Revision: D30698082

Pulled By: zou3519

fbshipit-source-id: 7094a90eee6aa51f8b71bc4d91cfb6f49e9691f8
2021-09-16 09:02:30 -07:00
8800a8b428 Revert D30888794: [Model Averaging] Simplify PostLocalSGD Optimizer API
Test Plan: revert-hammer

Differential Revision:
D30888794 (3d312b3b8e)

Original commit changeset: 21261b480f6b

fbshipit-source-id: 87abb7e8cd9ecaac909ec6c3ee053fa7c4ae1975
2021-09-16 06:39:57 -07:00
83878e19ff Improve LSTM documentation for proj_size > 0 (#65102)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65053. Although the documentation states that:

fe0f9d1daf/torch/nn/modules/rnn.py (L500-L506)

It seems that the definition of `weight_ih_l[k]` could be improved by specifying what happens when `k > 0` and `proj_size > 0`. As `proj_size` is only used in LSTM, no changes are needed for the other RNNs.
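
A quick shape check of the documented behavior:

```python
import torch

lstm = torch.nn.LSTM(input_size=10, hidden_size=20, num_layers=2, proj_size=5)
# Layer 0 consumes the input; layers k > 0 consume the projected hidden state.
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10]): (4*hidden_size, input_size)
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 5]):  (4*hidden_size, proj_size)
print(lstm.weight_hr_l0.shape)  # torch.Size([5, 20]):  (proj_size, hidden_size)
```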

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65102

Reviewed By: supriyar

Differential Revision: D30975781

Pulled By: jbschlosser

fbshipit-source-id: 12df06e5e6a8d5de0ad10fb15e33c3e6311c11d3
2021-09-16 06:35:27 -07:00
f69cf3cf2f [Static Runtime] Use FastSet instead of std::set everywhere (#65114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65114

There doesn't seem to be any reason to use std::set for sets of pointers, right?
ghstack-source-id: 138198504

Reviewed By: hlu1

Differential Revision: D30978450

fbshipit-source-id: 4599c6249fda3a89959f839d3bf6400c5891f82c
2021-09-15 21:44:54 -07:00
0bda7476cf Reduce PyToch Warnings - Cast fixes from D26624430 (#65015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65015

Split out the existing fixes into a diff we can land separately.

Test Plan:
pooled_embeddings_modules_test

Parsing buck files: finished in 8.3 sec
Creating action graph: finished in 38.3 sec
[RE] Metadata: Session ID=[https://fburl.com/b/reSessionID-9bea421c-875e-4168-9e00-7d67479b1a9f]
[RE] Waiting on 46 remote actions. Completed 905 actions remotely, action cache hit rate: 5.08%.
Downloaded 7002/8869 artifacts, 560.00 Mbytes, 11.6% cache miss (for updated rules)
Building: finished in 13:12.4 min (100%) 31964/31964 jobs, 17344/31964 updated
  Total time: 13:59.1 min
More details at https://www.internalfb.com/intern/buck/build/b9a58bba-e0aa-4c2b-8824-a0c4074b0954
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 28cbe2b1-6fbc-450c-91c9-c06a7ff1d53b
Trace available for this run at /tmp/tpx-20210914-114921.005504/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1407375088325000
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - main (23.849)
    {emoji:2702} Omit: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
Test output:
> This test was disabled.
To run this test locally, add the command line flag --run-disabled to your test command (prefix with -- if using buck).
To view why this is disabled or re-enable this test in the test console, visit https://our.intern.facebook.com/intern/testinfra/testdetail/562949981577936
    ↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:

stderr:

    ↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:

stderr:

    ✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_compatibility (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda) (13.201)
    ↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:

stderr:

    ✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_compatibility (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
    ✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - main (13.201)
Summary
  Pass: 3
  Skip: 3
    ↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu)
    ↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
    ↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu)
  Omit: 1
    {emoji:2702} caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
  ListingSuccess: 1

shape_inference_mode_test

[amrelshennawy@devvm855.ftw0 /data/users/amrelshennawy/fbsource/fbcode] buck test caffe2/torch/fb/sparsenn:shape_inference_mode_test
Downloaded 6/18 artifacts, 11.69 Kbytes, 53.8% cache miss (for updated rules)
Building: finished in 1.6 sec (100%) 110/110 jobs, 26/110 updated
  Total time: 1.8 sec
More details at https://www.internalfb.com/intern/buck/build/0e5f45b2-5777-49e9-a3b0-09bd05687b2b
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 99509108-5ff3-4b1a-b7b3-2f43c4036209
Trace available for this run at /tmp/tpx-20210914-120119.723607/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/6192449502564504
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:shape_inference_mode_test - main (0.374)
    ✓ Pass: caffe2/torch/fb/sparsenn:shape_inference_mode_test - test_set_upper_bound_mode (torch.python.fb.shape_inference_mode_test.TestShapeInferenceMode) (0.249)
    ✓ Pass: caffe2/torch/fb/sparsenn:shape_inference_mode_test - test_set_upper_bound_settings (torch.python.fb.shape_inference_mode_test.TestShapeInferenceMode) (0.253)
Summary
  Pass: 2
  ListingSuccess: 1

test
[amrelshennawy@devvm855.ftw0 /data/users/amrelshennawy/fbsource/fbcode] buck test caffe2/torch/fb/sparsenn:test
Parsing buck files: finished in 1.1 sec
Creating action graph: finished in 38.6 sec
Downloaded 6/30 artifacts, 11.29 Kbytes, 66.7% cache miss (for updated rules)
Building: finished in 41.6 sec (100%) 26783/26783 jobs, 43/26783 updated
  Total time: 01:21.4 min
More details at https://www.internalfb.com/intern/buck/build/8f794eb0-3d3c-4ee3-9aec-5ec5cec1b0f4
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: a06164b5-d7d7-444c-a4ff-e312cb9970d9
Trace available for this run at /tmp/tpx-20210914-120428.464799/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3377699789132066
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:test - main (16.637)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_dense_mlp_quantize_ops (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.870)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_shape_inference_mode (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.922)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_to_dense_caffe2 (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.348)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_simple (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.370)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_recat_embedding_grad_output_mixed_D_batch (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.516)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_byte_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.515)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.861)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bags (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.873)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_out (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.969)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments_pad_minf (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.104)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_multiple_runs (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.342)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_sigrid_transform (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.664)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_out_empty_batch (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.745)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.771)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_multiple_runs_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.944)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_empty_batch (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.944)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_shape_inference_mode (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.245)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_prediction_nonbinary (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.328)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_8bitfakefused (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.501)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_ranges (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.608)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths_inference_tests (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (22.403)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat_out (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (23.025)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths_negatives_tests (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (23.956)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (24.100)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_transform_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.384)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_values_scores_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.672)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_empty_values_scores_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.679)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.726)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_ranges_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.567)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_all_zeros (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.036)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_rowwise_prune_op_32bit_indices (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.430)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_transform_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.176)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_dense_feature_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.006)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_gather (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.555)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_int_nbit_split_embedding_codegen_lookup_function (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.791)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments_smaller_max_len (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.737)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_pos (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.212)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_2bit_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.612)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_prediction_binary (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.858)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_tracing_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.002)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_tracing (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.824)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_1d_counts (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.976)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_recat_embedding_grad_output_mixed_D (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.832)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_one_hot_lengths (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.844)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.558)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_non_zeros (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.418)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_accumulate (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.222)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_unsqueeze_vector (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.327)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_4bit_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.772)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.425)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat_backward (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.956)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_offsets_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.320)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.923)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_one_hot (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.549)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_sigrid_transforms_create (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.932)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_gather_lengths_to_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.807)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_length_to_row_idx (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.738)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_tracing_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.175)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_mixed (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.116)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_1d_bins (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.671)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_permute_out (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.002)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_create_sigrid_transforms_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.151)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_ranges_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (16.780)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_no_bins (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.185)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_cumsum (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.242)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_le_one (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.876)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_and_unpack_segments (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.222)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_dims (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.007)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_sigrid_hash_op (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.959)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_rowwise_prune_op_64bit_indices (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.601)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_ranges_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.977)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_stack (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (22.588)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_multiple_runs_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (15.342)
Summary
  Pass: 73
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3377699789132066

Did not run (no GPU on my devserver):
gpu_test
cpp_gpu_test

Reviewed By: r-barnes

Differential Revision: D30940399

fbshipit-source-id: d867ca646723340775a49c1b983cdab64f2d67d8
2021-09-15 21:20:41 -07:00
db601434ef Bug fix (#65105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65105

Using buildErrorMessage in external_functions.cpp was breaking the build target nnc_cpu_backend_lib, because buildErrorMessage is defined in tensorexpr/kernel.cpp, which is not included in mobile builds and which we don't want to include there.
Also, buildErrorMessage wraps error messages for the fuser, whereas nnc_aten_conv2d is now only used in the AOT workflow and is not called by the fuser, so wrapping assertion failures with a fuser error message would be misleading for the AOT workflow.

Test Plan:
Before fix:
```
+ buck build //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc
Downloading... 3/3 artifacts, 24.81 Kbytes, 0.0% cache miss (for updated rules)
Building... 1.7 sec (99%) 4639/4641 jobs, 3/4641 updated
     - //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc#binary... 0.7 sec (running c++ link[0.6 sec])
Command failed with exit code 1.

command: [/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__ld__/ld.sh, --ld=/data/users/priyaramani/fbsource/fbcode/third-party-buck/platform009/build/llvm-fb/9.0.0/bin/clang++, --cc=/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__fbc...
<truncated>
...

stderr: clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
ld.lld: error: undefined symbol: torch::jit::tensorexpr::buildErrorMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
>>> referenced by external_functions.cpp:69 (xplat/caffe2/torch/csrc/jit/tensorexpr/external_functions.cpp:69)
>>>               ../nnc_cpu_backend_lib#compile-external_functions.cpp.o50e02bc2,platform009-clang/torch/csrc/jit/tensorexpr/external_functions.cpp.o:(nnc_aten_conv2d) in archive /data/users/priyaramani/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/nnc_cpu_backend_lib#platform009-clang,static/libnnc_cpu_backend_lib.a
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

    When running <c++ link>.
    When building rule //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc#binary (ovr_config//platform/linux:x86_64-fbcode).
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
ld.lld: error: undefined symbol: torch::jit::tensorexpr::buildErrorMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
>>> referenced by external_functions.cpp:69 (xplat/caffe2/torch/csrc/jit/tensorexpr/external_functions.cpp:69)
>>>               ../nnc_cpu_backend_lib#compile-external_functions.cpp.o50e02bc2,platform009-clang/torch/csrc/jit/tensorexpr/external_functions.cpp.o:(nnc_aten_conv2d) in archive /data/users/priyaramani/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/nnc_cpu_backend_lib#platform009-clang,static/libnnc_cpu_backend_lib.a
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

Command failed with exit code 1.

command: [/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__ld__/ld.sh, --ld=/data/users/priyaramani/fbsource/fbcode/third-party-buck/platform009/build/llvm-fb/9.0.0[DEBUG kernel.cpp:2766]       }
```

After fix:
```
+ buck build //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc
Action graph will be rebuilt because files have been added or removed.
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 11/15 artifacts, 78.37 Kbytes, 15.4% cache miss (for updated rules)
Building: finished in 7.4 sec (100%) 4718/4718 jobs, 46/4718 updated
  Total time: 7.5 sec
More details at https://www.internalfb.com/intern/buck/build/b87be016-340c-49f8-b832-0c1de70aae9e
```

Reviewed By: ZolotukhinM

Differential Revision: D30975952

fbshipit-source-id: 85c028cc6af63c03b505b51302f5158c23e1a047
2021-09-15 20:11:30 -07:00
2bb898e039 [acc_ops] Add support for torch variants of squeeze and mul (#65037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65037

att

Test Plan: updated unit tests

Reviewed By: yuhc

Differential Revision: D30952224

fbshipit-source-id: aaf75b27b4fc6c0436ba7bfcf324f761b900171b
2021-09-15 19:41:04 -07:00
206646d6ed Add NNC AOT Compiler executable (#63994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63994

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30582149

Pulled By: priyaramani

fbshipit-source-id: 3bbf085428824c3cb308e006c18bb0a57f50fef6
2021-09-15 19:18:24 -07:00
e0ecd09011 [quant] AO migration of the _correct_bias.py, _equalize.py, and _learnable_fake_quantize.py (#64917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64917

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates from torch.quantization to torch.ao.quantization the following files:
- `_correct_bias.py`
- `_equalize.py`
- `_learnable_fake_quantize.py`

**Note:** These files are migrated completely without any warning. The old location is thus silently deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestBiasCorrection`

Reviewed By: vkuzo

Differential Revision: D30898565

fbshipit-source-id: 1d39be2539dd1adfcb42e16bdcc0daf5c8316bbd
2021-09-15 18:15:39 -07:00
3ceecebed0 .circleci/.jenkins: Remove 9.2 references in CI (#65024)
Summary:
Removes 9.2 references in CI scripts and configs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65024

Reviewed By: driazati

Differential Revision: D30945948

Pulled By: janeyx99

fbshipit-source-id: 77890a00520c61500a934a90a74e3fcca84c09b5
2021-09-15 18:06:57 -07:00
d9d8250e3f .github: GHA add retry for docker run in chown workspace step (#65104)
Summary:
This should help prevent further errors in GHA workflows during the Chown Workspace step such as https://github.com/pytorch/pytorch/runs/3614067053

I did not add retries to other steps with docker run

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65104

Reviewed By: seemethere

Differential Revision: D30976330

Pulled By: janeyx99

fbshipit-source-id: e403008548aa01c9a0a4ccebe56df0e889dd045c
2021-09-15 18:02:07 -07:00
03389dc851 Revert D30752939: [pytorch][PR] nvfuser update
Test Plan: revert-hammer

Differential Revision:
D30752939 (cfaecaf40b)

Original commit changeset: ce122e80f01b

fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2
2021-09-15 17:38:47 -07:00
c151d62f45 [quant] AO migration of the quant_types.py (phase 1) (#64916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64916

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the quant_type.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`

Reviewed By: vkuzo

Differential Revision: D30898422

fbshipit-source-id: 3e6126b49f0565a4136d6928cea9eb25368927ff
2021-09-15 17:30:00 -07:00
a42996f16e [quant] AO migration of the fuse_modules.py (phase 1) (#64913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64913

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates fuse_modules.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30882819

fbshipit-source-id: 1926ad6aa49136aceb5b625dcef4bfde3a2860d4
2021-09-15 17:28:47 -07:00
7e9c599784 [TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65010

This pass ensures all names are legal identifiers and not duplicated.

Fixes #52727.
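
For intuition, a minimal Python sketch of the general sanitize-and-deduplicate technique (an illustration only, not the actual C++ pass):

```python
import re

def sanitize_names(names):
    """Make every name a legal identifier and unique within the list."""
    seen = {}
    out = []
    for name in names:
        base = re.sub(r"\W", "_", name) or "v"  # replace illegal characters
        if base[0].isdigit():
            base = "_" + base                   # identifiers cannot start with a digit
        n = seen.get(base, 0)
        seen[base] = n + 1
        out.append(base if n == 0 else f"{base}_{n}")
    return out

print(sanitize_names(["x", "x", "1y", "a-b"]))  # ['x', 'x_1', '_1y', 'a_b']
```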

Test Plan: Imported from OSS

Reviewed By: bertmaher, navahgar

Differential Revision: D30939717

Pulled By: ZolotukhinM

fbshipit-source-id: 7dbe7f937de41f22ad49137a5e067d698443ed63
2021-09-15 17:15:06 -07:00
3d5923366d .github: Enable only specific workflows for canary (#65099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65099

Utilizes ciflow to enable only specific workflows for
pytorch/pytorch-canary to reduce noise on that specific repository

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30973691

Pulled By: seemethere

fbshipit-source-id: 371765535b42a00bd72c2551c4faebf733d759f0
2021-09-15 16:53:12 -07:00
59c486f2f3 ci: Disable jit legacy on circleci, enable on gha (#65106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65106

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D30976186

Pulled By: seemethere

fbshipit-source-id: 8958f821eab9aa284496c57915894ed70f6b2fff
2021-09-15 16:11:38 -07:00
b75d3cae4c CI: Upgrade windows 10.1 jobs to 10.2 (#65080)
Summary:
These are the first 2 steps in the following task:
1. Upgrade 10.1 to 10.2
2. Migrate force_on_cpu job to GHA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65080

Test Plan: https://github.com/pytorch/pytorch/pull/65086

Reviewed By: seemethere

Differential Revision: D30973655

Pulled By: janeyx99

fbshipit-source-id: 67ab69ea99ff9e0336400a7173efef6d7daac07c
2021-09-15 16:04:50 -07:00
3f27c1ae78 Replace windows 10.2 smoke tests on PRs to be 11.3 (#65090)
Summary:
As we default to linux CUDA 11.3 on PRs, we should do the same thing with Windows (instead of having 10.2 be the default). This means that 10.2 will now be master only, and 11.3 windows smoke tests will run on every PR.

This also copies over the "run smoke tests only" config; removing it will happen in a separate PR once a firmer decision is made.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65090

Reviewed By: seemethere

Differential Revision: D30968382

Pulled By: janeyx99

fbshipit-source-id: c73f9a2cc800b678909365c4d80627d29fc09f94
2021-09-15 16:01:07 -07:00
ec1af11c2e Revert D30883290: [Static Runtime] Move MemoryPlanner out into memory_planner.cpp
Test Plan: revert-hammer

Differential Revision:
D30883290 (0e11454d19)

Original commit changeset: a37570f8d943

fbshipit-source-id: 65c57a2b0d2e3c7006765195dd519e8cf2472f72
2021-09-15 15:40:34 -07:00
37bcefa248 [quant] Removing hardcoded "torch.quantization.observer" for migration (#64981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64981

This would have caused errors when observer.py was moved to ao.

see: D30391189
ghstack-source-id: 138118430

Test Plan:
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_dynamic_quant_multi_uses (quantization.jit.test_quantize_jit.TestQuantizeDynamicJitPasses)'

buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_save_load_state_dict_script (quantization.core.test_workflow_module.TestObserver)'

Reviewed By: supriyar

Differential Revision: D30432008

fbshipit-source-id: 754727a89c78f6ceada6f8ff92c304f3953f38fc
2021-09-15 15:22:19 -07:00
fe0f9d1daf [Caffe2][easy] Avoid spurious vector copy in TransposeOp (#64403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64403

No need to copy to the heap here.
ghstack-source-id: 138033019

Test Plan: CI

Reviewed By: smacke

Differential Revision: D30712506

fbshipit-source-id: 5f4131b2569ebb1f5092262aaddb17215dea88f1
2021-09-15 15:15:51 -07:00
208cf051d4 [Caffe2] Don't pass vector by value in SqueezeOp (#64400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64400

There appears to be no need to copy this vector.
ghstack-source-id: 138033020

Test Plan: CI

Reviewed By: smacke

Differential Revision: D30711014

fbshipit-source-id: b9fcf3d496a663b8478aa22d52b2c41f8f85e90f
2021-09-15 15:14:30 -07:00
177ebea4c5 Use RDS for build size tracking (#64303)
Summary:
This adds 2 utilities: `register_rds_table` and `rds_write`. `register_rds_table` needs to be called once with the schema for the data that `rds_write` will write. These go to a lambda called `rds-proxy`, which will write to/read from the DB as necessary. This data can then be arbitrarily queried via `rds-proxy` (for use in CI) or on metrics.pytorch.org (for analysis).

It also hooks these up for build size tracking (which previously was not working on GHA)
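
A hypothetical usage sketch of the two helpers (the endpoint, payload shape, and signatures are assumptions; the real implementations live in the CI tooling):

```python
import json
import urllib.request

RDS_PROXY_URL = "https://example.invalid/rds-proxy"  # placeholder endpoint

def _post(payload):
    req = urllib.request.Request(
        RDS_PROXY_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

def register_rds_table(table, schema):
    # Called once so the proxy can create the table with the right columns.
    _post({"op": "register", "table": table, "schema": schema})

def rds_write(table, rows):
    # Appends rows that match the registered schema.
    _post({"op": "write", "table": table, "rows": rows})

# register_rds_table("build_size", {"build": "str", "size_bytes": "int"})
# rds_write("build_size", [{"build": "linux-xenial-py3.6", "size_bytes": 123456789}])
```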

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64303

Reviewed By: mruberry

Differential Revision: D30941182

Pulled By: driazati

fbshipit-source-id: 12c5575ddd29902477464fc989ad76a052306b9b
2021-09-15 14:47:37 -07:00
cfaecaf40b nvfuser update (#63745)
Summary:
Syncing the nvfuser code base from the devel branch. Listing a few of our developments since the last sync:

- Extends support to normalization and reduction kernels.
- Multiple kernel launches for a single `CudaFusionGroup`. The hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalars into compile-time constants that the codegen requires (e.g. reduction axes).

To keep this PR simple and relatively review-free, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.

internal updates are files located in:
1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda`
2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser`
3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h`

updates affecting integration:

1. profile_ivalue enabled for nvfuser. related changes are in `torch/csrc/jit/runtime/*`,
2. exposed a few more symbols `aten/src/ATen/core/*` used by codegen

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745

Reviewed By: saketh-are

Differential Revision: D30752939

Pulled By: malfet

fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c
2021-09-15 14:42:55 -07:00
59988f81bd Add embedding shape analysis (#64323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64323

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738145

Pulled By: eellison

fbshipit-source-id: be12408330d671bc65cf645aa2c20fafd954e6a9
2021-09-15 13:45:48 -07:00
29514bfcdb Max Pool with indices (#64121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64121

Add support for aten operators which return multiple outputs

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738142

Pulled By: eellison

fbshipit-source-id: 0d7e51187bd5e3e9b43f0fdb5178366a97aec943
2021-09-15 13:45:46 -07:00
2626cd3ba4 Add Maxpool to shape analysis / Opinfo (#63530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63530

how to review: check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738147

Pulled By: eellison

fbshipit-source-id: cf52339e572ee04e0d6167fd95d8a82d58ea7706
2021-09-15 13:44:33 -07:00
425f173f9d [quant][refactor] Change the structure of the ao migration tests (#64912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64912

The test naming was confusing and ambiguous. The file was changed to reflect the framework that is being migrated ("quantization" instead of "quantize"). Also, the common testing class was extracted out
ghstack-source-id: 138157450

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`

Reviewed By: vkuzo

Differential Revision: D30898214

fbshipit-source-id: 017f95995271d35bcdf6ff6a1b3974b837543e84
2021-09-15 13:15:43 -07:00
2967a48b78 Add retries to ECR login step (#65013)
Summary:
Switch retry mode from `legacy` to `standard` (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-retries.html#cli-usage-retries-configure) and up the number of retries.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65013

Reviewed By: zhouzhuojie, mruberry

Differential Revision: D30943292

Pulled By: driazati

fbshipit-source-id: 0a21e9b4eacbb77e6aca22f9256d94cd591b23cd
2021-09-15 13:12:57 -07:00
df3d649380 To add state dict and load_dict for Chained Scheduler (#65034)
Summary:
Adding state_dict() and load_state_dict() methods for Chained Scheduler
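
A minimal usage sketch of the new methods (the inner schedulers are illustrative):

```python
import torch
from torch.optim.lr_scheduler import ChainedScheduler, ConstantLR, ExponentialLR

opt = torch.optim.SGD(torch.nn.Linear(4, 2).parameters(), lr=0.1)
sched = ChainedScheduler([ConstantLR(opt, factor=0.5, total_iters=2),
                          ExponentialLR(opt, gamma=0.9)])

state = sched.state_dict()    # can now be saved in a checkpoint
sched.load_state_dict(state)  # and restored later
```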

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65034

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958207

Pulled By: datumbox

fbshipit-source-id: 1a587a330d34e0548e891a39f8fb5a3d251b71fa
2021-09-15 13:11:41 -07:00
6512838fab [ONNX] Enhance shape (two changes merged) (#64585)
Summary:
Enhanced shape inference by introducing typeReliableMap.
[ONNX] exporter changes for torch hub models (https://github.com/pytorch/pytorch/issues/62856)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64585

Reviewed By: ezyang

Differential Revision: D30870418

Pulled By: msaroufim

fbshipit-source-id: 87a294799cb87d649d1d13b6114a5cfbac9be15c

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-09-15 13:02:19 -07:00
0e11454d19 [Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65011

This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.

`MemoryPlanner` performs an independent sub-task: statically analyzing the graph, creating a memory plan, and allocating/deallocating managed Tensors.

This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.

Test Plan: N/A

Reviewed By: mikeiovine

Differential Revision: D30883290

fbshipit-source-id: a37570f8d9430224a6987d2190bcf81cf875043d
2021-09-15 12:57:39 -07:00
db134a6843 (torch.distributed.elastic) properly format traceback on error (#65041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65041

Fixes a bug introduced in https://github.com/pytorch/pytorch/pull/64036 where the traceback of the error handler is printed out rather than the traceback of the actual exception.

Fixes https://github.com/pytorch/pytorch/issues/60910
Closes https://github.com/pytorch/pytorch/issues/60910

BEFORE (note that the `py_callstack` is NOT the traceback of the RuntimeError):
```
**************************************************************************************************************************************************************************************************************************************************
                                                                                                              run_script_path FAILED
==================================================================================================================================================================================================================================================
Root Cause:
[0]:
  time: 2021-09-14_22:01:06
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 1092727)
  error_file: /tmp/torchelastic_aeyvjbpe/none_8zuih7tj/attempt_0/0/error.json
  msg:
    {
      "message": "RuntimeError: rasing error since --throw was specified",
      "extraInfo": {
        "py_callstack": [
          "  File \"<string>\", line 1, in <module>\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 116, in spawn_main\n    exitcode = _main(fd, parent_sentinel)\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 129, in _main\n    return self._bootstrap(parent_sentinel)\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 315, in _bootstrap\n    self.run()\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 108, in run\n    self._target(*self._args, **self._kwargs)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/multiprocessing/spawn.py\", line 59, in _wrap\n    fn(i, *args)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/api.py\", line 382, in _wrap\n    ret = record(fn)(*args_)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n    error_handler.record_exception(e)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n    _write_error(e, self._get_error_file_path())\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n    \"py_callstack\": traceback.format_stack(),\n"
        ],
        "timestamp": "1631682066"
      }
    }

==================================================================================================================================================================================================================================================
Other Failures:
  <NO_OTHER_FAILURES>
**************************************************************************************************************************************************************************************************************************************************
```

AFTER (note the traceback is the traceback of the RuntimeError):
```
********************************************************************************
                             run_script_path FAILED
================================================================================
Root Cause:
[0]:
  time: 2021-09-14_21:49:25
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 1014681)
  error_file: /tmp/torchelastic_q0zods2c/none_qwmz5dgj/attempt_0/0/error.json
  msg: Traceback (most recent call last):
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
      return f(*args, **kwargs)
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/run.py", line 671, in run_script_path
      runpy.run_path(sys.argv[0], run_name="__main__")
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 265, in run_path
      return _run_module_code(code, init_globals, run_name,
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 97, in _run_module_code
      _run_code(code, mod_globals, init_globals,
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "/home/kiuk/tmp/test.py", line 55, in <module>
      main()
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
      return f(*args, **kwargs)
    File "/home/kiuk/tmp/test.py", line 25, in main
      raise RuntimeError("rasing error since --throw was specified")
  RuntimeError: rasing error since --throw was specified

================================================================================
Other Failures:
  <NO_OTHER_FAILURES>
********************************************************************************
```

Test Plan:
(see summary for before and after)

`test.py` contents:
```
import argparse
import os
import sys

import torch
import torch.distributed as dist
import torch.nn.functional as F

from torch.distributed.elastic.multiprocessing.errors import record

def parse_args(argv):
    parser = argparse.ArgumentParser(description="test script")
    parser.add_argument("--init_method", type=str, default="env://")
    parser.add_argument("--backend", type=str, default="gloo")
    parser.add_argument("--throw", action="store_true", default=False)
    parser.add_argument("--exit", action="store_true", default=False)
    return parser.parse_args()

@record
def main():
    args = parse_args(sys.argv[1:])

    if args.throw:
        raise RuntimeError("rasing error since --throw was specified")

    if args.exit:
        sys.exit(1)

    init_method=args.init_method
    backend=args.backend

    world_size = int(os.environ["WORLD_SIZE"])
    rank = int(os.environ["RANK"])

    print(f"initializing `{backend}` process group with rank={rank}, world_size={world_size} at {init_method}")

    dist.init_process_group(
        backend=backend,
        init_method=init_method,
        world_size=world_size,
        rank=rank)

    print(f"successfully initialized process group with rank={dist.get_rank()}, world_size={dist.get_world_size()}")

    t = F.one_hot(torch.tensor(rank), num_classes=world_size)
    dist.all_reduce(t)
    derived_world_size = torch.sum(t).item()
    if derived_world_size != world_size:
        raise RuntimeError(f"derived world size: {derived_world_size} != actual world size: {world_size}")
    else:
        print(f"sucessfully derived world size: {derived_world_size} (expected: {world_size}). Exiting")

if __name__ == "__main__":
    main()
```

run it as:

```
$ python -m torch.distributed.run --nproc_per_node 2 test.py --throw
```

Reviewed By: cbalioglu

Differential Revision: D30953731

fbshipit-source-id: bbea04c59c2aec58969cf44d8e3723d5f8abe8a8
2021-09-15 12:50:21 -07:00
4bf7959de2 Remove run_functional_checks from test_autograd and create necessary OpInfos (#64993)
Summary:
OpInfo tracker: https://github.com/pytorch/pytorch/issues/54261

 - Eliminate duplicated testing logic in test_autograd
 - Moved tests that rely on this testing logic to use OpInfos
   - `cat` already has OpInfo (no action needed)
   - Created OpInfo for `block_diag` and `broadcast_tensors`

Running into some FX errors. Added op to skip-list and created an issue here: https://github.com/pytorch/pytorch/issues/64997
Both `block_diag` and `broadcast_tensors` are variadic, so skipping `test_variant_consistency_jit` (from comments on other OpInfos, it looks like JIT does not support variadic tensors)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64993

Reviewed By: jbschlosser

Differential Revision: D30961736

Pulled By: soulitzer

fbshipit-source-id: e169305384a683acae1178c4e12e9e214a67226a
2021-09-15 12:45:38 -07:00
21017ad1a1 Dispatch.h: Avoid including ivalue (#64165)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64165

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728587

Pulled By: ezyang

fbshipit-source-id: d0d2e97491d9d5e2d2fc2d6e51420a4467c1bba4
2021-09-15 12:16:44 -07:00
211ad231dc To add state_dict and load_state_dict to SequentialLR (#65035)
Summary:
To add state_dict() and load_state_dict() methods to SequentialLR
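
As with the ChainedScheduler change above, a minimal usage sketch (the inner schedulers and milestones are illustrative):

```python
import torch
from torch.optim.lr_scheduler import SequentialLR, ConstantLR, ExponentialLR

opt = torch.optim.SGD(torch.nn.Linear(4, 2).parameters(), lr=0.1)
sched = SequentialLR(opt,
                     schedulers=[ConstantLR(opt, factor=0.5, total_iters=2),
                                 ExponentialLR(opt, gamma=0.9)],
                     milestones=[2])  # switch schedulers after 2 steps

state = sched.state_dict()    # checkpoint
sched.load_state_dict(state)  # restore
```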

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65035

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958204

Pulled By: datumbox

fbshipit-source-id: 65114e1b07146526ae2680233f5cd42b2534d67a
2021-09-15 12:01:51 -07:00
8a652e0e91 [CircleCI] Disable pytorch_linux_xenial_cuda10_2 test jobs (#65071)
Summary:
As all of them have been migrated to GHA:
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_distributed_test -> "linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 1, 2, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 2, 2, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX2_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX, 1, 1, linux.2xlarge)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (slow, 1, 1, linux.8xlarge.nvidia.gpu)"

"pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build" is still a holdout due to slow gradchecks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65071

Reviewed By: driazati, seemethere, janeyx99

Differential Revision: D30963413

Pulled By: malfet

fbshipit-source-id: d9a5188ce7eb2f60547b91b854a5db83af2b10e7
2021-09-15 11:59:40 -07:00
f1ce64a58e Starter Task 1 (#64927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64927

Mypy error corrections

Test Plan: Corrected mypy errors to make the code less prone to bugs by modifying types or adding lines that rule out undesired special cases, e.g. asserting that a variable is not None.
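
An illustrative example of the assert-not-None pattern mentioned above (not a snippet from this diff):

```python
from typing import Optional

def scale(x: Optional[float]) -> float:
    assert x is not None  # narrows Optional[float] to float for mypy
    return 2.0 * x
```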

Reviewed By: wushirong

Differential Revision: D30901654

fbshipit-source-id: daae8692603b8b38203a98f673c455749c2fb855
2021-09-15 11:55:07 -07:00
dab6496dbe [ROCm] Update CI images for ROCm 4.3.1 (#64610)
Summary:
Signed-off-by: Kyle Chen <kylechen@amd.com>

reference:
https://github.com/pytorch/pytorch/issues/58017

jithunnair-amd
jeffdaily
arindamroy-eng

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64610

Reviewed By: seemethere

Differential Revision: D30964582

Pulled By: malfet

fbshipit-source-id: a8335d3d32d7f1557d3cf6cb055ad0f9c49ef7aa
2021-09-15 11:49:54 -07:00
54d060a8c9 Port all and any full reductions to structured kernels. (#64642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64642

Tracking issue: #55070

This PR creates out overloads for both `all` and `any` kernels (full reduction overload),
and ports them to structured kernels.
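
A sketch of the new overload from Python (assuming the `out=` variant is exposed like other reductions):

```python
import torch

x = torch.tensor([True, False, True])
out = torch.empty((), dtype=torch.bool)
torch.all(x, out=out)  # full-reduction `all` now has an out= overload
torch.any(x, out=out)  # likewise for `any`
```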

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867354

Pulled By: ezyang

fbshipit-source-id: 46bccaf6c94a09ed77cc6c724d1183c82f801751
2021-09-15 11:06:47 -07:00
54cdf651fd [PyTorch] remove string_view::operator[] bounds check (#64670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64670

Bounds checking is not required for `std::string_view`, and the checking hoses performance for the following performance prototype diff.
ghstack-source-id: 138037531

Test Plan: CI

Reviewed By: ezyang, bhosmer

Differential Revision: D30747515

fbshipit-source-id: 1f4374415a82dfdccce76ea2c6885c13cb93d369
2021-09-15 09:57:58 -07:00
57420a6063 [PyTorch][easy] Add cbegin/cend to SmallVector (#64682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64682

Looks like it was forked from llvm before cbegin and cend existed.
ghstack-source-id: 138036981

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30814434

fbshipit-source-id: 9740fa8d3df1c90b77298a95ab9f1d0cf8c90320
2021-09-15 09:57:56 -07:00
bdbc622988 [PyTorch] Avoid extra std::vector in parseSchemaOrName (#64678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64678

We know we only want one declaration, so let's not create an excess std::vector (and thus a heap allocation) for that.
ghstack-source-id: 138036978

Test Plan: CI

Reviewed By: dhruvbird, tugsbayasgalan

Differential Revision: D30813785

fbshipit-source-id: c67e0100cdef5d894282939fb6d39a57309bc240
2021-09-15 09:56:41 -07:00
0f1bccb692 [quant] Removing unnecessary import from torch/quantization/quantize.py (#64910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64910

This bled through from the original location. Removing it is not just refactoring, but also prevents potential recursive imports.
ghstack-source-id: 138112663

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30882924

fbshipit-source-id: 8652a334a5186c635761ea5e50f978d1f1078c12
2021-09-15 09:39:04 -07:00
3fb33b38b9 [Static Runtime] Check if outputs of a node do not overlap with each other (#63013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63013

This change enhances the current memory-overlap check to cover outputs: it enforces the constraint that all outputs of a node must NOT overlap with each other, since they are all updated by the node at the same time.

This check will detect a problem like T97393697 immediately in debug mode.
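
A simplified Python sketch of the invariant being enforced (the real check is C++ inside Static Runtime and is more careful about strides):

```python
import torch

def byte_range(t):
    # Assumes contiguous tensors for simplicity.
    start = t.data_ptr()
    return start, start + t.numel() * t.element_size()

def outputs_may_overlap(outputs):
    ranges = sorted(byte_range(t) for t in outputs)
    return any(a_end > b_start
               for (_, a_end), (b_start, _) in zip(ranges, ranges[1:]))

a = torch.empty(4)
print(outputs_may_overlap([a[:2], a[2:]]))  # False: disjoint ranges
print(outputs_may_overlap([a[:3], a[1:]]))  # True: ranges intersect
```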

Test Plan:
- Added a unittest `ProcessedNode.VerifyMemoryOverlapWithOverlappingOutputs`

- Ran `inline_cvr` on ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench with this diff and confirmed that the checking condition holds true during the run.

Reviewed By: hlu1

Differential Revision: D30211705

fbshipit-source-id: 994d8dace2422e2498e504eb61452a55739238c0
2021-09-15 08:38:05 -07:00
26e43fe9f3 Forward fix SkipInfo missing mypy (#65063)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65063

Reviewed By: malfet

Differential Revision: D30961556

Pulled By: janeyx99

fbshipit-source-id: 9618e12ba873fb48fe5c846a48d4560ad521eb3e
2021-09-15 08:30:38 -07:00
fb8bdb8039 When test set_affinity, don't hardcode the CPU ID (#65042)
Summary:
The set_affinity test always fails when the number of CPUs is smaller than 3. Changed the test to pick the CPU ID dynamically based on the number of CPUs on the system.
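
The general shape of the fix (illustrative; the actual test lives in the PyTorch test suite):

```python
import multiprocessing
import os

# Pick a CPU ID that is guaranteed to exist instead of hardcoding one.
cpu_id = min(2, multiprocessing.cpu_count() - 1)
os.sched_setaffinity(0, {cpu_id})  # Linux-only API
assert cpu_id in os.sched_getaffinity(0)
```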

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65042

Reviewed By: jbschlosser

Differential Revision: D30960554

Pulled By: ejguan

fbshipit-source-id: 55ac12714b4b0964b48c3617b79a7a345d40ebce
2021-09-15 08:10:59 -07:00
c625f971d3 [DataPipe] Make TarArchiveReader and ZipArchiveReader accepts FileSream with attempt to close and additional warning (#64788)
Summary:
ghstack is not working for the second commit so I'm manually creating this PR for now. Please only look at changes related to the second commit in this PR (there is a PR for the first commit).

This PR removes TarArchiveReader's dependency on the FileLoader DataPipe by allowing it to use an IterDataPipe of path names as input rather than a tuple of path name and stream.

It also adds additional tests to ensure that the DataPipe is functioning properly when it is read multiple times or reset half way through reading.

The whole stack fixes https://github.com/pytorch/pytorch/issues/64281 - issues related to unclosed buffer stream.

Stack:
* __->__ https://github.com/pytorch/pytorch/issues/64788
* https://github.com/pytorch/pytorch/issues/64786

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64788

Reviewed By: jbschlosser, ejguan

Differential Revision: D30901176

Pulled By: NivekT

fbshipit-source-id: 59746a8d0144fc6d3ce0feb2d76445b82e6d414e
2021-09-15 07:34:29 -07:00
32c5da8cd2 add OpInfo for torch.nn.functional.dropout (#62315)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62315

Reviewed By: mruberry

Differential Revision: D30932765

Pulled By: zou3519

fbshipit-source-id: 481c67b59a966b4d640973d252b3e392d8db728e
2021-09-15 07:18:04 -07:00
d6d286f651 [dnnlowp] reduce num of test cases to avoid time out (#64935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64935

As title

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D30889157

fbshipit-source-id: 316c808806b084bd2e44c56e1cdb61adf2369a9d
2021-09-14 21:32:12 -07:00
b7ec7d760d Generic test parametrization functionality (#60753)
Summary:
This PR plays around with implementation & usage of a `parametrize` decorator for test parametrization similar to `pytest.mark.parametrize`, based on previous work introducing a `_TestParametrizer` class. It works with the internal `DeviceTest` hierarchy & composes with `dtype`, `skip*`, and other decorators. Basic usage is demonstrated in `test/test_blah.py`:

```python
import unittest
from itertools import product
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, deviceCountAtLeast, ops)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import (
    TestCase, run_tests, parametrize, instantiate_parametrized_tests, subtest)

class TestBlah(TestCase):
    parametrize("x", range(5))
    def test_default_names(self, x):
        print('Passed in:', x)

    # Use default names but add an expected failure.
    parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 5)])
    def test_default_names_expected_failure(self, x):
        if x == 0:
            raise RuntimeError('Boom')
        print('Passed in:', x)

    parametrize("bias", [False, True], name_fn=lambda b: 'bias' if b else 'no_bias')
    def test_custom_names(self, bias):
        print('Passed in:', bias)

    parametrize("bias", [subtest(True, name='bias'),
                          subtest(False, name='no_bias')])
    def test_custom_names_alternate(self, bias):
        print('Passed in:', bias)

    parametrize("x,y", [(1, 2), (1, 3), (1, 4)])
    def test_two_things_default_names(self, x, y):
        print('Passed in:', x, y)

    parametrize("x", [1, 2, 3])
    parametrize("y", [4, 5, 6])
    def test_two_things_composition(self, x, y):
        print('Passed in:', x, y)

    parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 3)])
    parametrize("y", [4, 5, subtest(6, decorators=[unittest.expectedFailure])])
    def test_two_things_composition_expected_failure(self, x, y):
        if x == 0 or y == 6:
            raise RuntimeError('Boom')
        print('Passed in:', x, y)

    parametrize("x", [1, 2])
    parametrize("y", [3, 4])
    parametrize("z", [5, 6])
    def test_three_things_composition(self, x, y, z):
        print('Passed in:', x, y, z)

    parametrize("x", [1, 2], name_fn=str)
    parametrize("y", [3, 4], name_fn=str)
    parametrize("z", [5, 6], name_fn=str)
    def test_three_things_composition_custom_names(self, x, y, z):
        print('Passed in:', x, y, z)

    parametrize("x,y", product(range(2), range(3)))
    def test_two_things_product(self, x, y):
        print('Passed in:', x, y)

    parametrize("x,y", [subtest((1, 2), name='double'),
                         subtest((1, 3), name='triple'),
                         subtest((1, 4), name='quadruple')])
    def test_two_things_custom_names(self, x, y):
        print('Passed in:', x, y)

    parametrize("x,y", [(1, 2), (1, 3), (1, 4)], name_fn=lambda x, y: '{}_{}'.format(x, y))
    def test_two_things_custom_names_alternate(self, x, y):
        print('Passed in:', x, y)

class TestDeviceBlah(TestCase):
    parametrize("x", range(10))
    def test_default_names(self, device, x):
        print('Passed in:', device, x)

    parametrize("x,y", [(1, 2), (3, 4), (5, 6)])
    def test_two_things(self, device, x, y):
        print('Passed in:', device, x, y)

    deviceCountAtLeast(1)
    def test_multiple_devices(self, devices):
        print('Passed in:', devices)

    ops(op_db)
    parametrize("flag", [False, True], lambda f: 'flag_enabled' if f else 'flag_disabled')
    def test_op_parametrized(self, device, dtype, op, flag):
        print('Passed in:', device, dtype, op, flag)

instantiate_parametrized_tests(TestBlah)
instantiate_device_type_tests(TestDeviceBlah, globals())

if __name__ == '__main__':
    run_tests()
```

Generated tests:
```
TestBlah.test_custom_names_alternate_bias
TestBlah.test_custom_names_alternate_no_bias
TestBlah.test_custom_names_bias
TestBlah.test_custom_names_no_bias
TestBlah.test_default_names_expected_failure_x_0
TestBlah.test_default_names_expected_failure_x_1
TestBlah.test_default_names_expected_failure_x_2
TestBlah.test_default_names_expected_failure_x_3
TestBlah.test_default_names_expected_failure_x_4
TestBlah.test_default_names_x_0
TestBlah.test_default_names_x_1
TestBlah.test_default_names_x_2
TestBlah.test_default_names_x_3
TestBlah.test_default_names_x_4
TestBlah.test_three_things_composition_custom_names_1_3_5
TestBlah.test_three_things_composition_custom_names_1_3_6
TestBlah.test_three_things_composition_custom_names_1_4_5
TestBlah.test_three_things_composition_custom_names_1_4_6
TestBlah.test_three_things_composition_custom_names_2_3_5
TestBlah.test_three_things_composition_custom_names_2_3_6
TestBlah.test_three_things_composition_custom_names_2_4_5
TestBlah.test_three_things_composition_custom_names_2_4_6
TestBlah.test_three_things_composition_x_1_y_3_z_5
TestBlah.test_three_things_composition_x_1_y_3_z_6
TestBlah.test_three_things_composition_x_1_y_4_z_5
TestBlah.test_three_things_composition_x_1_y_4_z_6
TestBlah.test_three_things_composition_x_2_y_3_z_5
TestBlah.test_three_things_composition_x_2_y_3_z_6
TestBlah.test_three_things_composition_x_2_y_4_z_5
TestBlah.test_three_things_composition_x_2_y_4_z_6
TestBlah.test_two_things_composition_expected_failure_x_0_y_4
TestBlah.test_two_things_composition_expected_failure_x_0_y_5
TestBlah.test_two_things_composition_expected_failure_x_0_y_6
TestBlah.test_two_things_composition_expected_failure_x_1_y_4
TestBlah.test_two_things_composition_expected_failure_x_1_y_5
TestBlah.test_two_things_composition_expected_failure_x_1_y_6
TestBlah.test_two_things_composition_expected_failure_x_2_y_4
TestBlah.test_two_things_composition_expected_failure_x_2_y_5
TestBlah.test_two_things_composition_expected_failure_x_2_y_6
TestBlah.test_two_things_composition_x_1_y_4
TestBlah.test_two_things_composition_x_1_y_5
TestBlah.test_two_things_composition_x_1_y_6
TestBlah.test_two_things_composition_x_2_y_4
TestBlah.test_two_things_composition_x_2_y_5
TestBlah.test_two_things_composition_x_2_y_6
TestBlah.test_two_things_composition_x_3_y_4
TestBlah.test_two_things_composition_x_3_y_5
TestBlah.test_two_things_composition_x_3_y_6
TestBlah.test_two_things_custom_names_alternate_1_2
TestBlah.test_two_things_custom_names_alternate_1_3
TestBlah.test_two_things_custom_names_alternate_1_4
TestBlah.test_two_things_custom_names_double
TestBlah.test_two_things_custom_names_quadruple
TestBlah.test_two_things_custom_names_triple
TestBlah.test_two_things_default_names_x_1_y_2
TestBlah.test_two_things_default_names_x_1_y_3
TestBlah.test_two_things_default_names_x_1_y_4
TestBlah.test_two_things_product_x_0_y_0
TestBlah.test_two_things_product_x_0_y_1
TestBlah.test_two_things_product_x_0_y_2
TestBlah.test_two_things_product_x_1_y_0
TestBlah.test_two_things_product_x_1_y_1
TestBlah.test_two_things_product_x_1_y_2
TestDeviceBlahCPU.test_default_names_x_0_cpu
TestDeviceBlahCPU.test_default_names_x_1_cpu
TestDeviceBlahCPU.test_default_names_x_2_cpu
TestDeviceBlahCPU.test_default_names_x_3_cpu
TestDeviceBlahCPU.test_default_names_x_4_cpu
TestDeviceBlahCPU.test_default_names_x_5_cpu
TestDeviceBlahCPU.test_default_names_x_6_cpu
TestDeviceBlahCPU.test_default_names_x_7_cpu
TestDeviceBlahCPU.test_default_names_x_8_cpu
TestDeviceBlahCPU.test_default_names_x_9_cpu
TestDeviceBlahCPU.test_multiple_devices_cpu
TestDeviceBlahCPU.test_op_parametrized_<opname>_<variant>_cpu_uint8_flag_enabled_cpu
TestDeviceBlahCPU.test_two_things_x_1_y_2_cpu
TestDeviceBlahCPU.test_two_things_x_3_y_4_cpu
TestDeviceBlahCPU.test_two_things_x_5_y_6_cpu
TestDeviceBlahMETA.test_default_names_x_0_meta
TestDeviceBlahMETA.test_default_names_x_1_meta
TestDeviceBlahMETA.test_default_names_x_2_meta
TestDeviceBlahMETA.test_default_names_x_3_meta
TestDeviceBlahMETA.test_default_names_x_4_meta
TestDeviceBlahMETA.test_default_names_x_5_meta
TestDeviceBlahMETA.test_default_names_x_6_meta
TestDeviceBlahMETA.test_default_names_x_7_meta
TestDeviceBlahMETA.test_default_names_x_8_meta
TestDeviceBlahMETA.test_default_names_x_9_meta
TestDeviceBlahMETA.test_multiple_devices_meta
TestDeviceBlahMETA.test_op_parametrized_<opname>_<variant>_meta_uint8_flag_enabled_meta
TestDeviceBlahMETA.test_two_things_x_1_y_2_meta
TestDeviceBlahMETA.test_two_things_x_3_y_4_meta
TestDeviceBlahMETA.test_two_things_x_5_y_6_meta
```

Caveats:
* `parametrize` decorators cannot be "stacked" yet; each one overwrites the previous. This will change to either:
  * Allow stacking of multiple decorators
  * Error out with a nice error message if multiple decorators are specified

The PR introduces `instantiate_parametrized_tests()` in addition to `instantiate_device_type_tests()`. The former should be used for non-device-specific tests, and the latter should be used for device-specific tests, as usual. Both of these support the `parametrize` decorator. Only the latter supports the `ops` decorator (no change here- this was already the case).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60753

Reviewed By: saketh-are

Differential Revision: D30606615

Pulled By: jbschlosser

fbshipit-source-id: a34f36d643f68a6e221f419d9bb3e1ae1d84dd65
2021-09-14 19:52:59 -07:00
6ab97fbc28 [vulkan] Use volk to load vulkan libraries and fix Windows build errors (#64988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64968

The current wrapper (provided by [Vulkan-Tools](https://github.com/KhronosGroup/Vulkan-Tools/tree/master/common)) can't handle dynamically loading Vulkan on Windows/Mac. Therefore, we can bring in [volk](https://github.com/zeux/volk) to load the vulkan libraries for other platforms.

1. Use `volk` with `link_style="static"` only on Windows. Use `vulkan_wrapper` for all others (temporary solution)
2. Make DotSlash work on Windows when resolving glslc path

Test Plan:
For Android:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

For Mac:
```
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```

On Local OSS repo with `pr/64988` branch:

The build and test are fine. Note that `VulkanAPITest.log_softmax()` has been broken for the past month. Ivan will take a look when he is available.

Build: `BUILD_TEST=1 USE_VULKAN=1 USE_VULKAN_SHADERC_RUNTIME=1 USE_VULKAN_WRAPPER=0 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install`

Test: `$PYTORCH_ROOT/build/bin/vulkan_api_test /data/local/tmp`

```
Running main() from ../third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 69 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 69 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.adaptive_avg_pool2d
[       OK ] VulkanAPITest.adaptive_avg_pool2d (228 ms)
[ RUN      ] VulkanAPITest.add
[       OK ] VulkanAPITest.add (51 ms)
[ RUN      ] VulkanAPITest.add_broadcast0
[       OK ] VulkanAPITest.add_broadcast0 (13 ms)
[ RUN      ] VulkanAPITest.add_broadcast1
[       OK ] VulkanAPITest.add_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.add_broadcast2
[       OK ] VulkanAPITest.add_broadcast2 (9 ms)
[ RUN      ] VulkanAPITest.add_
[       OK ] VulkanAPITest.add_ (60 ms)
[ RUN      ] VulkanAPITest.add_broadcast0_
[       OK ] VulkanAPITest.add_broadcast0_ (10 ms)
[ RUN      ] VulkanAPITest.add_broadcast1_
[       OK ] VulkanAPITest.add_broadcast1_ (1 ms)
[ RUN      ] VulkanAPITest.add_scalar
[       OK ] VulkanAPITest.add_scalar (24 ms)
[ RUN      ] VulkanAPITest.add_scalar_
[       OK ] VulkanAPITest.add_scalar_ (8 ms)
[ RUN      ] VulkanAPITest.addmm
[       OK ] VulkanAPITest.addmm (22 ms)
[ RUN      ] VulkanAPITest.addmm_expand
[       OK ] VulkanAPITest.addmm_expand (12 ms)
[ RUN      ] VulkanAPITest.avg_pool2d
[       OK ] VulkanAPITest.avg_pool2d (9 ms)
[ RUN      ] VulkanAPITest.clamp
[       OK ] VulkanAPITest.clamp (92 ms)
[ RUN      ] VulkanAPITest.clamp_
[       OK ] VulkanAPITest.clamp_ (60 ms)
[ RUN      ] VulkanAPITest.conv2d
[       OK ] VulkanAPITest.conv2d (15 ms)
[ RUN      ] VulkanAPITest.conv2d_dw
[       OK ] VulkanAPITest.conv2d_dw (15 ms)
[ RUN      ] VulkanAPITest.conv2d_pw
[       OK ] VulkanAPITest.conv2d_pw (34 ms)
[ RUN      ] VulkanAPITest.conv2d_winograd
[       OK ] VulkanAPITest.conv2d_winograd (10 ms)
[ RUN      ] VulkanAPITest.copy
[       OK ] VulkanAPITest.copy (1 ms)
[ RUN      ] VulkanAPITest.div
[       OK ] VulkanAPITest.div (32 ms)
[ RUN      ] VulkanAPITest.div_broadcast0
[       OK ] VulkanAPITest.div_broadcast0 (11 ms)
[ RUN      ] VulkanAPITest.div_broadcast1
[       OK ] VulkanAPITest.div_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.div_broadcast2
[       OK ] VulkanAPITest.div_broadcast2 (7 ms)
[ RUN      ] VulkanAPITest.div_
[       OK ] VulkanAPITest.div_ (46 ms)
[ RUN      ] VulkanAPITest.div_broadcast0_
[       OK ] VulkanAPITest.div_broadcast0_ (9 ms)
[ RUN      ] VulkanAPITest.div_broadcast1_
[       OK ] VulkanAPITest.div_broadcast1_ (2 ms)
[ RUN      ] VulkanAPITest.div_scalar
[       OK ] VulkanAPITest.div_scalar (95 ms)
[ RUN      ] VulkanAPITest.div_scalar_
[       OK ] VulkanAPITest.div_scalar_ (18 ms)
[ RUN      ] VulkanAPITest.empty
[       OK ] VulkanAPITest.empty (0 ms)
[ RUN      ] VulkanAPITest.hardsigmoid
[       OK ] VulkanAPITest.hardsigmoid (76 ms)
[ RUN      ] VulkanAPITest.hardsigmoid_
[       OK ] VulkanAPITest.hardsigmoid_ (80 ms)
[ RUN      ] VulkanAPITest.hardshrink
[       OK ] VulkanAPITest.hardshrink (630 ms)
[ RUN      ] VulkanAPITest.hardshrink_
[       OK ] VulkanAPITest.hardshrink_ (573 ms)
[ RUN      ] VulkanAPITest.leaky_relu
[       OK ] VulkanAPITest.leaky_relu (271 ms)
[ RUN      ] VulkanAPITest.leaky_relu_
[       OK ] VulkanAPITest.leaky_relu_ (254 ms)
[ RUN      ] VulkanAPITest.hardswish
[       OK ] VulkanAPITest.hardswish (83 ms)
[ RUN      ] VulkanAPITest.hardswish_
[       OK ] VulkanAPITest.hardswish_ (72 ms)
[ RUN      ] VulkanAPITest.max_pool2d
[       OK ] VulkanAPITest.max_pool2d (16 ms)
[ RUN      ] VulkanAPITest.mean
[       OK ] VulkanAPITest.mean (17 ms)
[ RUN      ] VulkanAPITest.mean2d
[       OK ] VulkanAPITest.mean2d (20 ms)
[ RUN      ] VulkanAPITest.mm
[       OK ] VulkanAPITest.mm (12 ms)
[ RUN      ] VulkanAPITest.mul
[       OK ] VulkanAPITest.mul (28 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0
[       OK ] VulkanAPITest.mul_broadcast0 (9 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1
[       OK ] VulkanAPITest.mul_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.mul_broadcast2
[       OK ] VulkanAPITest.mul_broadcast2 (9 ms)
[ RUN      ] VulkanAPITest.mul_
[       OK ] VulkanAPITest.mul_ (43 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0_
[       OK ] VulkanAPITest.mul_broadcast0_ (8 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1_
[       OK ] VulkanAPITest.mul_broadcast1_ (1 ms)
[ RUN      ] VulkanAPITest.mul_scalar
[       OK ] VulkanAPITest.mul_scalar (64 ms)
[ RUN      ] VulkanAPITest.mul_scalar_
[       OK ] VulkanAPITest.mul_scalar_ (17 ms)
[ RUN      ] VulkanAPITest.reflection_pad2d
[       OK ] VulkanAPITest.reflection_pad2d (7 ms)
[ RUN      ] VulkanAPITest.reshape
[       OK ] VulkanAPITest.reshape (73 ms)
[ RUN      ] VulkanAPITest.reshape_
[       OK ] VulkanAPITest.reshape_ (41 ms)
[ RUN      ] VulkanAPITest.sigmoid
[       OK ] VulkanAPITest.sigmoid (81 ms)
[ RUN      ] VulkanAPITest.sigmoid_
[       OK ] VulkanAPITest.sigmoid_ (68 ms)
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (28 ms)
[ RUN      ] VulkanAPITest.log_softmax
Max Diff allowed: 5.87862e-05
../aten/src/ATen/test/vulkan_api_test.cpp:1470: Failure
Value of: check
  Actual: false
Expected: true
[  FAILED  ] VulkanAPITest.log_softmax (19 ms)
[ RUN      ] VulkanAPITest.tanh
[       OK ] VulkanAPITest.tanh (63 ms)
[ RUN      ] VulkanAPITest.tanh_
[       OK ] VulkanAPITest.tanh_ (68 ms)
[ RUN      ] VulkanAPITest.sub
[       OK ] VulkanAPITest.sub (28 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0
[       OK ] VulkanAPITest.sub_broadcast0 (9 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1
[       OK ] VulkanAPITest.sub_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.sub_broadcast2
[       OK ] VulkanAPITest.sub_broadcast2 (8 ms)
[ RUN      ] VulkanAPITest.sub_
[       OK ] VulkanAPITest.sub_ (43 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0_
[       OK ] VulkanAPITest.sub_broadcast0_ (10 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1_
[       OK ] VulkanAPITest.sub_broadcast1_ (2 ms)
[ RUN      ] VulkanAPITest.upsample_nearest2d
[       OK ] VulkanAPITest.upsample_nearest2d (5 ms)
[ RUN      ] VulkanAPITest.mobilenetv2
[       OK ] VulkanAPITest.mobilenetv2 (82 ms)
[----------] 69 tests from VulkanAPITest (3885 ms total)

[----------] Global test environment tear-down
[==========] 69 tests from 1 test suite ran. (3885 ms total)
[  PASSED  ] 68 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] VulkanAPITest.log_softmax

 1 FAILED TEST
```

Differential Revision: D30925995

fbshipit-source-id: 1b1b7f7f22090064424a5379d2f0559d0da7846a
2021-09-14 19:35:05 -07:00
ff6b475d4a [fix] don't expose unique_dim in torch (#63080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62793

This is mostly a quick fix. I think the more correct fix could be updating `unique_dim` to `_unique_dim`, which could be BC-breaking for C++ users (maybe). Maybe there is something else I am missing.

~~Not sure how to add a test for it.~~ Have tested it locally.

We can add a test like the following. Tested this locally: it currently fails but passes with the fix.
```python
        def test_wildcard_import(self):
            exec('from torch import *')

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63080

Reviewed By: gchanan

Differential Revision: D30738711

Pulled By: zou3519

fbshipit-source-id: b86d0190e45ba0b49fd2cffdcfd2e3a75cc2a35e
2021-09-14 18:19:17 -07:00
36cac2be4d [CUDA graphs] moves memory sharing intro paragraph (#64996)
Summary:
Puts memory sharing intro under Sharing memory... header, where it should have been all along.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64996

Reviewed By: mruberry

Differential Revision: D30948619

Pulled By: ngimel

fbshipit-source-id: 5d9dd267b34e9d3fc499d4738377b58a22da1dc2
2021-09-14 17:53:43 -07:00
36a0d97281 Revert D30558877: Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo
Test Plan: revert-hammer

Differential Revision:
D30558877 (382e008fbf)

Original commit changeset: 3e62ff24a935

fbshipit-source-id: 3b9f03c1f43c6d5f2738ed139d0236f2ded78dbf
2021-09-14 17:33:38 -07:00
3d312b3b8e [Model Averaging] Simplify PostLocalSGD Optimizer API (#64885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64885

1) The constructor accepts a local optimizer instance instead of the local optimizer's constructor inputs plus the class type.
2) The parameters are read from local optimizer's `param_groups` instead of a separate input.
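
A hedged sketch of the simplified construction (parameter names follow torch.distributed.optim at the time; actually running this requires an initialized process group):

```python
import torch
from torch.distributed.optim import PostLocalSGDOptimizer
from torch.distributed.algorithms.model_averaging.averagers import PeriodicModelAverager

model = torch.nn.Linear(4, 2)
local_opt = torch.optim.SGD(model.parameters(), lr=0.1)  # a ready-made optimizer instance
opt = PostLocalSGDOptimizer(
    optim=local_opt,  # instead of an optimizer class plus constructor inputs
    averager=PeriodicModelAverager(period=4, warmup_steps=100),
)
```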

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 137865867

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D30888794

fbshipit-source-id: 21261b480f6bbb9b2333426020e3f350da3f73c2
2021-09-14 16:37:14 -07:00
382e008fbf Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo (#63978)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63978

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30558877

Pulled By: heitorschueroff

fbshipit-source-id: 3e62ff24a935784fc93a76a0f46a1deb060ba680
2021-09-14 16:18:09 -07:00
c65128679b [DataPipe] Improve Mapper to accept input/output index when apply fn (#64951)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64951

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30910035

Pulled By: ejguan

fbshipit-source-id: d687fe10939920a3617a60552fe743e8526438a0
2021-09-14 15:46:42 -07:00
670853295a [quant][tensorrt] Add tensorrt backend config (#64623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64623

The config api will change, but we'll add configs gradually for TensorRT to unblock experimentation

Test Plan:
python torch/fx/experimental/fx2trt/example/unittests.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30800474

fbshipit-source-id: 3c4640de1205a0f19b62943ab84f386d80394ec2
2021-09-14 15:27:33 -07:00
85222c050f [PyTorch] Add c10::hash<c10::ArrayRef<T>> (#64277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64277

Just moved the vector implementation to ArrayRef and re-implemented the former using the latter.
ghstack-source-id: 137978947

Test Plan: existing CI

Reviewed By: dhruvbird

Differential Revision: D30647666

fbshipit-source-id: c0f4f06c348d36882ec0db802be44d8c7749562f
2021-09-14 14:22:12 -07:00
5d4efed83e [PyTorch] Add OpCode cache in ByteCodeDeserializer (#64110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64110

As the code comment says, we can exploit pickler string interning to accelerate OpCode parsing. No more strcmp!
ghstack-source-id: 137978946

Test Plan:
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/591414145082422
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/484557404703261

new mean is 292 ms, down from 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30615052

fbshipit-source-id: 9707625e778388a7920ab72704d71ad57ddaac17
2021-09-14 14:22:10 -07:00
a9121df09c [PyTorch] Remove implicit conversion from Tuple to vector reference (#63993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63993

This seems to be unused, and it's pretty scary.
ghstack-source-id: 137978949

Test Plan: CI

Reviewed By: lw

Differential Revision: D30560441

fbshipit-source-id: 08b7ce971fd1e2dbeddbf37b02413fef513b4753
2021-09-14 14:22:08 -07:00
452402b984 [PyTorch] Fix SourceRangeDeserializer vector copy (#64031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64031

More copies of tuple elements.
ghstack-source-id: 137978948

Test Plan:
Pixel 3 before: https://our.intern.facebook.com/intern/aibench/details/724509739115867
Pixel 3 after: https://our.intern.facebook.com/intern/aibench/details/232361457767293

Top-line number doesn't seem to have moved, but we can see that the vector copy disappeared in the flame graph.

Reviewed By: raziel

Differential Revision: D30559545

fbshipit-source-id: e5343abae96b8e80e0ccec482ad316884ae231ea
2021-09-14 14:20:45 -07:00
57eda69219 [fx2trt] fix elementwise op converter with one operand being a literal and has different type (#65004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65004

If we have some code like `torch.add(x, 1)` where x is a float tensor, then conversion would fall apart, because we currently add a constant layer of int32 dtype for `1` while we actually need float dtype.

This diff adds an arg to `get_trt_tensor` which specifies the dtype of the constant layer we create.

Also, this starts adding docstrings to functions.
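
The eager-mode behavior the converter needs to match (illustrative):

```python
import torch

x = torch.randn(3)   # float32 tensor
y = torch.add(x, 1)  # type promotion treats the literal 1 as float32; the TRT
                     # converter must likewise emit a float32 constant layer
print(y.dtype)       # torch.float32
```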

Reviewed By: yinghai

Differential Revision: D30852156

fbshipit-source-id: 650ce72d2794093a4616e640ea503dcc1c6b2bc4
2021-09-14 12:27:37 -07:00
3727baea6f [PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [2/2] (#64269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64269

Revert changes in D29826210 (693d8f2f07) (we don't need operator lambda caching since there aren't duplicate operators anymore)

This diff stack results in an additional approx 12% speedup in model loading time (from 229ms to 200ms) when run against an 87MB speech model that jiatongzhou provided.
ghstack-source-id: 138014904

Test Plan:
**Speech Transducer v25 model (as in D29826210 (693d8f2f07))**

|| Before | After |
|Load Time|[229ms](https://www.internalfb.com/intern/aibench/details/160889436133243)|[200ms](https://www.internalfb.com/intern/aibench/details/837884532607514)|
|Save File Size|[86.23 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658544950)|[86.1 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658554403)|

The "after" flamegraph shows significantly less time is spent on ```append_operator``` than before.

Steps
- Check out desired commit in devserver (base branch or this diff)
- ```buck build bento/kernels:bento_kernel_pytorch```
- Use N1094068 with pytorch_local kernel to save model for lite interpreter
- Edit ```aibench/specifications/models/pytorch/speech_transducer/v25.json ``` to have new model location and md5
- ```buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote ```

**Test that saving a model with de-dup ops doesn't change its output**
https://www.internalfb.com/intern/anp/view/?id=1137434

Reviewed By: iseeyuan

Differential Revision: D30615710

fbshipit-source-id: bb4052f0f16eccab386585e94411056f94bce43c
2021-09-14 12:12:46 -07:00
86e6bed0d4 [PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [1/2] (#64268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64268

If the same pair of operator name and num inputs has previously been used to add an instruction to the operator table (and the operator's schema is not vararg), reuse that instruction's index rather than creating a new one.
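
A minimal sketch of the de-dup bookkeeping described above (an illustration, not the serializer's actual code):

```python
op_table = []  # serialized operator table
op_index = {}  # (name, num_inputs) -> table index, for non-vararg ops

def add_operator(name, num_inputs, is_vararg):
    key = (name, num_inputs)
    if not is_vararg and key in op_index:
        return op_index[key]  # reuse the existing entry
    op_table.append(key)
    idx = len(op_table) - 1
    if not is_vararg:
        op_index[key] = idx
    return idx

i = add_operator("aten::add", 2, False)
j = add_operator("aten::add", 2, False)
assert i == j and len(op_table) == 1
```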
ghstack-source-id: 138014905

Test Plan: Phabricator tests, and test performance changes in next diff

Reviewed By: iseeyuan, tugsbayasgalan

Differential Revision: D30615434

fbshipit-source-id: f442f557f12412693a73004ce44733ccef063b82
2021-09-14 12:11:32 -07:00
97df69eac6 .github: Add render test results step (#64937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64937

Adds CLI output for rendered test results to go alongside test execution; users should be able to quickly diagnose test failures like so:
![fdsfdsfdsfdsf](https://user-images.githubusercontent.com/1700823/133156245-ba939cbf-8aa2-47a7-b1fb-7cc876ca75c4.png)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30917897

Pulled By: seemethere

fbshipit-source-id: f51ea499462e3cfd64496cb711b84a93971c91bd
2021-09-14 11:25:14 -07:00
d188204323 remove SkipInfo class (#64972)
Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64972

Reviewed By: mruberry

Differential Revision: D30924598

Pulled By: ngimel

fbshipit-source-id: 1ac1ec8fd50ca27e3cd36c12a588d334e7466899
2021-09-14 11:23:54 -07:00
eedc234e33 [PyTorch] Don't store multiple kernels per key on mobile (#64447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64447

As the code comment says, we needn't worry about Jupyter notebooks on mobile.
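
A rough sketch of the trade-off (hypothetical data structures, not the dispatcher's actual code): a per-key kernel list only pays off when registrations can be interactively shadowed and restored:

```python
kernels = {}  # dispatch key -> list of kernels, newest first

def register(key, kernel):
    kernels.setdefault(key, []).insert(0, kernel)  # shadow any older kernel

def deregister(key):
    kernels[key].pop(0)  # e.g. re-running a notebook cell restores the old one

# Without interactive re-registration (the mobile case), a plain
# key -> kernel mapping suffices and avoids the per-key list overhead.
```
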
ghstack-source-id: 137951718

Test Plan: Profiled startup of //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark on devserver with -niter 0 -nrep 0 and `C10_DISPATCHER_ONE_KERNEL_PER_DISPATCH_KEY` defined. Time spent in sherwood_v3_table lookups went way down.

Reviewed By: ezyang, bhosmer

Differential Revision: D30736094

fbshipit-source-id: bcc22cd0d9adceba259a03898c992759d501fe89
2021-09-14 10:36:43 -07:00
446d95a7f6 [fx const fold] fix some cases with deep model hierarchy (#64945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64945

In the const folding pass, we try to create `get_attr` nodes in submod_1 for `get_attr` nodes that are in the main graph. But we don't have the real attributes in submod_1. To fix this, we assign the main module as the owning module of submod_1's graph.

The fix above would cause a problem for `call_module` nodes in submod_1, because during split, modules get inlined into submod_1 (target changed from "mod.a.b" -> "mod_a_b"). Changing the owning module would make those `call_module` nodes unable to find the module they refer to. To fix this, we set the target module to the main module.

Reviewed By: jfix71

Differential Revision: D30905949

fbshipit-source-id: cd67bc8fe4b8ad4344ae97b8e36753fdce3ece6d
2021-09-14 09:45:44 -07:00
00e6e0c593 [Model Averaging] Revert #63895 (#64903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64903

Fix the accuracy regression caused by https://github.com/pytorch/pytorch/pull/63895.

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D30894688

fbshipit-source-id: fe00b8b23b860d9f806f87c1b6caba1d0b807485
2021-09-14 09:45:42 -07:00
882b67dff4 Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892)
Summary:
The library will no longer link properly on VS 2019 (14.29.30133). To
ensure that engineers building on Windows can use and debug with this
build type, incremental linking needs to be turned off for this build
flag.

Verified that this build type successfully builds, links, and provides
debuggable Python modules on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892

Reviewed By: jbschlosser

Differential Revision: D30902565

Pulled By: malfet

fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b
2021-09-14 09:44:18 -07:00
01cfea9485 Disable target determination for now (#64921)
Summary:
There were several reports of the target determinator incorrectly skipping
tests; the most recent one is https://github.com/pytorch/pytorch/issues/64902

Let's disable it until it can be further stabilized

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64921

Reviewed By: seemethere, janeyx99

Differential Revision: D30901186

Pulled By: malfet

fbshipit-source-id: 531afd2d390c6b51f727330d5dd1882d70b6fdde
2021-09-14 09:40:13 -07:00
4e225da363 print_test_stats.py: dedup test report upload name with TEST_CONFIG (#64948)
Summary:
Connected with issue https://github.com/pytorch/pytorch/issues/64845, takeover of https://github.com/pytorch/pytorch/issues/64091

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64948

Reviewed By: malfet, seemethere

Differential Revision: D30908592

Pulled By: janeyx99

fbshipit-source-id: dc31b0bbc9f4e35d23412aa14acbbab7422b4146
2021-09-14 09:01:06 -07:00
e884554008 Make {select,slice,diagonal}_backward primitives wrt autograd (#64933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64933

Fixes https://github.com/facebookresearch/functorch/issues/108

This is a short-term fix. A longer-term fix would be to either:
1. have proper {select,slice,diagonal}_embed functions
2. have efficient {select,slice,diagonal}_scatter functions (and
efficient zero tensors).

NB: I didn't use diag_embed because diag_embed is slightly different
from diagonal_backward.

There are no BC concerns because TorchScript (luckily) does not
serialize the backwards graph.
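
For reference, a hedged Python restatement of what `select_backward` computes (the derivative rule itself, not the PR's C++ code):

```python
import torch

def select_backward_reference(grad, input_sizes, dim, index):
    # Gradient of x.select(dim, index): zeros shaped like x, with grad
    # scattered back into the selected slice.
    out = grad.new_zeros(input_sizes)
    out.select(dim, index).copy_(grad)
    return out
```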

Test Plan:
- run tests
- run benchmarks.
https://gist.github.com/zou3519/e7c0774d1ac97f32aa02ec44d81e60e1.
Surprisingly the instruction count goes down. This is probably because
we create fewer autograd nodes now.

Reviewed By: ezyang

Differential Revision: D30909333

Pulled By: zou3519

fbshipit-source-id: 3b33e13010ba13b4d487b346aa9bee8a0e8c378c
2021-09-14 08:10:59 -07:00
2853c7da22 Replace composite dispatch with CompositeExplicitAutograd (#64641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64641

`sum`, `mean`, and `norm` were ported to structured kernels in #61642, #61643, and #62711,
respectively. Those PRs changed related overloads into composite kernels. However, their
dispatch section remained the same, when they really should be marked as
`CompositeExplicitAutograd`. This PR fixes this issue.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867122

Pulled By: ezyang

fbshipit-source-id: b951aee41a3cab9ca546df826a285d60013e3b3a
2021-09-14 07:56:54 -07:00
09d221e8d4 Revert D30711934: [pytorch][PR] Use RDS for build size tracking
Test Plan: revert-hammer

Differential Revision:
D30711934 (1cd0252eed)

Original commit changeset: 0af808ddf528

fbshipit-source-id: 6f67ed5cbaf333cc55729be2a23e385772e31b10
2021-09-14 06:10:03 -07:00
f23f21dafe [TensorExpr] Remove 'Placeholder' class. (#64887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887

BufHandle has exactly the same functionality and should be used instead.

Differential Revision: D30889483

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
2021-09-14 00:22:44 -07:00
199031c48e [TensorExpr] PyBinds: improve QoL of pybind users. (#64886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64886

Bind methods for implicit conversions and constructors to avoid
boilerplate code.

Differential Revision: D30889193

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Pulled By: ZolotukhinM

fbshipit-source-id: 137c0c98f7f1576e1bb97c8de8a900b28407a30e
2021-09-14 00:21:28 -07:00
caaa6efc1a Fix use of deprecated tensor.type() in SegmentReduce.cpp (#64151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64151

Reviewed By: mruberry

Differential Revision: D30917268

Pulled By: ngimel

fbshipit-source-id: 63427372b651ac495d48ef552eba5fbf0e4378e9
2021-09-13 23:16:47 -07:00
d4b4d83521 [quant] handle empty input in fused_moving_avg_obs_fake_quant op (#64829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64829

If an empty input is passed in, the aminmax operator fails with a runtime error like
```
RuntimeError: aminmax(): cannot compute aminmax over an empty dimension as the operation has no identity.
```

To avoid this during training we just return the input if we find it to be empty
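
A minimal sketch of the guard (a hypothetical wrapper; the real change is inside the fused observer op):

```python
import torch

def observe(x):
    if x.numel() == 0:
        return x  # nothing to observe; avoids the aminmax error above
    lo, hi = torch.aminmax(x)
    # ... update the moving-average min/max with (lo, hi) ...
    return x
```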

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant

Imported from OSS

Reviewed By: jingsh

Differential Revision: D30870879

fbshipit-source-id: 0cb4b187449a45a37150a77510d2292f93a7d1cd
2021-09-13 22:22:31 -07:00
0aef44cb3d Add forward AD for torch.linalg.eigh (#62163)
Summary:
This PR adds forward mode differentiation for `torch.linalg.eigh` and a few other functions required for tests to pass.

For some reason running tests for `torch.linalg.eigvalsh` and complex `torch.linalg.eigh` hangs. These tests are skipped for now.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry heitorschueroff walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62163

Reviewed By: jbschlosser

Differential Revision: D30903988

Pulled By: albanD

fbshipit-source-id: d6a74adb9e6d2f4be8ac707848ecabf06d629823
2021-09-13 21:15:38 -07:00
35c82dbf5c [THC] remove TensorTypeUtils and TensorInfo (#64965)
Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64965

Reviewed By: mruberry

Differential Revision: D30916754

Pulled By: ngimel

fbshipit-source-id: b24020d6a7ce8a05a5ab6c579d176dd94dd3b1d7
2021-09-13 20:36:28 -07:00
816048e7e6 EmbeddingBag sort thrust->cub (#64498)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/57505

Also fixes a warning I found when compiling:
```
/home/gaoxiang/pytorch-cub/torch/csrc/distributed/c10d/quantization/quantization_gpu.cu(7): warning: inline qualifier ignored for "__global__" function
```
I also updated the bfloat16 guard to CUDA 11.5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64498

Reviewed By: mruberry

Differential Revision: D30917077

Pulled By: ngimel

fbshipit-source-id: fb9df08fd469038478a563014b5af7452b4b28c0
2021-09-13 19:51:12 -07:00
ed30afd480 Speed up torch.unique_consecutive() (#64835)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62690

Following the approach of `unique_consecutive_cpu_template`, this PR reimplements `_unique_dim_cpu_impl` to get better performance.
Also, because the overhead of `unique_dim_consecutive_cpu` is quite large, this PR calls `unique_consecutive_cpu_template` directly when the given input is a 1-D array.

## Benchmark
### Script
```python
import torch
import time

torch.manual_seed(0)
t = torch.randint(500, (10000000, ))
t = torch.sort(t)[0]

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=0) time:", end - start)

start = time.time()
uniques2, inverse2, counts2 = torch.unique_consecutive(t, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive() time:", end - start)

t = torch.randint(500, (10000000, 2))
t = torch.sort(t)[0]

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=0) time:", end - start)

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=1, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=1) time:", end - start)
```

### Before
```
torch.unique_consecutive(dim=0) time: 78.64345622062683
torch.unique_consecutive() time: 0.029544353485107422
torch.unique_consecutive(dim=0) time: 91.49796152114868
torch.unique_consecutive(dim=1) time: 0.30872368812561035
```

### After
```
torch.unique_consecutive(dim=0) time: 0.08256125450134277
torch.unique_consecutive() time: 0.08162403106689453
torch.unique_consecutive(dim=0) time: 35.58408498764038
torch.unique_consecutive(dim=1) time: 1.6258199214935303
```

## System Information
```
Collecting environment information...
PyTorch version: 1.10.0a0+git7f1932e
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.10.0a0+gitbe09195
[conda] Could not collect
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64835

Reviewed By: jbschlosser

Differential Revision: D30894906

Pulled By: ngimel

fbshipit-source-id: 42ab76d638391ce6c4e589d9c71bdf7579310ad9
2021-09-13 19:00:36 -07:00
ab5e1c69a7 [WIP] Example of DataPipes and DataFrames integration (#60840)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60840

Test Plan: Imported from OSS

Reviewed By: wenleix, ejguan

Differential Revision: D29461080

Pulled By: VitalyFedyunin

fbshipit-source-id: 4909394dcd39e97ee49b699fda542b311b7e0d82
2021-09-13 18:50:15 -07:00
ee554e2e96 Re-land Fix test report uploading (#64958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64958

This is a re-do of #64846, which was missing a path prefix for Windows test reports

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D30915253

Pulled By: driazati

fbshipit-source-id: d14d0a64d2f8aabc335db9c4d0d2b63512887c66
2021-09-13 18:36:26 -07:00
f159f12fee [iOS][OSS][BE] Add Simulator tests for full JIT (#64851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64851

ghstack-source-id: 137970229

Test Plan: CircleCI

Reviewed By: hanton, cccclai

Differential Revision: D30877963

fbshipit-source-id: 7bb8ade1959b85c3902ba9dc0660cdac8f558d64
2021-09-13 18:16:08 -07:00
fd09e564d6 add acc_ops.max, acc_ops.maximum, consolidate acc_ops.min and acc_ops.minimum
Summary:
This diff adds `acc_ops.max` and `acc_ops.maximum` support.
It further consolidates the logic for `acc_ops.min` and `acc_ops.minimum` to match the logic for max.

torch.max has three behaviors:
```
1. max(input)
2. max(input, dim, keepdim=False, *, out=None)
3. max(input, other, *, out=None)
```

Likewise, `torch.min` has three identical behaviors.

I've chosen to implement each as an acc_op, then map to the appropriate one.

The third `max` behavior is effectively `torch.maximum`, so I've implemented it as that.
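
The three behaviors side by side (standard `torch.max` usage, shown for context):

```python
import torch

t = torch.tensor([[1., 5.], [3., 2.]])
torch.max(t)                           # 1) full reduction -> tensor(5.)
values, indices = torch.max(t, dim=1)  # 2) per-dim reduction with indices
torch.max(t, torch.full_like(t, 2.5))  # 3) elementwise, same as torch.maximum
```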

Reviewed By: yinghai, jfix71, 842974287

Differential Revision: D30551464

fbshipit-source-id: 0a2eec10e5185cbf7d9984eec3fd399b23528b2a
2021-09-13 18:04:33 -07:00
3855c24639 Add BFloat16 support for cross, tril, triu, tril_indices, triu_indices and cumsum operators on CPU (#62454)
Summary:
Add BFloat16 support for cross, tril, triu, tril_indices, triu_indices and cumsum operators on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62454

Reviewed By: albanD

Differential Revision: D30845805

Pulled By: heitorschueroff

fbshipit-source-id: f83836862e38109ec929e83567133e9e88096b8b
2021-09-13 17:59:43 -07:00
1cd0252eed Use RDS for build size tracking (#64303)
Summary:
This adds 2 utilities: `register_rds_table` and `rds_write`. `register_rds_table` needs to be called once with the schema for the data that `rds_write` will write. These go to a lambda called `rds-proxy`, which will write to/read from the DB as necessary. This data can then be arbitrarily queried via `rds-proxy` (for use in CI) or on metrics.pytorch.org (for analysis).

It also hooks these up for build size tracking (which previously was not working on GHA)

TODO:
* verify output in logs + clean up prints

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64303

Reviewed By: malfet, seemethere

Differential Revision: D30711934

Pulled By: driazati

fbshipit-source-id: 0af808ddf528a24875a378caeb1aa9cb0693f802
2021-09-13 17:48:44 -07:00
c4073af61d Add skipIfTBB decorator (#64942)
Summary:
And replace two existing usages in the codebase with it

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64942

Reviewed By: jbschlosser

Differential Revision: D30906382

Pulled By: malfet

fbshipit-source-id: e7f20f53aff734b0379eded361255543dab4fa4b
2021-09-13 17:11:51 -07:00
8131bc85d0 Raise TypeError on assigned grad with wrong type (#64876)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64813

Raises a TypeError when assigned value to a grad is not a Tensor or
None.

Adds tests.
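
A sketch of the behavior after this change (illustrative, not the PR's test code):

```python
import torch

x = torch.ones(3, requires_grad=True)
x.grad = torch.zeros(3)   # OK: a Tensor
x.grad = None             # OK: clears the gradient
x.grad = [0.0, 0.0, 0.0]  # now raises TypeError instead of being accepted
```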

cc ezyang gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64876

Reviewed By: anjali411

Differential Revision: D30901678

Pulled By: soulitzer

fbshipit-source-id: dbb3cb5fd0bbac6918e0b2e2f51d340daa43dee0
2021-09-13 16:41:45 -07:00
1e25a84993 kill SkipInfo (#64878)
Summary:
Per offline discussion, replaces SkipInfo with DecorateInfo. SkipInfo class itself is not removed yet to give functorch time to replace its SkipInfos.
cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64878

Reviewed By: mruberry

Differential Revision: D30908052

Pulled By: ngimel

fbshipit-source-id: 5124180b25c6e32517722883b9f3a2b488e3fe20
2021-09-13 16:32:36 -07:00
3710edc86b Fix TRTOperatorSupport (#64873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64873

Fix TRTOperatorSupport's key naming to match the keys generated by torch.fx.passes.tools_common.get_node_target. get_node_target is used by splitter_base to check, by name, whether an operator is supported.

Test Plan:
Print out the supported operator dict and check the names.
Run TRTSplitter with lrm_split_model_generator and verify the split result is correct, with all supported operators printed.
Current split result:
```
Supported node types in the model:
acc_ops.size: ((), {'input': torch.float32})
acc_ops.getitem: ((), {'input': torch.float32})
acc_ops.getitem: ((), {'input': None})
acc_ops.reshape: ((), {'input': torch.float32})
acc_ops.unsqueeze: ((), {'input': torch.float32})
acc_ops.linear: ((), {'input': torch.float32, 'weight': torch.float32})
acc_ops.linear: ((), {'input': torch.float32, 'weight': torch.float32, 'bias': torch.float32})
acc_ops.mul: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.cat: ((), {})
acc_ops.add: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.add: ((), {'input': torch.float32})
acc_ops.tanh: ((), {'input': torch.float32})
acc_ops.transpose: ((), {'input': torch.float32})
acc_ops.matmul: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.div: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.squeeze: ((), {'input': torch.float32})
acc_ops.noop: ((), {'input': torch.float32})
acc_ops.layer_norm: ((), {'input': torch.float32, 'weight': torch.float32, 'bias': torch.float32})
acc_ops.permute: ((), {'input': torch.float32})
acc_ops.sigmoid: ((), {'input': torch.float32})
acc_ops.flatten: ((), {'input': torch.float32})
acc_ops.softmax: ((), {'input': torch.float32})
acc_ops.sum: ((), {'input': torch.float32})

Unsupported node types in the model:
torch.ops.fb.pad_sequence_embeddings: ((), {'embeddings': torch.float32, 'offsets': torch.int32})
acc_ops.linalg_norm: ((), {'input': torch
```

Reviewed By: yinghai

Differential Revision: D30884463

fbshipit-source-id: 22442aa6a69cd148ce9bc8be8f62157dd6d19954
2021-09-13 15:55:15 -07:00
914e3a861a Revert D30878101: [pytorch][PR] Fix test report uploading
Test Plan: revert-hammer

Differential Revision:
D30878101 (fba40bfc1a)

Original commit changeset: 0730f17fa3f4

fbshipit-source-id: dad89e68b4daf656dd0b592bc9b2758f00af38c6
2021-09-13 15:24:44 -07:00
6101cbcedb torch.ao migration: fake_quantize.py, phase 1 (#64814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64814

1. move the file
```
hg mv caffe2/torch/quantization/fake_quantize.py caffe2/torch/ao/quantization/
```

2. create a new file in the old location and copy the imports
3. fix all callsites inside `torch`

Test Plan:
```
buck test mode/dev //caffe2/test:quantization
```

Reviewed By: z-a-f

Differential Revision: D30866792

fbshipit-source-id: 7a221cb46c0ab01f1c5de9be061f09ecc83ce23e
2021-09-13 15:22:28 -07:00
e4314dac57 [PyTorch] Reduce heap allocations in OperatorName::setNamespaceIfNotSet (#64673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64673

We are now guaranteed to allocate at most one time in this function.
ghstack-source-id: 137786392

Test Plan: Previous diff adds test coverage for this function.

Reviewed By: dhruvbird

Differential Revision: D30813014

fbshipit-source-id: 17d844a1cc8c30574afcc6b0b41b219e62c0b723
2021-09-13 14:33:55 -07:00
000f3310d7 [PyTorch] Add test for operator_name (#64672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64672

Just a small struct missing test coverage. Next diff changes it.
ghstack-source-id: 137786388

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30813013

fbshipit-source-id: 05f39494bb9512a71a928bfe6fcfa710016bdf61
2021-09-13 14:32:50 -07:00
c99277e177 handle the case in acc_ops.sum when dim == 0, differentiating it from the case when dim is None (#64869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64869

handle the case in acc_ops.sum when dim == 0, differentiating it from the case when dim is None
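
Why the distinction matters (plain PyTorch semantics; the falsy-zero reading of the bug is my assumption):

```python
import torch

t = torch.ones(2, 3)
torch.sum(t)         # no dim: reduce everything -> tensor(6.)
torch.sum(t, dim=0)  # dim=0: reduce the first dim -> tensor([2., 2., 2.])
# A converter guard like `if dim:` would treat dim=0 the same as dim=None,
# since 0 is falsy in Python.
```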

Reviewed By: 842974287

Differential Revision: D30872739

fbshipit-source-id: 2755d3230804a16ef1c9289f804138c6dd7766b3
2021-09-13 14:24:16 -07:00
0561e104d9 fix build error when system cmake3 version >=3.5 but <=3.10 (#64914)
Summary:
For a PyTorch source build using conda, an error is raised at 8535418a06/CMakeLists.txt (L1) when CMake < 3.10 is found; it can be fixed by upgrading CMake in the conda env. But CentOS ships cmake3, and PyTorch first checks only whether cmake3's version is >= 3.5, so if the system's cmake3 is, say, 3.6, PyTorch will use it, which leads to a build error like:
```
CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 3.10 or higher is required.  You are running version 3.6.3

-- Configuring incomplete, errors occurred!
```

We need to check that cmake3 is also >= 3.10; if not, fall back to checking conda's CMake version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64914

Reviewed By: jbschlosser

Differential Revision: D30901673

Pulled By: ezyang

fbshipit-source-id: 064e2c5bc0b9331d6ecd65cd700e5a42c3403790
2021-09-13 13:26:06 -07:00
fba40bfc1a Fix test report uploading (#64846)
Summary:
Previously we just weren't uploading Windows test report XML files to S3, only to GitHub Actions. This was different from Linux, where we use both (though maybe we can kill the GHA upload in a follow-up PR since I don't think it's very useful anymore). This factors it all out into a macro so they both do the same thing. It also fixes the naming of uploaded files to include info about the job name (the full config, so files can be matched to the job visually or by the included job id).

See https://hud.pytorch.org/pr/64846 for results

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64846

Reviewed By: seemethere

Differential Revision: D30878101

Pulled By: driazati

fbshipit-source-id: 0730f17fa3f46a32c131f52669084c3103b0e616
2021-09-13 13:22:54 -07:00
af984c78a9 Pin SciPy to 1.6.3 on Mac (take 2) (#64922)
Summary:
It's already pinned via the docker install on Linux

`scipy.stats.`[`poisson`|`geom`|`binom`] returns quite different results between 1.6.x and 1.7+ versions of SciPy, which makes several distribution tests fail their accuracy thresholds

Reland of https://github.com/pytorch/pytorch/pull/64844 but limited to just Mac platform
A follow-up PR for Windows is coming as well

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64922

Reviewed By: janeyx99

Differential Revision: D30901257

Pulled By: malfet

fbshipit-source-id: 0543e7bae9d3bbeb8b6be7b3ecf605880f97665f
2021-09-13 12:48:11 -07:00
1bea49c716 [Deploy] Avoid use-after-free during autograd shutdown (#64620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64620

`autograd` extension module's shutdown logic destructs `PyThreadState` by `pybind11::gil_scoped_acquire` using the RAII pattern.

The problem is that torch.deploy also destructs `PyThreadState` as part of its shutdown process (https://www.internalfb.com/phabricator/paste/view/P456363738), causing double destruction, use-after-free.

This change adds `defined(USE_DEPLOY)` as a special case to avoid destruction of `PyThreadState` to the existing special treatment for  `IS_PYTHON_3_9_PLUS`.

Test Plan: Added `TorchpyTest.Autograd` unittest to ensure that torch.deploy can create multiple instances that use autograd without causing a crash.

Reviewed By: albanD

Differential Revision: D30779080

fbshipit-source-id: 4de3283cc2d394acc9b8141c17cacbfab5eea052
2021-09-13 12:43:10 -07:00
fd716fcda2 [Pytorch Edge] Quantized Ops Dtype Selective (#63680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63680

Quantized ops were not covered by dtype selectivity. Add the check, and adjust call sites to be constexpr-friendly.

Test Plan: CI (this covers all model unit tests), verified that segmentation (a model that uses some of these quant ops) still works on instagram.

Reviewed By: dhruvbird, raymondethan

Differential Revision: D30457626

fbshipit-source-id: 5ba850d2b53a18558dfbb1cfaa78d8f53b5dbad8
2021-09-13 11:04:07 -07:00
4ca40aeb83 Disable more of the pragma warning stuff (#64899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64899

ghstack-source-id: 137882055

Test Plan: sandcastle, ossci

Reviewed By: malfet, ngimel

Differential Revision: D30893691

fbshipit-source-id: 67ec8cc9f212aa16a201771603236e429944b561
2021-09-13 10:58:31 -07:00
8cfc74400a [PyTorch] Gate tls_local_dispatch_key_set off on iOS too (#64753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64753

This may be causing problems on iOS. (Maybe we should just revert inlining access to this thing? I really don't understand what's wrong with it, though.)
ghstack-source-id: 137830520

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D30826897

fbshipit-source-id: 0438dee9d49e7601c26cdca0e8540229c777eddb
2021-09-13 10:54:28 -07:00
d4b031b31e typo fix (#64615)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64615

Reviewed By: jbschlosser

Differential Revision: D30884298

Pulled By: ngimel

fbshipit-source-id: 230f9d06aa85abcdd69828a1ea0a83f36cbfcb17
2021-09-13 10:50:01 -07:00
01e92f2a56 [nn] no batch dim support: CosineEmbeddingLoss (#64590)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

TODO
* [x] Add tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64590

Reviewed By: H-Huang

Differential Revision: D30900775

Pulled By: jbschlosser

fbshipit-source-id: d24e72787017e79afbf8f04a94901a290485b81a
2021-09-13 10:45:33 -07:00
2ae938e15e Fixes failure in test_dataloader.py that occurs on jetson boards (#64757)
Summary:
CUDA IPC is not supported for jetsons

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64757

Reviewed By: jbschlosser

Differential Revision: D30900593

Pulled By: ejguan

fbshipit-source-id: c6b2e8a9746276fdb4a009b6412e47cc8aac69f2
2021-09-13 10:11:04 -07:00
8e63199c7c .github: Always run chown workspace (#64854)
Summary:
In some workflow runs, like https://github.com/pytorch/pytorch/runs/3568714658, the chown workspace step is duplicated.

Is that intentional? Unfortunately it is pretty necessary since (w/ docker) the folder can sometimes be in a broken permission state before and after we run jobs.

So this PR makes the second chown workspace run always because that's the true intention of the step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64854

Reviewed By: jbschlosser, seemethere

Differential Revision: D30879289

Pulled By: janeyx99

fbshipit-source-id: 4157ff826c86e8c912deb1ba0cb5c47ea7596529
2021-09-13 10:06:31 -07:00
70e64feda7 Reland .circleci: Skip cuda /cudnn install if existing (#64880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64880

This reverts commit 5836a116d0de214d6d759e70671f23150a5deaba.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30885675

Pulled By: seemethere

fbshipit-source-id: 8c96584d5a632170e29f91c5daf0206680a78661
2021-09-13 09:52:16 -07:00
3d976d9ceb torch.ao migration: quantize_jit.py phase1 (#64860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64860

ghstack-source-id: 137885395

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: jerryzh168

Differential Revision: D30880574

fbshipit-source-id: 9629027dd3b00bb8d45633e1564fc03a866f8c31
2021-09-13 08:41:48 -07:00
9d52651d4e torch.ao migration: stubs.py phase 1 (#64861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64861

1. move the file
   ```
   hg mv caffe2/torch/quantization/stubs.py caffe2/torch/ao/quantization/
   ```
2. create a new file in the old location and copy the imports
3. fix all call sites inside `torch`
ghstack-source-id: 137885365

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: jerryzh168

Differential Revision: D30879678

fbshipit-source-id: a2d24f25d01064212aca15e94e8c78240ba48953
2021-09-13 08:40:29 -07:00
c08b2491cc add BFloat16 operators on CPU: cummax, cummin (#63307)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63307

Reviewed By: nikithamalgifb

Differential Revision: D30342002

Pulled By: anjali411

fbshipit-source-id: eee6e640da996ef0e983960119608d9c12405336
2021-09-13 08:00:17 -07:00
d932ddd24b fix quantization.rst doc (#64802)
Summary:
As titled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64802

Reviewed By: jbschlosser

Differential Revision: D30887210

Pulled By: vkuzo

fbshipit-source-id: 0267883d3065d724ea654a28db78f5fe5702ef06
2021-09-13 07:19:54 -07:00
9c73a48ecf ND Embeddings benchmark - Standardize randomized inputs (#64707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64707

Use torch.randn instead of torch.from_numpy to generate the tensor

Test Plan: buck run //caffe2/benchmarks/operator_benchmark/pt:qembedding_pack_test

Reviewed By: jingsh

Differential Revision: D30817302

fbshipit-source-id: 924c05517812b4b9f7df05a8999f9236cfe7b672
2021-09-13 06:47:35 -07:00
b37503e452 Initial implementation of nanmean (#62671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62671

Very crude first implementation of `torch.nanmean`. The current reduction kernels do not have good support for implementing nan* variants. Rather than implementing new kernels for each nan* operator, I will work on new reduction kernels with support for a `nan_policy` flag and then I will port `nanmean` to use that.
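
A hedged reference implementation of the intended semantics (mean over non-NaN elements; not the PR's kernel):

```python
import torch

def nanmean_reference(x, dim=None, keepdim=False):
    mask = torch.isnan(x)
    filled = torch.where(mask, torch.zeros_like(x), x)  # zero out NaNs
    if dim is None:
        return filled.sum() / (~mask).sum()
    return filled.sum(dim, keepdim=keepdim) / (~mask).sum(dim, keepdim=keepdim)
```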

**TODO**

- [x] Fix autograd issue

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30515181

Pulled By: heitorschueroff

fbshipit-source-id: 303004ebd7ac9cf963dc4f8e2553eaded5f013f0
2021-09-13 05:53:58 -07:00
8535418a06 [Reland] Added reference tests to ReductionOpInfo (#64273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64273

Reintroduced sample_inputs_prod and constrained the range of values for large reference tests.

This reverts commit e4fd2ab59ce8645f5ae9477c7724b6af82124b3b.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30672097

Pulled By: heitorschueroff

fbshipit-source-id: b44ed8dfd5eb0c74c194164dafc3242f6728a78f
2021-09-12 20:05:43 -07:00
1cb3507ed3 Adds DLPack support (#57110)
Summary:
Partially Fixes https://github.com/pytorch/pytorch/issues/55090
Depends on https://github.com/pytorch/pytorch/issues/55365

Inspired by https://github.com/dmlc/dlpack/issues/57#issuecomment-774482973

Questions, in PyTorch we can't create streams or easily synchronize them from just an integer. Should we add an [`ExternalStream`](https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.ExternalStream.html) object like the one we have in CuPy?

TODO: Add tests

Would like some feedback as this design needs quite a few iterations
rgommers leofang
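
A minimal round-trip with the capsule-based DLPack API (the part of the interface I'm confident exists; the stream questions above concern extending it):

```python
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

t = torch.arange(4)
capsule = to_dlpack(t)    # export t as a DLPack capsule
u = from_dlpack(capsule)  # zero-copy import: u shares t's memory
u[0] = 42
assert t[0] == 42
```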

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57110

Reviewed By: saketh-are

Differential Revision: D30761481

Pulled By: mruberry

fbshipit-source-id: e85d78df3c1f8defc2a698878da89cd843cb1209
2021-09-12 19:47:15 -07:00
d46ea03871 [fix] fix test_python_dispatch with pytest (#64574)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62501

Another approach for fixing the same issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64574

Reviewed By: ngimel

Differential Revision: D30867237

Pulled By: ezyang

fbshipit-source-id: c632a1e0b241effdc21ae929abe42fccec88aa24
2021-09-12 17:06:55 -07:00
be79da3303 Revert D30876591: [pytorch][PR] Pin scipy to 1.6.3 on Windows and Mac
Test Plan: revert-hammer

Differential Revision:
D30876591 (39f2b9de2a)

Original commit changeset: 4946e0922063

fbshipit-source-id: b8beff3d973b21fe09c158baef25344030f8fb08
2021-09-12 15:56:40 -07:00
1577c106dc torch.ao migration: numeric suite, eager and fx (#64817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64817

This migrates `torch.quantization._numeric_suite` to `torch.ao.ns._numeric_suite`, and `torch.quantization._numeric_suite_fx` to `torch.ao.ns._numeric_suite_fx`.

1. move the files
```
HG: move eager mode
hg mv caffe2/torch/quantization/_numeric_suite.py caffe2/torch/ao/ns/
HG: move fx
hg mv caffe2/torch/quantization/_numeric_suite_fx.py caffe2/torch/ao/ns/
hg mv caffe2/torch/quantization/ns/* caffe2/torch/ao/ns/fx/
```

2. create new versions of `_numeric_suite.py` and `_numeric_suite_fx.py` with imports

3. update all FB callsites

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: z-a-f

Differential Revision: D30867538

fbshipit-source-id: 120ee830434ca490c1183a187a518eebcbbaf22c
2021-09-12 12:00:45 -07:00
39f2b9de2a Pin scipy to 1.6.3 on Windows and Mac (#64844)
Summary:
It's already pinned via the docker install on Linux

As `scipy.stats.`[`poisson`|`geom`|`binom`] returns quite different results in 1.7+ versions of SciPy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64844

Reviewed By: driazati

Differential Revision: D30876591

Pulled By: malfet

fbshipit-source-id: 4946e0922063e9ac320c218a0b089f73486466f7
2021-09-12 10:53:48 -07:00
47144de473 Revert D30867266: [pytorch][PR] TST Adds gradcheck and gradgradcheck to module info
Test Plan: revert-hammer

Differential Revision:
D30867266 (67ebde5645)

Original commit changeset: cbc073326151

fbshipit-source-id: 00234e01eafc45fb999f7c83a397f9d6b3e01e46
2021-09-12 10:30:28 -07:00
30a7c768d7 [RFC] Modularize functions of parsing bytecode (#61862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862

Modularize the functions for parsing bytecode tables so that they can be used as needed in situations other than the mobile lite interpreter.
* The decoupled functions are re-used by current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimum dependencies on other PyTorch components.

Next:
Build a driver binary to include the parser and interpreter, but only has necessary dependency on other PyTorch components.
ghstack-source-id: 137867287

Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI

Reviewed By: larryliu0820

Differential Revision: D29798382

Pulled By: iseeyuan

fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
2021-09-11 22:24:05 -07:00
dd2d48df07 Revert D30875977: [caffe2] [aten] Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h
Test Plan: revert-hammer

Differential Revision:
D30875977 (1f35d20a89)

Original commit changeset: bd593feb5a75

fbshipit-source-id: 4c82dbc857fdb28e0240dacc1a0e607a76552bb4
2021-09-11 17:18:37 -07:00
d13e0c9c39 [iOS][OSS][BE] Update XCode to use 12.5.1 (#64850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64850

ghstack-source-id: 137827895

Test Plan: CircleCI

Reviewed By: hanton

Differential Revision: D30877964

fbshipit-source-id: 803f2506a755b3815024704e6177c7826bc42de8
2021-09-11 11:24:06 -07:00
c9eb312ce9 [iOS][OSS][BE] Remove unused files (#64849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64849

ghstack-source-id: 137827893

Test Plan: CircleCI

Reviewed By: hanton

Differential Revision: D30877962

fbshipit-source-id: a76f7fe888b990ba6cad650f72be7f4a1e58a2f1
2021-09-11 11:22:55 -07:00
82ac3f108d [TensorExpr] Move 2 graph passes from kernel.cpp to graph_opt.cpp (#64828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64828

Also, make `removeUnusedSelfArgument` more consistent with other passes
by mutating the graph in-place rather than returning a copy.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30870776

Pulled By: ZolotukhinM

fbshipit-source-id: 4873f01b013921143a5aa43746d655a2d8d620c9
2021-09-11 10:23:15 -07:00
ff65f637df [TensorExpr] Add debug logging (store/load tracing) to IREval. (#64848)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64848

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D30878278

Pulled By: ZolotukhinM

fbshipit-source-id: bd946075336ba2e9786602161c236a0ff8a5a011
2021-09-11 09:25:55 -07:00
180e4fbfae [TensorExpr] LLVMCodegen: fix lowering for UInt->Float casts. (#64862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64862

Previously we were erroneously looking at the destination's signedness rather than the source's.
This was discovered when we tried to implement quantize/dequantize ops.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30881696

Pulled By: ZolotukhinM

fbshipit-source-id: 34af842e5e52a3b6b5d2e70c4ef32f910a20341f
2021-09-11 09:24:36 -07:00
1f35d20a89 [caffe2] [aten] Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h (#64870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64870

Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h
Issue started with D30728580 (d701357d92), was fixed with D30846958 (40098f48a1), and brought back again with the reversion of D30846958 (40098f48a1).

Reviewed By: H-Huang

Differential Revision: D30875977

fbshipit-source-id: bd593feb5a75245470e43ad568ebdd3f1738da7c
2021-09-11 00:43:19 -07:00
d4a86c1f3b [quant][fx2trt] Add lowering support for reference linear/conv modules (#64368)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64368

Test Plan:
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py

Imported from OSS

Reviewed By: 842974287

Differential Revision: D30708738

fbshipit-source-id: 88142b7ce43ed96093597112dab03a2d277de993
2021-09-10 22:25:27 -07:00
4481c87ac4 [tensorexpr] Simplify x/100 -> 0 if x is a non-negative integer less than 100. (#64763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64763

Simplification pattern:
  x/N -> 0; N is a constant positive integer and x is a for-loop index whose range is a subset of [0, N).
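
A one-line check of the underlying arithmetic fact:

```python
# For any integer x with 0 <= x < N, integer division x / N is exactly 0.
assert all(x // 100 == 0 for x in range(100))
```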

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30845854

Pulled By: huiguoo

fbshipit-source-id: 814d69ed4be05e57405c222183cc1c6c526721cd
2021-09-10 20:33:02 -07:00
5836a116d0 Revert D30869803: .circleci: Skip cuda /cudnn install if existing
Test Plan: revert-hammer

Differential Revision:
D30869803 (717d267e19)

Original commit changeset: 9eb3bd20875d

fbshipit-source-id: bef8d0c693696307a3be7abd5331b7fa813d754a
2021-09-10 18:56:50 -07:00
67ebde5645 TST Adds gradcheck and gradgradcheck to module info (#64444)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64444

Reviewed By: ngimel

Differential Revision: D30867266

Pulled By: jbschlosser

fbshipit-source-id: cbc0733261517dbfcdd3415d969b9e802b62b7ac
2021-09-10 16:53:11 -07:00
c60075d4b5 Preserve types during empty container assignment (#58911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58911

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #58911
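
My reading of the kind of pattern this preserves (an illustrative sketch, not taken from the PR's tests):

```python
import torch
from typing import List

@torch.jit.script
def fn() -> List[int]:
    xs: List[int] = [1, 2]
    xs = []  # the declared List[int] type now survives this reassignment
    return xs
```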

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30785623

Pulled By: ansley

fbshipit-source-id: 4e05d6369318974290fea02ad2bc148293c25090
2021-09-10 16:49:21 -07:00
b4855619d1 Always upload stats to S3 (#64853)
Summary:
It's not very useful when stats are only uploaded when the tests all pass.

Like for this failing run, the stats were not uploaded to Scribe or S3: https://github.com/pytorch/pytorch/runs/3568714658

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64853

Reviewed By: seemethere

Differential Revision: D30878361

Pulled By: janeyx99

fbshipit-source-id: 19a4c520efdd5575785a3ffbc60e6c09456b9e0d
2021-09-10 16:49:19 -07:00
f3f410880a [DataPipe] Remove ZipArchiveReader's dependency on FileLoader (#64786)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/64788
* __->__ https://github.com/pytorch/pytorch/issues/64786

This PR removes ZipArchiveReader's dependency on the FileLoader DataPipe by allowing it to use an IterDataPipe of path names as input, rather than a tuple of path name and stream.

It also adds tests to ensure that the DataPipe functions properly when it is read multiple times or reset halfway through reading.

The whole stack fixes issues related to unclosed buffer stream (see https://github.com/pytorch/pytorch/issues/64281).

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64786

Reviewed By: ngimel

Differential Revision: D30870968

Pulled By: NivekT

fbshipit-source-id: 64b04d1697b99772f2fa20fc141668e6b8e18c41
2021-09-10 16:49:17 -07:00
717d267e19 .circleci: Skip cuda /cudnn install if existing (#64825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64825

Rewrites this script to only install the CUDA tools if they are not already
pre-installed

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30869803

Pulled By: seemethere

fbshipit-source-id: 9eb3bd20875df0f2b18f5314ac825dbdf91637b5
2021-09-10 16:49:14 -07:00
dafa0a5a3b [doc][hackathon] To add Adadelta Optimizer to the documentation (#63255)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the AdaDelta algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1212.5701

<img width="654" alt="AdaDeltaalg" src="https://user-images.githubusercontent.com/73658284/132770544-82ccf90a-1d54-4ad5-8fc4-51c8dec63a12.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63255

Reviewed By: ngimel

Differential Revision: D30867589

Pulled By: iramazanli

fbshipit-source-id: 5ba602c20c724a4486bdd38b73e1b64c0e767bdc
2021-09-10 16:49:12 -07:00
d8ae3cc318 Add more error checking in subclass creation (#64746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64746

This extracts the error checking that used to be in the PR above.
We are not going to land the proposed fix there, but I think we want this error checking in right now, as these issues would otherwise lead to a memory leak and arbitrary memory reads/writes, respectively.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867569

Pulled By: albanD

fbshipit-source-id: bf468033fb8b49fcb26eed423f5fad82b4a46c56
2021-09-10 16:49:10 -07:00
89f94fc15f Move THPVariable_NewWithVar around (#64550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64550

Just moves a function around to make the next PR easier to read.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867570

Pulled By: albanD

fbshipit-source-id: 99ae925568ed29ca7fdea059762c21d430d4a204
2021-09-10 16:49:08 -07:00
2cc9778495 [MicroBench] Added a log_vml version of the signed log1p kernel (#64205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64205

The log_vml version of the micro-bench is over **2x** faster than the log1p version. Here are the perf numbers:

```
---------------------------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
SignedLog1pBench/ATen/10/1467           45915 ns        45908 ns        14506 GB/s=2.5564G/s
SignedLog1pBench/NNC/10/1467            40469 ns        40466 ns        17367 GB/s=2.9002G/s
SignedLog1pBench/NNCLogVml/10/1467      19560 ns        19559 ns        35902 GB/s=6.00016G/s
```

Thanks to bertmaher for pointing this out.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30644716

Pulled By: navahgar

fbshipit-source-id: ba2b32c79d4265cd48a2886b0c62d0e89ff69c19
2021-09-10 16:49:06 -07:00
cad7a4b0ea [nnc] Added an implementation of sign op (#64033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64033

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30579197

Pulled By: navahgar

fbshipit-source-id: f9f7fa7f2ffa109cf4e441eb1af821b8b891d4d3
2021-09-10 16:49:04 -07:00
3fbb49e75d Extend 2Dim embedding bag benchmarking to include 3Dim benchmarks (#64647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64647

Add support for benchmarking 8-bit quantization of N-D batched embeddings. Currently this only works for 3-dim embeddings and still requires thought on ramping up from 3-dim to N-dim.

Test Plan: ```buck run //caffe2/benchmarks/operator_benchmark/pt:qembedding_pack_test```

Reviewed By: jingsh

Differential Revision: D30770085

fbshipit-source-id: 26659020f3458991592065a05366bde0f060494e
2021-09-10 16:49:02 -07:00
227aafd1d9 Revert D30846958: [caffe2/aten] Remove loose #pragma warning ( pop ) in TensorBase.h
Test Plan: revert-hammer

Differential Revision:
D30846958 (40098f48a1)

Original commit changeset: 52a3fb66e426

fbshipit-source-id: 1d749f6981756f2169d6867538555a945cbb8ca6
2021-09-10 16:47:08 -07:00
5060b69d62 [DataPipe] fixing tests related fork() to remove warnings (#64827)
Summary:
There are two warnings produced by `test_fork_datapipe`. This PR addresses the issues raised by those warnings without impacting the test cases.

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64827

Reviewed By: ejguan

Differential Revision: D30870528

Pulled By: NivekT

fbshipit-source-id: 580a001c6fa3ff6f8b04a7e5183e58861938204b
2021-09-10 11:01:42 -07:00
ade4bf3e82 [tensorexpr] Add 'pre_alloc' argument in python API of tensorexpr kernel (#64718)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64718

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30826582

Pulled By: huiguoo

fbshipit-source-id: 6c173c8964f2643039273cdc83e64fb02bb5f381
2021-09-10 10:03:00 -07:00
92cd5ab1cb Skip conjugate and negate fallback for view ops and their in-place versions (#64392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64392

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30866330

Pulled By: anjali411

fbshipit-source-id: 7b2f51486bf1d610ad2b1472306bab608ee69c37
2021-09-10 09:57:27 -07:00
54b72a99ef To add Rprop documentation (#63866)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of Rprop to the documentation. For more details, we refer to the paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1417

<img width="657" alt="Rpropalg" src="https://user-images.githubusercontent.com/73658284/132750009-a5ec059e-6d53-4c67-917b-57174c8ca27b.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63866

Reviewed By: ngimel

Differential Revision: D30867590

Pulled By: iramazanli

fbshipit-source-id: 0d2d4ffc6c4d939290bbbaa84d2c6e901ed8b54a
2021-09-10 09:49:10 -07:00
c7b03e2b83 [ROCm] define C10_WARP_SIZE to warpSize HIP constant (#64302)
Summary:
warpSize is defined as a constexpr in HIP headers. It is incorrect to assume a warp size of 64. This change fixes the C10_WARP_SIZE definition in torch sources, similar to [how it was done in caffe2](https://github.com/pytorch/pytorch/blob/master/caffe2/utils/GpuDefs.cuh#L10-L14).

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64302

Reviewed By: mrshenli

Differential Revision: D30785975

Pulled By: malfet

fbshipit-source-id: 68f8333182ad4d02bd0c8d02f1751a50bc5bafa7
2021-09-10 09:43:47 -07:00
db3fcf0af3 fix typo in torch/onnx/utils.py (#63396)
Summary:
fixes minor typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63396

Reviewed By: pbelevich

Differential Revision: D30644295

Pulled By: SplitInfinity

fbshipit-source-id: c506f67383909aa2c0c7c533038446b4b3d76a3a
2021-09-10 09:37:44 -07:00
c12df2dc23 build: bump bazel to 4.2.1 (#64455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64455

Reviewed By: saketh-are

Differential Revision: D30752580

Pulled By: malfet

fbshipit-source-id: 4f5cc6f820396348181c09463f7e5628b5f69471
2021-09-10 08:30:10 -07:00
63b180beed ROCm MIOpen NHWC Convolution support (#63617)
Summary:
- Added 2D-Convolution NHWC support
  - on ROCm 4.3, with `PYTORCH_MIOPEN_SUGGEST_NHWC=1` flag
  - May need to force MIOpen to search for solutions ( see examples below for flags )

**PYTORCH_MIOPEN_SUGGEST_NHWC Environment Flag**
MIOpen does not officially support NHWC yet, although convolution support has been added to the tip-of-tree of MIOpen. This flag is intended to be short-lived, explicitly turning on NHWC support until ROCm officially supports NHWC and performance is verified.

**Examples**
1. Example usage 1 : Run test on ROCm4.3
`PYTORCH_TEST_WITH_ROCM=1 PYTORCH_MIOPEN_SUGGEST_NHWC=1 MIOPEN_FIND_ENFORCE=4 MIOPEN_DEBUG_CONV_GEMM=0 MIOPEN_FIND_MODE=1 pytest test_nn.py -v -k "test_conv_cudnn_nhwc" `
2. Example usage 2: Run the following with `PYTORCH_MIOPEN_SUGGEST_NHWC=1` on ROCm4.3.
```
#!/usr/bin/env python3
import torch
model = torch.nn.Conv2d(8, 4, 3).cuda().half()
model = model.to(memory_format=torch.channels_last)
input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float32, requires_grad=True)
input = input.to(device="cuda", memory_format=torch.channels_last, dtype=torch.float16)

# should print True for is_contiguous(channels_last), and strides must match NHWC format
print(input.is_contiguous(memory_format=torch.channels_last), input.shape, input.stride() )

out = model(input)

# should print True for is_contiguous(channels_last), and strides must match NHWC format
print("Contiguous channel last :", out.is_contiguous(memory_format=torch.channels_last), " out shape :",  out.shape, "out stride :", out.stride() )
```

See https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html for more examples.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63617

Reviewed By: saketh-are

Differential Revision: D30730800

Pulled By: ezyang

fbshipit-source-id: 61906a0f30be8299e6547d312ae6ac91cc7c3238
2021-09-10 08:06:32 -07:00
2a81e8b8f1 Let all_reduce_coalesced and all_gather_coalesced return Future objects (#64722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64722

`all_reduce_coalesced` and `all_gather_coalesced` have never been publicly
released in our API docs, so I would assume the blast radius to be small.

The motivation for this change is to allow implementing
`all_reduce_coalesced` and `all_gather_coalesced` by re-using the `allreduce`
and `allgather` C++ cores and performing flatten and copy only on the Python
side. With that, we can then remove `all_reduce_coalesced` and
`all_gather_coalesced` from C++ ProcessGroup APIs. For the async mode,
the copy-back logic after the communication will need to be chained
as a callback on the returned Future and use the chained child Future
as the return value (otherwise, we will need to wrap the child Future
into another work handle). This PR tries to test if we can directly
return a Future without breaking tests and internal use cases. If yes,
it will make the consolidation a lot easier.
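
A rough Python-side sketch of that direction (a hypothetical function name; it uses the private `torch._utils` flatten helpers):

```python
import torch
import torch.distributed as dist
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

def all_reduce_coalesced_sketch(tensors, group=None):
    flat = _flatten_dense_tensors(tensors)
    work = dist.all_reduce(flat, group=group, async_op=True)

    def copy_back(fut):
        # Unflatten the reduced buffer and copy results into the originals.
        for t, r in zip(tensors, _unflatten_dense_tensors(flat, tensors)):
            t.copy_(r)
        return tensors

    # Chain the copy-back on the communication Future; return the child Future.
    return work.get_future().then(copy_back)
```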

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D30830994

Pulled By: mrshenli

fbshipit-source-id: dcde0ed9245e9e8fee357b3588b07d540a4b6318
2021-09-10 07:45:25 -07:00
88fff22023 torch.lu: forward AD support (#64742)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64742

Reviewed By: H-Huang

Differential Revision: D30841227

Pulled By: albanD

fbshipit-source-id: dc4d043ab94358594adb110fbbbb60750c98262a
2021-09-10 07:19:11 -07:00
be091950d0 [const_fold] Keep around node.meta for replaced folded ops (#64782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64782

Previously, get_attrs that were added to the graph did not retain node.meta after folding. Add such support, and improve coverage in general here.

Test Plan: Added test coverage.

Reviewed By: protonu

Differential Revision: D30852704

fbshipit-source-id: ece87a61c69b2e68982964c6adc4dde14dae12c7
2021-09-09 23:52:39 -07:00
40098f48a1 [caffe2/aten] Remove loose #pragma warning ( pop ) in TensorBase.h (#64773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64773

Remove loose `#pragma warning ( pop )` in TensorBase.h.

Reviewed By: ezyang

Differential Revision: D30846958

fbshipit-source-id: 52a3fb66e426bc16ef7bde2a13e26e8293969026
2021-09-09 23:45:45 -07:00
95d98dfeec Add TRTSplitter (#64762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64762

Extract and format TRTSplitter from the fx2trt_example code. The current implementation is tentative, subject to change based on feeds model lowering progress.

Test Plan:
Manual print of the supported operator dict:
`{<class 'torch.nn.modules.activation.ReLU'>: None, <function relu at 0x7f9b1abd0790>: None, <class 'torch.nn.modules.activation.Sigmoid'>: None, <class 'torch.nn.modules.pooling.AdaptiveAvgPool2d'>: None, <built-in method add of type object at 0x7f9b7f402498>: None, <built-in function add>: None, <built-in method add of PyCapsule object at 0x7f9b1a3dc690>: None, <built-in method add_relu of PyCapsule object at 0x7f9b1a34cf90>: None, <class 'torch.nn.modules.batchnorm.BatchNorm2d'>: None, <class 'torch.nn.quantized.modules.batchnorm.BatchNorm2d'>: None, <class 'torch.nn.modules.conv.Conv2d'>: None, <class 'torch.nn.quantized.modules.conv.Conv2d'>: None, <class 'torch.nn.intrinsic.quantized.modules.conv_relu.ConvReLU2d'>: None, <class 'torch.nn.modules.linear.Linear'>: None, <class 'torch.nn.quantized.modules.linear.Linear'>: None, <class 'torch.nn.modules.pooling.MaxPool2d'>: None, <built-in function mul>: None, <built-in method mul of type object at 0x7f9b7f402498>: None, <built-in method mul of PyCapsule object at 0x7f9b1a3dc6c0>: None, <built-in method flatten of type object at 0x7f9b7f402498>: None, <class 'torch.nn.quantized.modules.DeQuantize'>: None, <built-in method dequantize of type object at 0x7f9b7f402498>: None, 'dequantize': None, <class 'torch.nn.quantized.modules.Quantize'>: None, <built-in method quantize_per_tensor of type object at 0x7f9b7f402498>: None, <class 'torch.nn.modules.linear.Identity'>: None, <function conv2d at 0x7f9b1a1fe9d0>: None, <function flatten at 0x7f9b1a1f5ca0>: None, <function size at 0x7f9b1a1f5b80>: None, <function batch_norm at 0x7f9b1a1feaf0>: None, <function layer_norm at 0x7f9b1a1feb80>: None, <function softmax at 0x7f9b1a1f9550>: None, <function relu at 0x7f9b1a1fe040>: None, <function sin at 0x7f9b1a2030d0>: None, <function cos at 0x7f9b1a203160>: None, <function tan at 0x7f9b1a2031f0>: None, <function sinh at 0x7f9b1a1fe160>: None, <function cosh at 0x7f9b1a1fe280>: None, <function tanh at 0x7f9b1a1fe310>: None, <function asin at 0x7f9b1a1fe3a0>: None, <function acos at 0x7f9b1a1fe430>: None, <function atan at 0x7f9b1a1fe4c0>: None, <function exp at 0x7f9b1a1fe550>: None, <function log at 0x7f9b1a1fe5e0>: None, <function sqrt at 0x7f9b1a1fe670>: None, <function reciprocal at 0x7f9b1a1fe700>: None, <function abs at 0x7f9b1a1fe790>: None, <function neg at 0x7f9b1a1fe820>: None, <function floor at 0x7f9b1a1fe8b0>: None, <function ceil at 0x7f9b1a1fe940>: None, <function sum at 0x7f9b1a1f9c10>: None, <function max_pool2d at 0x7f9b1a1f5d30>: None, <function squeeze at 0x7f9b1a1f5c10>: None, <function add at 0x7f9b1a1f91f0>: None, <function sub at 0x7f9b1a1f9ca0>: None, <function div at 0x7f9b1a1f9dc0>: None, <function mul at 0x7f9b1a1f9d30>: None, <function pow at 0x7f9b1a1f9e50>: None, <function min_two_tensors_input at 0x7f9b1a1f9940>: None, <function unsqueeze at 0x7f9b1a1f9280>: None, <function topk at 0x7f9b1a203280>: None, <function adaptive_avg_pool2d at 0x7f9b1a1f5dc0>: None, <function avg_pool2d at 0x7f9b1a1f5e50>: None, <function reshape at 0x7f9b1a203550>: None, <function slice_tensor at 0x7f9b1a1fee50>: None, <function split at 0x7f9b1a1fec10>: None, <function linear at 0x7f9b1a1f51f0>: None, <function clamp at 0x7f9b1a1f93a0>: None, <function tuple_construct at 0x7f9b1a1fed30>: None, <function contiguous at 0x7f9b1a1f9430>: None, <function getitem at 0x7f9b1a203310>: None, <function cat at 0x7f9b1a1f9310>: None, <function transpose at 0x7f9b1a1f94c0>: None, <function matmul at 0x7f9b1a1f98b0>: None, <function sigmoid at 
0x7f9b1a1fe1f0>: None, <function permute at 0x7f9b1a1f9670>: None, <function quantize_per_tensor at 0x7f9b1a1f9b80>: None, <function dequantize at 0x7f9b1a1f99d0>: None, <function sign at 0x7f9b1a1f5ee0>: None}`

Reviewed By: 842974287

Differential Revision: D30798047

fbshipit-source-id: 69076a550874425b7186fbbf2ecf03da4a99b42f
2021-09-09 21:08:57 -07:00
88c0ea9131 [PyTorch] Fix missing move in torch::jit::Lexer::next (#64653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64653

Saves shared_ptr refcount inc/dec in SourceRange.
ghstack-source-id: 137608457

Test Plan: Profiled startup of framework overheads benchmark from high_per_models; self time spent in next() is way down.

Reviewed By: dhruvbird

Differential Revision: D30739240

fbshipit-source-id: ac455678c9d46e657b111d3788d4369983028674
2021-09-09 19:01:07 -07:00
b7b4f63bbc [PyTorch] Use std::find in the JIT lexer (#64652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64652

If nothing else, it is slightly clearer code.
ghstack-source-id: 137608456

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30739239

fbshipit-source-id: bc7917b59883ca4a33fc6916b4e422bad79cf04b
2021-09-09 18:59:27 -07:00
a17d6c7f80 [TensorExpr] Simplify TE IR before applying any transformations. (#64717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717

This also exposed several bugs, which are fixed in this PR.

Differential Revision: D30826408

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560
2021-09-09 18:50:51 -07:00
ef2c9d7d8a [quant][fix] Fix quantization for sub_scalar (#64603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64603

We'll insert an observer only when both the operator and the dtype are supported

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30797025

fbshipit-source-id: a77c21e2749405534fc245374cf33a0657a3d2c8
2021-09-09 17:18:31 -07:00
1b5b210f2c [Android] print type name for IValues (#64602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64602

Print the type name in the error message for easier debugging.

Test Plan:
Example:
java.lang.IllegalStateException: Expected IValue type Tensor, actual type TensorList

Reviewed By: beback4u

Differential Revision: D30782318

fbshipit-source-id: 60d88a659e7b4bb2b574b12c7652a28f0d5ad0d2
2021-09-09 17:06:15 -07:00
11ef68938c [caffe2][tiny] Add logging to report what the current lengths are when mismatched lengths are detected (#64768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64768

as title

Differential Revision: D30846637

fbshipit-source-id: 266768c81b315fdebba854135ea2db1faf67fd6a
2021-09-09 16:46:55 -07:00
d4b09dbab3 [doc][hackathon] To add Adagrad Optimizer to the documentation (#63254)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The tracking issue https://github.com/pytorch/pytorch/issues/63236 lists all the necessary algorithms and links to the originally published papers.

In this PR we are adding a description of Adagrad to the documentation. For more details, we refer to the paper
http://jmlr.org/papers/v12/duchi11a.html

<img width="658" alt="AdaGradAlgo" src="https://user-images.githubusercontent.com/73658284/132743276-a52ea3fb-70a5-4788-94b7-f99367907a26.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63254

Reviewed By: albanD

Differential Revision: D30852139

Pulled By: iramazanli

fbshipit-source-id: 9e496560a97e92be8386585b01d9bd3bba4b0c66
2021-09-09 15:41:29 -07:00
9ad75281f6 [Static Runtime] Fix resize_output_check warning coming from prim::VarConcat (#64765)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64765

Test Plan: Tested the fix with BR v1 model predictor-replayer setup.

Reviewed By: ajyu

Differential Revision: D30846506

fbshipit-source-id: 3ef3c93f11285c7cd1e2b188ca298a7ab4fba579
2021-09-09 14:38:50 -07:00
7f1932e1b9 Rename profiler metadata key (#63743)
Summary:
Rename the metadata key to match the variable name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63743

Reviewed By: albanD

Differential Revision: D30839501

Pulled By: gdankel

fbshipit-source-id: b9b4e670dcc9557b8d8d0730baea0ad39a1a0ca4
2021-09-09 13:06:16 -07:00
6cc8cc6e56 Add support for lowering info during serialize_module, and add padding/partial to it (#5810)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5810

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64725

- Any info added to the dict in node.meta["lowering_info"] will be added to the node_rep during serialization.
- Use this to add annotations on placeholders that allow partial inputs and require padding.
- Check for these annotations and set them in the NNPICompiledFunction as expected

Test Plan: Validated working on inline_cvr in stack. Additionally existing fx_glow end to end tests should still pass.

Reviewed By: 842974287

Differential Revision: D30824192

fbshipit-source-id: def64ef097aa35c337abb494415f7d437c6c7fa9
2021-09-09 13:01:28 -07:00
d43fb75a21 cat_shape_check: Fixes dimension in the error message for CUDA cat shape check and removes unnecessary offending index information (#64556)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/64207

Thank you, SsnL for providing the reproducing script.

cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64556

Reviewed By: albanD

Differential Revision: D30843859

Pulled By: ngimel

fbshipit-source-id: 457ebe80eaef793d9f5d35ee962b6697e5de1907
2021-09-09 12:51:11 -07:00
2c243ed112 Enable the on-demand performance PR testing to run on a specified TB branch (#64701)
Summary:
This is to enable performance testing of experimental features such as LazyTensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64701

Test Plan:
TorchBench CI

RUN_TORCHBENCH: BERT_pytorch, mobilenet_v3_large
TORCHBENCH_BRANCH: v1.0

Reviewed By: seemethere

Differential Revision: D30847389

Pulled By: xuzhao9

fbshipit-source-id: 6853b368fa6f1ba8ffde517805c74bf318dcb35b
2021-09-09 12:41:21 -07:00
517033916c .github: Remove add_annotations workflow (#64449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64449

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: suo, janeyx99

Differential Revision: D30738460

Pulled By: seemethere

fbshipit-source-id: f1259fcba2f0c14a9bcfbe811ec0a4bf61106619
2021-09-09 12:22:12 -07:00
9797a32faf [Dist/CI] Remove dist from target determinator (#64721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64721

There are a couple of PRs where distributed CI did not run and we expected
it to. Examples:

https://github.com/pytorch/pytorch/pull/64513/checks?check_run_id=3539190960,
https://github.com/pytorch/pytorch/pull/64113. All distributed tests should've
been run on these PRs, but we can see they were not:

```
Determination is skipping distributed/test_c10d_common
Determination is skipping distributed/test_c10d_gloo
Determination is skipping distributed/test_c10d_nccl
Determination is skipping distributed/test_c10d_spawn_gloo
Determination is skipping distributed/test_c10d_spawn_nccl
Running distributed/test_data_parallel without determination
Determination is skipping distributed/test_distributed_spawn
Determination is skipping distributed/test_jit_c10d
```

Since it is important to run distributed tests on PRs that touch distributed,
exclude distributed from target_det_list for now.
ghstack-source-id: 137654015

Test Plan: CI

Reviewed By: driazati, mrshenli

Differential Revision: D30830455

fbshipit-source-id: 8b0fdf5b57c2c647b0d82c48e2bb8e2bdbe4d307
2021-09-09 12:07:43 -07:00
46c886e8a6 fix acc topk's handling of the case when dim=0, fix tests as well (#64727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64727

The acc_ops converter for topk has a subtle bug (found while trying to introduce max/min):
the code does not differentiate between dim == None and dim == 0, but these are different computations.
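
For context, a minimal illustration of the distinction using plain `torch.topk` (the buggy code is the acc_ops converter itself, not shown here):

```python
import torch

x = torch.randn(3, 4)
v_default, _ = torch.topk(x, k=2)        # dim omitted: top-k along the last dimension
v_dim0, _ = torch.topk(x, k=2, dim=0)    # top-k along dimension 0 -- a different computation
assert v_default.shape == (3, 2)
assert v_dim0.shape == (2, 4)
```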

Reviewed By: jfix71, 842974287

Differential Revision: D30833621

fbshipit-source-id: 6cd84e6ca4e95bb1a6d6465e61830b76808a9c78
2021-09-09 10:35:23 -07:00
3d3ff4a9e7 Fix a shadowed variable (#64695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64695

Resolves this warning:
```
caffe2/aten/src/ATen/ParallelOpenMP.h:109:63: warning: declaration of 'int64_t begin' shadows a parameter [-Wshadow=compatible-local]
  109 |   internal::invoke_parallel(begin, end, grain_size, [&](int64_t begin, int64_t end) {
      |                                                       ~~~~~~~~^~~~~
caffe2/aten/src/ATen/ParallelOpenMP.h:86:1: note: shadowed declaration is here
   85 | inline scalar_t parallel_reduce(
      |                 ~~~~~~~~~~~~~~~~
   86 |     const int64_t begin,
      | ^   ~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30816128

fbshipit-source-id: 3adff6d94eea9fbd65885e88283cae10b87dba18
2021-09-09 10:34:01 -07:00
8deaa476ac Added more version comparison operations (#63848)
Summary:
Currently the [TorchVersion](1022443168/torch/torch_version.py (L13)) class only supports 'greater than' and 'equal to' operations for comparing torch versions, so something like `TorchVersion('1.5.0') < (1,5,1)` or `TorchVersion('1.5.0') >= (1,5)` will throw an error.

I have added 'less than' (`__lt__()`), 'greater than or equal to' (`__ge__()`) and 'less than or equal to' (`__le__()`) operations, so that the TorchVersion object can be used for a wider range of version comparisons.
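
A minimal sketch of the comparisons this enables, based on the examples in this summary (the import path is taken from the linked `torch/torch_version.py`):

```python
from torch.torch_version import TorchVersion

v = TorchVersion('1.5.0')
assert v < (1, 5, 1)     # __lt__, added in this PR
assert v >= (1, 5)       # __ge__, added in this PR
assert v <= '1.5.0'      # __le__, added in this PR
```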

cc seemethere zsol

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63848

Reviewed By: fmassa, heitorschueroff

Differential Revision: D30526996

Pulled By: seemethere

fbshipit-source-id: 1db6bee555043e0719fd541cec27810852590940
2021-09-09 10:30:20 -07:00
cfa6162e5e Reverts cat and stack warning when out= is not the expected shape (#64714)
Summary:
These warnings are being thrown too aggressively at the moment. See https://github.com/pytorch/pytorch/issues/64709 for a follow-up to reenable them once internal call sites are reviewed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64714

Reviewed By: ngimel

Differential Revision: D30822965

Pulled By: mruberry

fbshipit-source-id: 3ad7c92d381d42ac6187ed84afab477c579a8f35
2021-09-09 10:03:22 -07:00
2b41bf40c5 To add SequentialLR to PyTorch Core Schedulers (#64037)
Summary:
Partially resolves https://github.com/pytorch/vision/issues/4281

In this PR we are proposing a new scheduler, SequentialLR, which chains a list of different schedulers to be called in different phases of the training process.

The main motivation for this scheduler is the recently gained popularity of a warm-up phase at training time. It has been shown that taking small steps in the initial stages of training can help the convergence procedure get faster.

With the help of SequentialLR we can run a small constant (or linearly increasing) learning rate schedule followed by the actual target learning rate scheduler.

```PyThon
scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[5])

for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
```

This code snippet calls `ConstantLR` in the first 5 epochs and follows up with `ExponentialLR` in the subsequent epochs.

This scheduler can be used to chain any group of schedulers one after another. The main consideration is that every time we switch to a new scheduler, that scheduler starts from the beginning, i.e., the zeroth epoch.

We also add the chained scheduler to the `optim.rst` and `lr_scheduler.pyi` files here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64037

Reviewed By: albanD

Differential Revision: D30841099

Pulled By: iramazanli

fbshipit-source-id: 94f7d352066ee108eef8cda5f0dcb07f4d371751
2021-09-09 09:36:32 -07:00
c3203efe80 [pytorch] Make qlinear weight packing thread safe (#63804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63804

Adding a lock around weight packing section of qlinear + qlinear_dynamic

Test Plan: automated tests

Reviewed By: kimishpatel

Differential Revision: D30340957

fbshipit-source-id: 1c9faf796c4ffbc74345396188a6f1154a76bea6
2021-09-09 09:31:48 -07:00
dc53546655 torch.lu_solve: forward AD support (#64646)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64646

Reviewed By: VitalyFedyunin

Differential Revision: D30807898

Pulled By: albanD

fbshipit-source-id: 1f943c22357dd1b3662cfe0d2a26af68e3a2df4c
2021-09-09 08:58:00 -07:00
b7c86365d1 [nnc] Handled cast in index expression during inlining (#64716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64716

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30826388

Pulled By: navahgar

fbshipit-source-id: 7e446602f650527e0d954e437f0370602019e040
2021-09-09 08:30:52 -07:00
652a8bf7d0 [nnc] Updated indices during broadcast to use int64_t (#64627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627

This fixes the root cause of S242719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30801686

Pulled By: navahgar

fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80
2021-09-09 08:29:37 -07:00
459653a0f6 Revert D30745921: [DDP] Fix when buffers are reassigned in module
Test Plan: revert-hammer

Differential Revision:
D30745921 (d59ecc02df)

Original commit changeset: 25eb1edbf445

fbshipit-source-id: 343ead86bf1e2d0b2d4124be331ea2fa437303ad
2021-09-09 08:23:16 -07:00
5bc53ac5ef Revert D30745961: [DDP] Remove self.modules_params
Test Plan: revert-hammer

Differential Revision:
D30745961 (8c09510294)

Original commit changeset: 32d102502570

fbshipit-source-id: 59f7cc50d369b6cc2856cf4ebd0f58b96202336d
2021-09-09 08:23:14 -07:00
f1aaf8afcd Revert D30745960: [DDP] Remove SPMD from self.modules_buffers
Test Plan: revert-hammer

Differential Revision:
D30745960 (1553259520)

Original commit changeset: 66a8f9847e9f

fbshipit-source-id: d3f3fb813c45ac1b0ff15c6154b2e99e5dbab433
2021-09-09 08:22:12 -07:00
3bf93d769c [JIT] Add gradient check in constants (#64613)
Summary:
fixes internal issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64613

Reviewed By: Gamrix

Differential Revision: D30799016

Pulled By: eellison

fbshipit-source-id: 48ef52d1cac627919e6cd232216d24878a2a8b58
2021-09-09 08:13:57 -07:00
d4b1016850 Filter out _disabled_torch_function_impl from handle_torch_function (#64689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64689

This brings it in line with the C++ implementation.

Fixes https://github.com/pytorch/pytorch/issues/64687

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30816215

Pulled By: ezyang

fbshipit-source-id: ed36af6c35467ae678d9548197efd97c36d38dec
2021-09-09 07:29:09 -07:00
239366c9c2 To add Rectified Adam Description to Documentation (#63772)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The tracking issue https://github.com/pytorch/pytorch/issues/63236 lists all the necessary algorithms and links to the originally published papers.

In this PR we are adding a description of the Rectified Adam algorithm to the documentation. For more details, we refer to the paper https://arxiv.org/abs/1908.03265

<img width="446" alt="RadamAlgo" src="https://user-images.githubusercontent.com/73658284/132587815-4764b642-df53-4e41-975f-72e0f40fdc48.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63772

Reviewed By: datumbox

Differential Revision: D30839694

Pulled By: iramazanli

fbshipit-source-id: 6f5629ce56e10c66a451433334b587b99eda1610
2021-09-09 07:10:36 -07:00
5b21f172a4 [doc][hackathon] To add AdamW Optimizer to the documentation (#63252)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The tracking issue https://github.com/pytorch/pytorch/issues/63236 lists all the necessary algorithms and links to the originally published papers.

In this PR we are adding a description of the AdamW algorithm to the documentation. For more details, we refer to the paper https://arxiv.org/abs/1711.05101

<img width="442" alt="AdamWalgo" src="https://user-images.githubusercontent.com/73658284/132589957-6d381e96-cb62-40d0-990f-82a32ec455be.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63252

Reviewed By: datumbox

Differential Revision: D30839685

Pulled By: iramazanli

fbshipit-source-id: 1a426c874ab86408d286a34f41aefcf5b21167c0
2021-09-09 07:05:31 -07:00
39ce801d1f To add Adamax algorithm to documentation (#63903)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The tracking issue https://github.com/pytorch/pytorch/issues/63236 lists all the necessary algorithms and links to the originally published papers.

In this PR we are adding a description of the Adamax algorithm to the documentation. For more details, we refer to the paper https://arxiv.org/abs/1412.6980

<img width="447" alt="Adamx" src="https://user-images.githubusercontent.com/73658284/132577306-878ce64c-627a-4086-808c-d0482868d4a1.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63903

Reviewed By: albanD

Differential Revision: D30819055

Pulled By: iramazanli

fbshipit-source-id: 37f748cbea9f93bf37193ee30fc295fb1a1e9ffd
2021-09-09 06:42:33 -07:00
15c21fa45f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30835585

fbshipit-source-id: a7d35319fd3ae3eddd29b69d299d842f68d587f6
2021-09-09 04:23:50 -07:00
233e3e5bb4 Fix log1p lowering bug (#64724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64724

The literal `1` introduces an int tensor instead of a float tensor, which doesn't work well with downstream elementwise operators. The error looks like:
```
[TensorRT] WARNING: IElementWiseLayer with inputs (Unnamed Layer* 1) [Unary]_output and (Unnamed Layer* 2) [Constant]_output: first input has type Float but second input has type Int32.
```
Changing the constant to float type fixes this.
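
A minimal sketch of the idea behind the fix (a hypothetical decomposition; the actual change lives in the lowering code):

```python
import torch

def log1p_decomposed(x: torch.Tensor) -> torch.Tensor:
    # Using the float constant 1.0 keeps the elementwise add in floating point;
    # a literal 1 would introduce an Int32 constant and trigger the TensorRT
    # type-mismatch warning quoted above.
    return torch.log(x + 1.0)
```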

Reviewed By: 842974287

Differential Revision: D30796959

fbshipit-source-id: 0538e4dd960df9ce87a2d4cafe8f1a0c061b6bad
2021-09-09 00:59:44 -07:00
d0b207e68b Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64713

Resubmit of #64442

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30825646

Pulled By: ngimel

fbshipit-source-id: 66b06bd0b30b401833e337920681d19d96b11f9d
2021-09-08 22:09:01 -07:00
1553259520 [DDP] Remove SPMD from self.modules_buffers (#64474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64474

No need for a nested list here.
ghstack-source-id: 137526312

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745960

fbshipit-source-id: 66a8f9847e9fe1e02c51b79647e93bf7665cf4d9
2021-09-08 19:16:15 -07:00
8c09510294 [DDP] Remove self.modules_params (#64473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64473

Unused after SPMD deprecated.
ghstack-source-id: 137526305

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745961

fbshipit-source-id: 32d102502570291e01579e5b47a6d74dc71013bb
2021-09-08 19:16:13 -07:00
d59ecc02df [DDP] Fix when buffers are reassigned in module (#64472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64472

Sometimes a user module can reassign a tensor buffer, as in:

```
self.buffer = torch.randn(1, 2) # in init
self.buffer += 1 # in forward
```

In this case, `self.modules_buffers` becomes outdated, so we repopulate it whenever we need to sync module buffers.

See https://github.com/pytorch/pytorch/issues/63916 for full description of the
issue.
ghstack-source-id: 137526309

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745921

fbshipit-source-id: 25eb1edbf445703a481802e07f3058d38ea6fc64
2021-09-08 19:14:55 -07:00
b6544ef815 [PyTorch] Fix MobileDebugInfo vector copy (#64030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64030

ghstack-source-id: 137566816

Test Plan:
Pixel 3 before:  https://our.intern.facebook.com/intern/aibench/details/320277034999340
Pixel 3 after: https://our.intern.facebook.com/intern/aibench/details/724509739115867

can see the vector copy disappear in the flame graph. Overall mean decreased from 354 ms to 348 ms (though I'm not sure if this is outside usual noise).

Reviewed By: raziel

Differential Revision: D30559032

fbshipit-source-id: 6d8bb5396d3449cc63023ee7acf694b5d146ddc1
2021-09-08 18:32:50 -07:00
0d0d2f2ac5 [PyTorch] move from input ivalues in ByteCodeDeserializer (#64029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64029

This should save us a separate pass over the data structure to destroy it.
ghstack-source-id: 137566821

Test Plan:
Pixel3
before:
https://www.internalfb.com/intern/aibench/details/503337445067962
after:
https://our.intern.facebook.com/intern/aibench/details/320277034999340

overall mean time decreased from 373 ms to 358 ms. In flame graph, we
can see that some time spent destroying a vector of IValues was moved
into parseMethods, and the new parseMethods time is less than the old
time plus the recursive destruction time.

Reviewed By: dhruvbird

Differential Revision: D30559530

fbshipit-source-id: d080295a846745ea03ac50f08f4f6c95f4eaf3d8
2021-09-08 18:32:48 -07:00
f5e76b4e38 [PyTorch] Copy vectors less in Function::append_operator (#63977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63977

Doesn't seem to be any reason to copy these argument vectors.
ghstack-source-id: 137566815

Test Plan: CI

Reviewed By: dhruvbird, raziel

Differential Revision: D30550301

fbshipit-source-id: 33c199f975e4fb62c50a8210dc08aa9bb7a3e2f2
2021-09-08 18:31:38 -07:00
0ef32625a8 [FX] make visualizer produce different formatted output (#64699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64699

Previously we hardcoded the svg format. We should give folks a choice of output format. If a weird extension like .abc is given, this errors out, which we consider the right behavior.

Reviewed By: houseroad

Differential Revision: D30718883

fbshipit-source-id: fe8827262f94ea6887999bb225de763d1909eef8
2021-09-08 18:22:12 -07:00
86e3b2727e Re-enable nightly doc pushes (#64708)
Summary:
That were accidentally disabled by https://github.com/pytorch/pytorch/pull/64222

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64708

Reviewed By: seemethere

Differential Revision: D30822089

Pulled By: malfet

fbshipit-source-id: 056b5c006f236c78ffe8afa4a5eab2f35e1bce89
2021-09-08 18:07:54 -07:00
9a6c2a75b8 [acc_tracer] Enable check_mutable_operations (#64456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64456

att

Test Plan: CI

Reviewed By: protonu

Differential Revision: D30679174

fbshipit-source-id: 73f3a07d58380cd44fb3481aa97d463c0a964de8
2021-09-08 16:11:15 -07:00
5c27a580ec [tensorexpr] Allocate intermediate buffers at compile time (#64227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64227

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30652220

Pulled By: huiguoo

fbshipit-source-id: cd75005cdfa42751318de7174b44e14a3a01634e
2021-09-08 15:34:44 -07:00
527348a6fe [tensorexpr] Add 'is_allocated' flag for buffers and use it to insert 'Alloc/Free' stmts (#64226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64226

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30652221

Pulled By: huiguoo

fbshipit-source-id: ef9bb0e3db2c444b476e5fc23956bc34ae0f0111
2021-09-08 15:34:42 -07:00
f90153cda3 [acc_normalizer] Improve error when kwarg normalization fails (#64408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64408

att

Test Plan: NFC

Reviewed By: protonu

Differential Revision: D30716392

fbshipit-source-id: e1c3bb1afcd5363a9d502549d8a46b90226be40c
2021-09-08 15:33:32 -07:00
4533e76e7c Update breakpad to an existing commit: 7d188f6 (#64666)
Summary:
Fixes issue https://github.com/pytorch/pytorch/issues/64561

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64666

Reviewed By: driazati

Differential Revision: D30814127

Pulled By: hyuen

fbshipit-source-id: 511a30fc26153569b1cd39f34e4a1a6bb99cc5e4
2021-09-08 15:29:10 -07:00
149f1114fe To add Stochastic Gradient Descent to Documentation (#63805)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The tracking issue https://github.com/pytorch/pytorch/issues/63236 lists all the necessary algorithms and links to the originally published papers.

In this PR we are adding a description of Stochastic Gradient Descent to the documentation.

<img width="466" alt="SGDalgo" src="https://user-images.githubusercontent.com/73658284/132585881-b351a6d4-ece0-4825-b9c0-126d7303ed53.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63805

Reviewed By: albanD

Differential Revision: D30818947

Pulled By: iramazanli

fbshipit-source-id: 3812028e322c8a64f4343552b0c8c4582ea382f3
2021-09-08 15:22:30 -07:00
ff18195df9 .github: Upgrade windows CUDA 10.1 -> 10.2 (#64658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64658

We don't release 10.1 anymore so let's bump to 10.2

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D30811178

Pulled By: seemethere

fbshipit-source-id: c504ebf7f0d4c0d6229319d774f808b4ba0facd9
2021-09-08 14:43:33 -07:00
cc0565326c Add plugin for linalg norm operation (#64611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64611

Add a plugin for torch.linalg.norm. This plugin deliberately supports only norm operations that do not change the batch size, so vector inputs, or matrix inputs whose reduction dims include '0', are not supported by this plugin.
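
To illustrate the supported/unsupported split with plain `torch.linalg.norm` (the plugin itself is internal):

```python
import torch

x = torch.randn(8, 16)                   # a batch of 8 vectors
per_row = torch.linalg.norm(x, dim=1)    # batch dim preserved: expressible by the plugin
collapsed = torch.linalg.norm(x)         # also reduces over dim 0: batch size changes, unsupported
```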

Test Plan: Unit test

Reviewed By: 842974287

Differential Revision: D30525958

fbshipit-source-id: 0d66b60a390bb6235166e5a80390090d0acf691a
2021-09-08 14:33:20 -07:00
a97015f22c Revert D30735341: Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce
Test Plan: revert-hammer

Differential Revision:
D30735341 (a5ad08ec70)

Original commit changeset: 3cb58bed8f1f

fbshipit-source-id: 874dd0f93b24a99694db42a15714834069d402bc
2021-09-08 14:27:40 -07:00
b12150608e [fx] make const fold code more pythonic (#64451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64451

No functional change.

Test Plan:
```
buck test caffe2/test:fx_const_fold
```

Reviewed By: jfix71, RoshanPAN, houseroad

Differential Revision: D30718255

fbshipit-source-id: 95f98561c7f33fcc6c839db68683c85eb152c949
2021-09-08 13:55:10 -07:00
24e1315d4b [quant] Enable jit tracing on quantizable LSTM (resubmission) (#64638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64638

The quantizable LSTM didn't support jit tracing because it had several non-traceable paths. We sacrifice some of the user experience to enable tracing.
The main UX feature removed is the user-friendly message shown when accessing the backward path of a non-bidirectional LSTM: when the bidirectional flag is False, we used to throw a nice error message when the user tried accessing the backward weights. Now the message is the default one (the properties were removed).

Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`

Reviewed By: HDCharles

Differential Revision: D30803753

fbshipit-source-id: a639955a96cee22538d9436f1c952a5d121f50f9
2021-09-08 13:34:18 -07:00
d701357d92 Factor out TensorBase that doesn't depend on native operators (#63612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612

This makes Tensor inherit from a new class TensorBase, that provides a subset of Tensor that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h with thus not need to
be rebuilt every time someone changes an operator signature.

Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.

To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.

I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728580

Pulled By: ezyang

fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03
2021-09-08 13:28:54 -07:00
92318a9116 Make doc previews use its own S3 bucket (#64594)
Summary:
We had been using the gha-artifacts bucket (which previously only stored workflow artifacts) to keep the docs around. This makes it hard to see how our storage for artifacts vs docs is trending.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64594

Reviewed By: seemethere

Differential Revision: D30794328

Pulled By: driazati

fbshipit-source-id: 6b2721a3d76e8a273bde055783d56551f8409edd
2021-09-08 11:36:50 -07:00
43c0f033fc TST Adds inplace checks to module_info (#63739)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935

This PR adds inplace checks to `test_modules`. This version checks the constructor for `inplace` and performs the check automatically.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63739

Reviewed By: saketh-are

Differential Revision: D30737774

Pulled By: jbschlosser

fbshipit-source-id: 8813534511e9296c8424d1ca878412726ddd4043
2021-09-08 11:08:12 -07:00
a5ad08ec70 Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64442)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64442

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30735341

Pulled By: ngimel

fbshipit-source-id: 3cb58bed8f1f5aa32fd49fd37b10c8490bcc645a
2021-09-08 11:02:12 -07:00
deb9775c07 .github: Run docker containers in detach mode (#64459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64459

Should allow users to exec into the docker container if using with-ssh,
even if the build / test command has finished executing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30742797

Pulled By: seemethere

fbshipit-source-id: 969ed8799216c6051439c7d41ab709b2d40938ac
2021-09-08 11:01:08 -07:00
18d24bb537 [NNC] Add Softplus operator (#64589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64589

Adding a softplus operator lowering to NNC and enabling elementwise fusion for it as well.

Test Plan: Added a test in test_jit_fuser.py

Reviewed By: bertmaher

Differential Revision: D30736449

fbshipit-source-id: 6c5fc3bceb5cef2322ecd4449f827e4af018ea93
2021-09-08 10:49:58 -07:00
35413a16f7 Add __matmul__ to the magic methods for FX tracing (#64512)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64483
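
A minimal sketch of what this enables (symbolically tracing a module that uses the `@` operator):

```python
import torch
from torch import fx

class MatMul(torch.nn.Module):
    def forward(self, x, y):
        return x @ y  # __matmul__ is now recorded by the tracer

gm = fx.symbolic_trace(MatMul())
print(gm.graph)  # contains a call_function node for the matmul
```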

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64512

Reviewed By: mrshenli

Differential Revision: D30797265

Pulled By: Chillee

fbshipit-source-id: 7630e048a960e0b27c4309d04d85301abe325189
2021-09-08 10:03:48 -07:00
195cb4efa8 update scatter formula (#64546)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63430

Already tested OpInfo gradient tests
544c8e6a5d/torch/testing/_internal/common_methods_invocations.py (L8575-L8577)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64546

Reviewed By: saketh-are

Differential Revision: D30768759

Pulled By: albanD

fbshipit-source-id: 27d144971c51a956a232fc7d02df5c9d2706d565
2021-09-08 10:02:35 -07:00
1409492fdb fixing trapezoid() comments for clarity (#64592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64592

cc mruberry rgommers heitorschueroff

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30785663

Pulled By: NivekT

fbshipit-source-id: e968687fbb83a59bb46ce6858c6caafa5aa04412
2021-09-08 09:45:46 -07:00
dd8f6ac597 Add forward mode differentiation for torch.linalg.cholesky and transpose (#62159)
Summary:
This PR adds forward mode differentiation for `torch.linalg.cholesky`, `torch.linalg.cholesky_ex`, and `transpose` functions.
Complex tests for Cholesky fail because, for some reason, gradcheck sends matrices full of zeros to the `cholesky_jvp` function.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry heitorschueroff walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62159

Reviewed By: mrshenli

Differential Revision: D30776829

Pulled By: albanD

fbshipit-source-id: 32e5539ed6423eed8c18cce16271330ab0ea8d5e
2021-09-08 09:44:30 -07:00
a2934b38f8 Fix typo embedding_renorm_cuda_ (#64542)
Summary:
Fixes #{issue number}

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64542

Reviewed By: mrshenli

Differential Revision: D30792842

Pulled By: ngimel

fbshipit-source-id: c9a548256d02b3ce6fb77dd9fb058084f2c91608
2021-09-08 09:36:24 -07:00
e0e832c2ba [c10d] Provide failure reason from ProcessGroup when aborting NCCL comm (#64241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64241

When things go wrong, PG NCCL aborts nccl communicators via `ncclCommAbort`, but one issue is that often the error can be set to `ncclSystemError` (see  https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/NCCLUtils.hpp#L176) when that might not be the true cause of the issue; the actual issue is that some prior work timed out, the communicator was aborted on another rank, etc.

This results in a lot of confusion when debugging jobs with a large no. of processes as the current message for ncclSystemError is not very informative: https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/NCCLUtils.hpp#L22

The fix here is to pass in a string exception message from PG NCCL down to `NCCLUtils` which will aim to raise that as the actual issue and not the confusing `ncclSystemError` message.

Test Plan: CI

Reviewed By: pallab-zz, cbalioglu

Differential Revision: D30658855

fbshipit-source-id: 17661dbe0a1bb8cc5b87b637c47634b1f52f54e1
2021-09-08 09:19:24 -07:00
7205ca0210 Change MaxUnpool to accept tensors with 0-dim batch sizes. (#64082)
Summary:
Part of the fix for https://github.com/pytorch/pytorch/issues/38115.

Changes the `MaxUnpool` module to work with batches of size 0.
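
A minimal sketch of the newly supported case (shapes chosen for illustration):

```python
import torch

unpool = torch.nn.MaxUnpool2d(kernel_size=2)
x = torch.empty(0, 3, 4, 4)                          # batch size 0
indices = torch.empty(0, 3, 4, 4, dtype=torch.int64)
out = unpool(x, indices)                             # previously raised; now returns an empty batch
assert out.shape == (0, 3, 8, 8)
```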

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64082

Reviewed By: mrshenli

Differential Revision: D30793907

Pulled By: jbschlosser

fbshipit-source-id: d21aa665be5aa18f592b39ef7b4e3cbc632e21ed
2021-09-08 08:41:09 -07:00
ba8c1fc648 Add Half conversion of bit cast for SYCL kernel (#64340)
Summary:
## Motivation
Enhance the performance of Half/float conversion in SYCL kernels.

## Solution
Add the native SYCL half type to help convert the half from/to float in the kernel code.

## Additional Context
`__SYCL_DEVICE_ONLY__` is a MACRO only valid when compiling the kernel code for SYCL backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64340

Reviewed By: gchanan

Differential Revision: D30720823

Pulled By: ezyang

fbshipit-source-id: e7e770d02df5b2d45da61d2fed3ba59383b3dc3a
2021-09-08 08:25:47 -07:00
7f0feafa55 [nnc] Provide helpful error messages about turning off the fuser (#64516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64516

If fuser compilation fails due to a bug (which should be highly
unlikely at this point), we want to show the user how to unblock themselves by
disabling fusion, in addition to requesting that they report a bug.
ghstack-source-id: 137398537

Test Plan: existing tests

Reviewed By: ZolotukhinM

Differential Revision: D30758051

fbshipit-source-id: 98be89f1b1d4fb3bc816f5b2634c618b9297930e
2021-09-08 08:10:22 -07:00
768014b3e6 Allow disabling cache in autocast (automatic mixed precision) (#63552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63552

In this PR, we want to exclude these two cases from the `Autocast` weight cache usage:

- Using `torch.jit.trace` under `Autocast`
As reported in https://github.com/pytorch/pytorch/issues/50231 and several other discussions, tracing with `torch.jit.trace` under `Autocast` hits Autocast's weight cache and fails, so we should disable the weight cache during tracing.
- Using `Autocast` with `Grad mode`

  - We usually use `Grad mode` for training. Since the weights change at every step during training, there is no point caching them.
  - For the recommended `Autocast` training pattern in the [doc](https://pytorch.org/docs/stable/amp.html), `Autocast` clears the cache every time it leaves the context. We should disable the cache to avoid these clear operations.
    ```
    model = Net().cuda()
    optimizer = optim.SGD(model.parameters(), ...)

    for input, target in data:
        optimizer.zero_grad()
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    ```
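
A sketch of the tracing case, assuming the knob is exposed as a `cache_enabled` flag on autocast (the exact argument name is an assumption based on this PR's title, not settled API):

```python
import torch

model = torch.nn.Linear(4, 4).cuda()
example = torch.randn(1, 4, device="cuda")

# cache_enabled=False is assumed to be the flag added by this PR.
with torch.cuda.amp.autocast(cache_enabled=False):
    traced = torch.jit.trace(model, example)  # tracing no longer hits the weight cache
```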

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30644913

Pulled By: ezyang

fbshipit-source-id: ad7bc87372e554e7aa1aa0795e9676871b3974e7
2021-09-08 07:47:18 -07:00
b616132403 Adding support for lowering 4Bit EmbeddingBag Operator (#5806)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64001

Add a 4-bit EmbeddingBag operator to acc_ops.

Test Plan: Let CI run.

Reviewed By: jfix71

Differential Revision: D30532824

fbshipit-source-id: bf476c9710477792aae202dacf64e23539c33bd9
2021-09-08 07:13:16 -07:00
2223737da9 restore test_inplace_comparison_ops_require_inputs_have_same_dtype Expected behavior (#64267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64267

This test expects every such operation to throw a runtime error.

This change also reinserts the in-place operation test and fixes a bug in the comparison operations.

fix: #64018
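
A minimal sketch of the asserted behavior, assuming a float32/float64 mismatch is representative of the cases the test covers:

```python
import torch

x = torch.rand(3)            # float32
y = torch.rand(3).double()   # float64
try:
    x.lt_(y)                 # in-place comparison with mismatched dtypes
except RuntimeError as e:
    print("raised as expected:", e)
```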

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30720915

Pulled By: ezyang

fbshipit-source-id: 215a6556d20770f70f4ced1c1f9a9753933f1d37
2021-09-08 06:42:12 -07:00
9cc44aad21 [quant] AO migration of the quantize.py (resubmission) (#64445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64445

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the quantize.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: HDCharles

Differential Revision: D30734870

fbshipit-source-id: dc204f3cc46bff2cc81c95159eab9d333b43bb4b
2021-09-08 04:58:47 -07:00
72274e2a2f [TensorExpr] Don't rely on exceptions in Vectorizer. (#64609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64609

We've been using exceptions to indicate whether vectorization succeeded or not, but that posed some problems (e.g. we spent too much time symbolizing these exceptions). This change converts this mechanism to a standard error return code.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30795342

Pulled By: ZolotukhinM

fbshipit-source-id: 16e38b37bcdd78ceb438ac814cc377f35b058e17
2021-09-08 00:25:34 -07:00
2341ec9ef1 [fx_const_fold] Fix constant folding for attrs in submodule hierarchies (#64342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64342

Previously we weren't handling the case where an attribute was in a module that wasn't the root.

Test Plan: Added unit test coverage.

Reviewed By: yinghai

Differential Revision: D30691730

fbshipit-source-id: b39b5cf748c4c882f315a4f32b51ad88cc7a43ed
2021-09-07 22:44:39 -07:00
5721205417 Add __ge__ to TorchVersion (#64565)
Summary:
This PR adds a greater-or-equal comparison so that the base class's (str) comparison method is not used.
This is necessary for correct comparison with a version string.

Previously the following was the case:
```py
>>> torch.__version__
'1.10.0.dev20210830+cpu'
>>> torch.__version__>"1.9"
True
>>> torch.__version__>="1.9"
False  # Wrong output since the base class (str) was used for __ge__ comparison
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64565

Reviewed By: raghuramank100

Differential Revision: D30790463

Pulled By: mrshenli

fbshipit-source-id: 79c680f8b448001b34d3e5d5332124a78bea4e34
2021-09-07 20:16:09 -07:00
81fe2c5e49 add out variant of linear (#61801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61801

resubmitting because the last one was unrecoverable due to making changes incorrectly in the stack

Test Plan: Imported from OSS

Reviewed By: desertfire

Differential Revision: D29812510

Pulled By: makslevental

fbshipit-source-id: ba9685dc81b6699724104d5ff3211db5852370a6
2021-09-07 19:58:52 -07:00
71ba76b1b5 Fix building docs instructions (#64508)
Summary:
Fixes #64507

Removed a duplicate instruction and linted the file a bit (consistent spacing around code blocks/headers, adding code types to code blocks, removing `$` from bash code blocks when unnecessary).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64508

Reviewed By: raghuramank100

Differential Revision: D30791164

Pulled By: mrshenli

fbshipit-source-id: a00db32dcfdd1ecc194c836f31174c806062eb6d
2021-09-07 19:01:52 -07:00
4e98304eb9 Fix quicklint (#64612)
Summary:
Fixes land-race introduced by a22c936b63

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64612

Reviewed By: ngimel

Differential Revision: D30798648

Pulled By: malfet

fbshipit-source-id: ca546f68141d44493deba7bbf840e5f9662e8558
2021-09-07 18:52:22 -07:00
e777e1b01c Revert D29998114: [pytorch][PR] enable bf16 mkldnn path for gemm
Test Plan: revert-hammer

Differential Revision:
D29998114 (acc9f9afc8)

Original commit changeset: 459dc5874c63

fbshipit-source-id: 1994623a3afc22a94bd0cf5de766b023185f5238
2021-09-07 18:45:13 -07:00
1a033b45dd [JIT] Fix a bug of rejecting ops with AliasAnalysisKind::CONSERVATIVE (#64336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64336

Currently AliasDB rejects any user-defined op with `AliasAnalysisKind::CONSERVATIVE` if it does not have special treatment for alias analysis. For example, the following alias schema gets rejected:

```
  m.def(torch::schema(
      "namescope::my_op(...) -> ...",
      c10::AliasAnalysisKind::CONSERVATIVE));
```

This rejection condition is contradictory: AliasDB can handle ops with `CONSERVATIVE` in a general way, without any special casing, at https://fburl.com/diffusion/op5u72sk (calling https://fburl.com/diffusion/h3aws5dd), which is exactly the appropriate conservative treatment for alias analysis.

This change corrects the rejection condition so that it fires for ops that *do* have special casing but are also marked `CONSERVATIVE`, since the two cannot be used simultaneously.

Test Plan:
Confirmed that
```
  m.def(torch::schema(
      "namescope::my_op(...) -> ...",
      c10::AliasAnalysisKind::CONSERVATIVE));
```
gets accepted and `my_op`'s all inputs and outputs are put to point to wildcard(*) by AliasDB.

Reviewed By: eellison

Differential Revision: D30690121

fbshipit-source-id: 431cc1a84edd5227f52b44a0fd85d5eb16f3c288
2021-09-07 18:26:31 -07:00
8e1fdd4cd3 Add symbolic shape comparison optimization (#64300)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64300

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738146

Pulled By: eellison

fbshipit-source-id: 96287798535b367f23d3e9430d70fc02c59744ab
2021-09-07 18:22:32 -07:00
474a51b6bf Refactor to use shape arguments (#64299)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64299

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738141

Pulled By: eellison

fbshipit-source-id: 37ca30de81349ecf23d8656291863737b6ad6d96
2021-09-07 18:22:30 -07:00
bccbe310ef Add view with negative dim (#63516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63516

How to review: pretty much just check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738143

Pulled By: eellison

fbshipit-source-id: c7cd01cb2c8a13cb2664415f3d98aedec19a8e07
2021-09-07 18:22:28 -07:00
5a1f8b8573 Generalize expand logic (#63615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63615

How to review: pretty much just check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738148

Pulled By: eellison

fbshipit-source-id: 4ef74a9c9b39c0beb73949e63aa844c46ab637eb
2021-09-07 18:22:26 -07:00
5eb8cec663 Add permute, arange (#63407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63407

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738149

Pulled By: eellison

fbshipit-source-id: 36d572488408d38b0643aa93cb08aab5c45218ad
2021-09-07 18:22:24 -07:00
cf2d15bf84 Add support for slice, select with int, index_select (#63365)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63365

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738144

Pulled By: eellison

fbshipit-source-id: 7e0c572209bdc6e62ecb4fd1f06f80291de69803
2021-09-07 18:22:22 -07:00
c8a608b197 Add squeeze, unsqueeze, transpose shape functins (#63099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63099

These are checked by OpInfos, which represent all of the inputs and semantics of the operators, so it should be an easy stamp.

Test Plan: Imported from OSS

Reviewed By: desertfire, astaff

Differential Revision: D30347514

Pulled By: eellison

fbshipit-source-id: 37b4c9ecd8c222cc12bf39166181464b43218830
2021-09-07 18:22:19 -07:00
a39f3c68b7 Add batch of unary functions (#63050)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63050

Test Plan: Imported from OSS

Reviewed By: priyaramani, astaff

Differential Revision: D30347513

Pulled By: eellison

fbshipit-source-id: abaf641778671d17df87a2b7b47bad7501a91b5a
2021-09-07 18:21:04 -07:00
c1b701bc3e Back out "update rpc tensorpipe logic for sparse tensors" (#64575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64575

Original commit changeset: daee9a567645

Test Plan: unit test

Reviewed By: gcramer23

Differential Revision: D30778736

fbshipit-source-id: 8d9386158fb6a3d025c149cdc37558d57c615e9f
2021-09-07 18:00:39 -07:00
566ee1217f Use trsm for triangular_solve in CPU (#63567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63567

The current implementation called trtrs for CPU and trsm for CUDA.
See https://github.com/pytorch/pytorch/issues/56326#issuecomment-825496115 for a discussion on the differences between
these two functions and why we prefer trsm vs trtrs on CUDA.

This PR also exposes the `side` argument of this function, which is used
in the second PR of this stack to optimize the number of copies one needs to make
when preparing the arguments to be sent to the backends.

It also replaces the use of `bool`s with a common enum type representing
whether a matrix is transposed / conj-transposed, etc. This makes the API
consistent: before, the behaviour of these functions with both `transpose=True`
and `conjugate_transpose=True` was not well defined.
Functions to transform this type into the specific types / chars for the different
libraries are provided under the names `to_blas`, `to_lapack`, `to_magma`, etc.

This is the first of a stack of PRs that aim to improve the performance of
`linalg.solve_triangular`. `trsm` has an extra parameter (`side`), which allows
eliding the copy of the triangular matrix in many cases.

Fixes https://github.com/pytorch/pytorch/issues/56326
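
A quick sketch of the op this stack targets, using `torch.triangular_solve` (the op that exists at this point; `linalg.solve_triangular` lands later in the stack):

```python
import torch

A = torch.randn(3, 3).tril() + 3 * torch.eye(3)    # well-conditioned lower-triangular matrix
b = torch.randn(3, 2)
x, _ = torch.triangular_solve(b, A, upper=False)   # on CPU this now goes through BLAS trsm
assert torch.allclose(A @ x, b, atol=1e-5)
```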

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30566479

Pulled By: mruberry

fbshipit-source-id: 3831af9b51e09fbfe272c17c88c21ecf45413212
2021-09-07 17:26:17 -07:00
52ff9bc639 [iOS][Metal] Add aten:hardswish (#64588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64588

Add `aten::hardswish` to run the mobilenetv3 model from torchvision.
ghstack-source-id: 137479323

Test Plan:
- buck test pp-macos
- circleCI

Reviewed By: beback4u

Differential Revision: D30781008

fbshipit-source-id: 83454869195ef4ab50570ea9b3bf2a55f32a3e86
2021-09-07 15:41:29 -07:00
2c351c76e0 [special] Alias igamma, igammac to special.gammaninc, special.gammaincc (#61902)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Also added a relevant OpInfo. A short usage sketch follows the checklist below.

TODO:
* [x] Check rendered docs gammainc : https://docs-preview.pytorch.org/61902/special.html#torch.special.gammainc
* [x] Check rendered docs gammaincc: https://docs-preview.pytorch.org/61902/special.html#torch.special.gammaincc
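
A short usage sketch of the new aliases (regularized incomplete gamma functions; the two halves sum to 1):

```python
import torch

a = torch.tensor([1.0, 2.0])
x = torch.tensor([0.5, 1.5])
lower = torch.special.gammainc(a, x)    # alias of torch.igamma
upper = torch.special.gammaincc(a, x)   # alias of torch.igammac
assert torch.allclose(lower + upper, torch.ones_like(lower))
```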

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61902

Reviewed By: ngimel

Differential Revision: D30761428

Pulled By: mruberry

fbshipit-source-id: 06a16432873357958d53364f12a4e91c29779d26
2021-09-07 15:31:26 -07:00
b01d2d1d3e Disables four failing distributions tests on windows (#64596)
Summary:
Per title. Unblocks CI. See https://github.com/pytorch/pytorch/issues/64595.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64596

Reviewed By: mrshenli

Differential Revision: D30787296

Pulled By: mruberry

fbshipit-source-id: 84b90cb25c0185f1851db02425ea40aa13d3e598
2021-09-07 15:29:13 -07:00
a22c936b63 Add lint to ensure .github/ pypi dependencies are pinned (#64463)
Summary:
Example failing run: https://github.com/pytorch/pytorch/pull/64463/checks?check_run_id=3501249102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64463

Reviewed By: janeyx99

Differential Revision: D30744930

Pulled By: driazati

fbshipit-source-id: 4dd97054db1d4c776a4512bc3d664987cd7b6d23
2021-09-07 15:28:11 -07:00
7e88d0b370 Update explicit_ci_jobs to work with GHA (#64598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64598

This adds a filter option rather than an all-or-nothing so it's easier to iterate on a specific job.

```bash
python tools/testing/explicit_ci_jobs.py --filter-gha '*generated-linux-*gcc5.4*'
```

See #64600 for an example usage

NB: If you regenerate the workflows you will need to re-run that command to re-delete everything.

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30788850

Pulled By: driazati

fbshipit-source-id: a32c266bbd876c396665bceef9a0a961b4586564
2021-09-07 15:21:12 -07:00
a48d83a575 Move ParallelTBB to GHA (take 2) (#64193)
Summary:
2nd attempt to do the same
Skip failing `TestTensorCreationCPU.test_trilu_indices_cpu`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64193

Reviewed By: mrshenli

Differential Revision: D30779469

Pulled By: malfet

fbshipit-source-id: 5c51fcbb383d0823d0e953d7af181b5f22eda9ab
2021-09-07 15:11:00 -07:00
369db8924f [Static Runtime] Add first iter metric (#64457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64457

The first iteration is special since it initializes the memory planner. This change logs and reports first iteration time during benchmarking. It also generates a FAI-PEP output when `generate_ai_pep_output` is set.

Test Plan:
Run any benchmark, and observe:
```
I0902 15:19:32.528977 2492358 impl.cpp:948] PyTorchObserver {"value":6.415958881378174,"unit":"ms","metric":"latency","type":"static_runtime_first_iter"}
...
First iter time: 6.41596 ms
```

Note that this metric is likely to have significantly more noise than the others since we don't have as many data points.

Unit tests: `buck test //caffe2/test:static_runtime`

Reviewed By: d1jang

Differential Revision: D30740619

fbshipit-source-id: 4dcfccd5629f4fa34254fd355073ef19e151245a
2021-09-07 15:00:30 -07:00
3bd69d3020 add bundle input into AIBench (#64557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64557

MaskRCNN speed depends on how many people are detected in the detection stage, so a random input from the dataloader does not give standardized measurements. To standardize the benchmarking, we use two standard images, with two and three people respectively.

Test Plan: AIBench result: https://www.internalfb.com/intern/aibench/details/945883114818980

Reviewed By: axitkhurana

Differential Revision: D30446049

fbshipit-source-id: a2826fdb69e9f840c0afc566c4cbbcde1c2fba89
2021-09-07 14:46:23 -07:00
3c87f55752 Automated submodule update: FBGEMM (#64582)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 3ce04fc664

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64582

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: mrshenli

Differential Revision: D30779695

fbshipit-source-id: 22460a4047e2462e672eb4931e44648ae6bde627
2021-09-07 14:16:22 -07:00
acc9f9afc8 enable bf16 mkldnn path for gemm (#61891)
Summary:
# Goal: Integrate mkldnn bf16 gemm into PyTorch

## BF16 support for mm, addmm, bmm, addbmm, baddbmm, mv, addmv, dot (with the mkldnn matmul primitive):
https://oneapi-src.github.io/oneDNN/group__dnnl__api__matmul.html
For gemm-related ops, we keep all inputs in plain format, so we do not introduce opaque tensors for these ops, saving memory copies.

![mkldnn bf16 gemm integration](https://user-images.githubusercontent.com/54701539/126263077-4b5134e1-52a7-4fad-94fb-19e13a0377f6.png)

The minimal integration would dispatch to mkldnn only in addmm, but for gemm with 3-D input (with an additional "batch" dim) this would call mkldnn gemm "batch" times. Since mkldnn matmul supports inputs with multiple dims, we directly dispatch to mkldnn gemm in {bmm, addbmm, baddbmm} to reduce the time spent creating mkldnn memory descriptors, primitives, etc.

To handle the different definitions of "bias" between mkldnn (which must have shape (1, N)) and PyTorch (which can have the same shape as the GEMM result, (M, N)), we use a fused sum.

## User Case:
The user-facing code is exactly the same as before because no opaque tensor is introduced. Since PyTorch already supports the bf16 data type for CPU tensors, we can leverage the existing bf16 GEMM unit tests.
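
A minimal usage sketch (not part of the PR): bf16 CPU GEMM calls that this integration can route to the oneDNN matmul primitive.

```python
import torch

M, K, N = 128, 128, 128
a = torch.randn(M, K, dtype=torch.bfloat16)
b = torch.randn(K, N, dtype=torch.bfloat16)
bias = torch.randn(M, N, dtype=torch.bfloat16)

out = torch.addmm(bias, a, b)                            # (M, N) bias handled via the fused sum
out_batched = torch.bmm(a.unsqueeze(0), b.unsqueeze(0))  # batched path dispatches directly to mkldnn
```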

## Gemm performance gain on CPX 28Cores/Socket:
Note: data is collected using the PyTorch operator benchmarks: https://github.com/pytorch/pytorch/tree/master/benchmarks/operator_benchmark (with the bfloat16 dtype added)

### use 1 thread on 1 core
### torch.addmm (M, N) * (N, K) + (M, K)
| impl |16x16x16|32x32x32| 64x64x64 | 128x128x128| 256x256x256| 512x512x512|1024x1024x1024|
|:---:|:---:| :---: | :---: | :---: | :---: | :---: | :---: |
| aten-fp32| 4.115us|4.583us|8.230us|26.972us|211.857us|1.458ms|11.258ms|
| aten-bf16 | 15.812us| 105.087us|801.787us|3.767ms|20.274ms|122.440ms|836.453ms|
| mkldnn-bf16 |20.561us |22.510us|24.551us|37.709us|143.571us|0.835ms|5.76ms|

We can see that mkldnn-bf16 is better than aten-bf16, but for smaller shapes mkldnn-bf16 is not better than aten-fp32. This is due to oneDNN overhead, which behaves like a constant cost and can be ignored as problems get larger. We are also continuing to optimize kernel efficiency and to decrease this overhead.

More shapes
| impl |1x2048x2048|2048x1x2048| 2048x2048x1 |
|:---:|:---:| :---: | :---: |
| aten-fp32| 0.640ms|3.794ms|0.641ms|
| aten-bf16 | 2.924ms| 3.868ms|23.413ms|
| mkldnn-bf16 |0.335ms |4.490ms|0.368ms|

### use 1 socket (28 threads, 28 cores)
| impl | 256x256x256| 512x512x512|1024x1024x1024| 2048x2048x2048|4096x4096x4096|
|:---:| :---: | :---: | :---: | :---: | :---: |
| aten-fp32| 35.943us |140.315us|643.510us|5.827ms|41.761ms|
| mkldnn-bf16 |53.432us|114.716us|421.858us|2.863ms|23.029ms|

More shapes
| impl |128x2048x2048|2048x128x2048| 2048x2048x128 |
|:---:|:---:| :---: | :---: |
| aten-fp32| 0.561ms|0.458ms|0.406ms|
| mkldnn-bf16 |0.369ms |0.331ms|0.239ms|

We do not show aten-bf16 for this case since aten-bf16 always computes on a single thread and its performance is extremely poor. The trend for this case is similar to 1 thread on 1 core.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61891

Reviewed By: iramazanli

Differential Revision: D29998114

Pulled By: VitalyFedyunin

fbshipit-source-id: 459dc5874c638d62f290c96684ca0a694ded4b5a
2021-09-07 13:00:37 -07:00
337c71be05 Array API: Add torch.linalg.matmul alias to torch.matmul (#63227)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62811

Add `torch.linalg.matmul` alias to `torch.matmul`. Note that the `linalg.matmul` doesn't have a `method` variant.
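
A quick sanity check of the alias (a usage sketch, not from the PR):

```python
import torch

a, b = torch.randn(2, 3), torch.randn(3, 4)
assert torch.allclose(torch.linalg.matmul(a, b), torch.matmul(a, b))
# torch.matmul has a Tensor method (a.matmul(b)); the alias does not add one.
```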

Also cleaning up `torch/_torch_docs.py` when formatting is not needed.

cc IvanYashchuk Lezcano mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63227

Reviewed By: mrshenli

Differential Revision: D30770235

Pulled By: mruberry

fbshipit-source-id: bfba77dfcbb61fcd44f22ba41bd8d84c21132403
2021-09-07 12:35:32 -07:00
8407ce7e38 [small BE] .github: refactor concurrency into a common macro (#64587)
Summary:
By using a macro for these concurrency groups, we can edit just one place for the Linux and Windows workflows (instead of two).

I wanted to loop all the other workflow files in as well, but since those aren't generated, the macros won't work the same way.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64587

Reviewed By: mrshenli

Differential Revision: D30783224

Pulled By: janeyx99

fbshipit-source-id: ae16ebb12d2d63a563d28f0ce88e280f68ed4b9b
2021-09-07 12:31:55 -07:00
7e4ebe06ca Fixes issue related torch.trapezoid broadcasting behavior and documentation (#64054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64054

Fixes #63608

cc mruberry rgommers heitorschueroff

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30617078

Pulled By: NivekT

fbshipit-source-id: 815896ec56d447562790df4d662e94fd13457e2a
2021-09-07 11:41:55 -07:00
c9d6ca4c54 Add space in Feature Request issue template (#64563)
Summary:
Add space between emoji and text in Feature Request issue template

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64563

Reviewed By: janeyx99

Differential Revision: D30779429

Pulled By: seemethere

fbshipit-source-id: 3625299923a7022fa66473633524a6620d58188b
2021-09-07 11:36:06 -07:00
85eeb4d682 Clean up op BC check list (#64584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64584

It has been a while since the last cleanup. The list is really long.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D30779350

fbshipit-source-id: 908b47d0b9a16b784aad6a34c5c87f923500c247
2021-09-07 11:25:40 -07:00
43248d9112 [doc][hackathon] To add Adam Optimizer to the documentation (#63251)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. In the following tracking issue we listed all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the Adam algorithm to the documentation. For more details, we refer to the paper https://arxiv.org/abs/1412.6980

<img width="442" alt="Screen Shot 2021-08-27 at 6 37 54 PM" src="https://user-images.githubusercontent.com/73658284/131195297-35fce613-3691-4fed-b42d-db234d4fcd7c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63251

Reviewed By: albanD

Differential Revision: D30779163

Pulled By: iramazanli

fbshipit-source-id: 319a80fc3952793b0d064d0e641ddc1de3c05a86
2021-09-07 11:03:35 -07:00
adb85b32d3 minor fix for elastic doc (#64531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64531

fix #64530

Test Plan: unit test

Reviewed By: mrshenli

Differential Revision: D30760879

fbshipit-source-id: 94ed1476e886513427d928a36f5be6b9bfff0826
2021-09-07 09:31:01 -07:00
26b7ff5aea deprecate dtype getters from torch.testing namespace (#63554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554

Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reasons for this are twofold:

1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example, `torch.testing.floating_types()` will only give you `float32` and `float64`, skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.

We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace gets messy again after a new dtype is added, or we need to somehow version the return values of the getters.
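
A minimal sketch of emulating one of the deprecated getters downstream; per point 1 above, `floating_types()` mirrors the C++ dispatch macro and covers only `float32`/`float64`:

```python
import torch

def floating_types():
    return (torch.float32, torch.float64)

assert torch.float16 not in floating_types()
```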

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30662206

Pulled By: mruberry

fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
2021-09-07 08:58:51 -07:00
f767cf6683 To change WarmUp Scheduler with ConstantLR and LinearLR (#64395)
Summary:
Partially unblocks https://github.com/pytorch/vision/issues/4281

Previously we added WarmUp schedulers to PyTorch Core in PR https://github.com/pytorch/pytorch/pull/60836, which had two modes of execution, linear and constant, depending on the warm-up function.

In this PR we change this interface to a more direct form, separating the linear and constant modes into separate schedulers. In particular,

```Python
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="linear")
```

will look like

```Python
scheduler1 = ConstantLR(optimizer, warmup_factor=0.1, warmup_iters=5)
scheduler2 = LinearLR(optimizer, warmup_factor=0.1, warmup_iters=5)
```

correspondingly.
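
A self-contained usage sketch of the new interface (kwarg names follow this PR's description and may differ in later releases):

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
scheduler = torch.optim.lr_scheduler.ConstantLR(optimizer, warmup_factor=0.1, warmup_iters=5)

for epoch in range(10):
    optimizer.step()   # training step elided
    scheduler.step()   # lr is 0.1 for the first 5 epochs, then back to 1.0
```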

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64395

Reviewed By: datumbox

Differential Revision: D30753688

Pulled By: iramazanli

fbshipit-source-id: e47f86d12033f80982ddf1faf5b46873adb4f324
2021-09-07 08:42:31 -07:00
75b9e4a128 [JIT] Freeze unrolls constant loops (#63614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63614

There are a number of optimizations (`RemoveListMutation` in particular) that are tied to loop unrolling in `runOptimizations`. However, these were not invoked from `freeze_module` since the freezing pass should be idempotent.

This diff makes `runOptimizations` run `UnrollConstantLoops` instead of `UnrollLoops`. `freeze_module` is then able to run these optimizations.

Test Plan: Observed that `freeze_module` applies `RemoveListMutation`

Reviewed By: eellison

Differential Revision: D30437356

fbshipit-source-id: cba04bd958a48ad51b151aa3264f3d5bbb1fc2a4
2021-09-07 08:06:47 -07:00
adbcc819cd Fix fx2trt SplitterBase non_tensor_input logic (#64286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64286

During graph splitting, `_SplitterBase` supports taking into consideration whether the subnet boundary nodes
produces "supported" outputs that will cross the acc/non-acc boundary. Specifically, if the backend only
supports Tensor-based data passing cross boundary, then we cannot split the graph at a place where the node
output is a non-Tensor type (e.g., `Tuple[Tensor]`).

There's currently a bug in this logic: it does not correctly detect the output type of a Node. Instead of
using `Node.meta['tensor_meta']`, we should instead check `Node.meta['type']`.

`Node.meta['tensor_meta']` is not appropriate because this key will exist if the node output is an iterable
and one of the elements is of type `Tensor`. So `Tuple[Tensor]` will be wrongly considered "supported".
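
A hedged sketch of the corrected check (the helper name is hypothetical; `Node` is from `torch.fx`):

```python
import torch
from torch.fx import Node

def output_is_tensor(node: Node) -> bool:
    # 'tensor_meta' may be present even for Tuple[Tensor] outputs,
    # so rely on the recorded Python type instead.
    return node.meta.get('type', None) is torch.Tensor
```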

Test Plan:
arc lint
run CI tests

Reviewed By: yinghai, 842974287

Differential Revision: D30617147

fbshipit-source-id: e8ba70dfaddc05cafb8037d58fca73b7ccbb1a49
2021-09-07 04:02:29 -07:00
32fbeb170d Update error messages that use LAPACK error codes (#63864)
Summary:
This PR updates the `batchCheckErrors` and `singleCheckErrors` functions so that the error messages are defined only once.
`batchCheckErrors` function reuses `singleCheckErrors` now.

Fixes https://github.com/pytorch/pytorch/issues/63220, fixes https://github.com/pytorch/pytorch/issues/59779

cc jianyuh nikitaved pearu mruberry heitorschueroff walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63864

Reviewed By: ngimel

Differential Revision: D30672933

Pulled By: mruberry

fbshipit-source-id: 0ba37ff98ef278efdb12c3890aa07d687047da7a
2021-09-07 00:05:46 -07:00
1a1fb31cfa Support torch.concat alias, add cat OpInfo & remove OpInfo test_out skips {cat, stack, hstack, vstack, dstack} (#62560)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61767

## Changes

- [x] Add `torch.concat` alias to `torch.cat`
- [x] Add OpInfo for `cat`/`concat`
- [x] Fix `test_out` skips (Use `at::native::resize_output` or `at::native::resize_output_check`)
  - [x] `cat`/`concat`
  - [x] `stack`
  - [x] `hstack`
  - [x] `dstack`
  - [x] `vstack`/`row_stack`
- [x] Remove redundant tests for `cat`/`stack`

~I've not added `cat`/`concat` to OpInfo `op_db` yet, since cat is a little more tricky than other OpInfos (should have a lot of tests) and currently there are no OpInfos for that. I can try to add that in a subsequent PR or maybe here itself, whatever is suggested.~
**Edit**: cat/concat OpInfo has been added.

**Note**: I've added the named tensor support for `concat` alias as well, maybe that's out of spec in `array-api` but it is still useful for consistency in PyTorch.
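
A quick usage sketch of the new alias:

```python
import torch

xs = [torch.randn(2, 3), torch.randn(2, 3)]
assert torch.equal(torch.concat(xs, dim=0), torch.cat(xs, dim=0))
```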

Thanks to krshrimali for guidance on my first PR :))

cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff krshrimali

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62560

Reviewed By: saketh-are

Differential Revision: D30762069

Pulled By: mruberry

fbshipit-source-id: 6985159d1d9756238890488a0ab3ae7699d94337
2021-09-06 23:57:18 -07:00
0a1aaff0de Remove dead code from THC (THCApply.cuh) (#64559)
Summary:
cc peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64559

Reviewed By: mruberry

Differential Revision: D30769526

Pulled By: ngimel

fbshipit-source-id: 034a5c778a2b902cffa57b76511fa0dcdea26825
2021-09-06 21:26:08 -07:00
571a2becf3 Move ParallelNative and PureTorch to GHA (#64452)
Summary:
The ParallelTBB move was separated out into https://github.com/pytorch/pytorch/pull/64193 as it requires some further investigation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64452

Reviewed By: seemethere, janeyx99

Differential Revision: D30738337

Pulled By: malfet

fbshipit-source-id: 81c46423e903058bd1a3e8553e8a10ce978eeefd
2021-09-06 11:40:44 -07:00
544c8e6a5d Mark functions in backend header as inline to suppress warning (#64098)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64098

Reviewed By: kimishpatel, iseeyuan

Differential Revision: D30593104

fbshipit-source-id: 328196b9bc4a89a28ad89bede7e337107976c303
2021-09-05 16:45:23 -07:00
bcc7e82371 Revert D30745610: [nnc] Make our exceptions c10::Errors, get C++ stacktraces
Test Plan: revert-hammer

Differential Revision:
D30745610 (18b2751ea1)

Original commit changeset: a1cfaa7364ef

fbshipit-source-id: 9b716053b96a65745240ddef1c456c44d5d09671
2021-09-05 16:08:09 -07:00
49fe829cae [Vulkan] Code Quality: Remove duplicate code for hardshrink and leaky_relu functions (#64405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64405

Code quality improvement: removed duplicate code for hardshrink and leaky_relu functions.
ghstack-source-id: 137319378

Test Plan:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

Reviewed By: SS-JIA

Differential Revision: D30690251

fbshipit-source-id: 5729d1f32946e42f41df77756a8313f297dd822f
2021-09-05 12:53:58 -07:00
1901c675e1 Back out "nn.functional.linear OpInfo" (#64517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64517

Original commit changeset: ca41dbd98176

Test Plan: PyTorch CI

Reviewed By: ngimel

Differential Revision: D30758201

fbshipit-source-id: 2d3274293d340373b8af86083336607818019619
2021-09-05 02:25:00 -07:00
008bf6689b Back out "D30740897 Add fusion enabled apis" (#64500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64500

D30740897 (39aeb3bf63) broke caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage (https://fburl.com/test/mb46jxon) and blocked training_platform_unit_tests

{F660271297}

Multisect results confirm

```
multisect --config FBCODE_TEST bisect 844424966128796 --workers 16 revisions --begin 09629edc --end fc86b434
D30740897 (39aeb3bf63)

````

{F660271232}

Test Plan:
```
buck test mode/opt //caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage

Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4785074671474181
    ✓ Pass: caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage - main (3.729)
Summary
  Pass: 1

```

Differential Revision: D30753916

fbshipit-source-id: 302fd4113ef1f3069846be03edc2300d82b66719
2021-09-04 20:55:58 -07:00
18b2751ea1 [nnc] Make our exceptions c10::Errors, get C++ stacktraces (#64332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64332

With this diff, if a compiler bug occurs (unlikely, I know!) we'll be able to get a c++ stacktrace leading to the exception, rather than just a terse message.  E.g.,
```
RuntimeError: UNSUPPORTED DTYPE
Exception raised from compilation_error at ../torch/csrc/jit/tensorexpr/exceptions.h:32 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f966659b2eb in /fsx/users/bertrand/c\
onda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x376f099 (0x7f966a195099 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x3763bf5 (0x7f966a189bf5 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #3: torch::jit::tensorexpr::CudaCodeGen::Initialize() + 0xdd8 (0x7f966a193368 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda\
.so)
```

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D30745610

Pulled By: bertmaher

fbshipit-source-id: a1cfaa7364ef4120de834e9cbe57ced1d082ab4e
2021-09-04 20:31:54 -07:00
6cac7ca980 Ensure num_threads is initialized in get_num_threads (#64486)
Summary:
Possible source of the recent layernorm CI failures. `lazy_init_num_threads` appears at the top of `parallel_for` and can change the number of threads set. So, we need to ensure `num_threads` is initialized during `get_num_threads` calls as well. It's already done this way for OpenMP, but is missing from other parallel backends.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64486

Reviewed By: mruberry

Differential Revision: D30752615

Pulled By: ngimel

fbshipit-source-id: 085873ce312edbee1254c0aaae30dec7fcfe2c57
2021-09-04 12:38:09 -07:00
604e885925 Automated submodule update: FBGEMM (#64338)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9ccb2714a9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64338

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30690319

fbshipit-source-id: 884d1f950cd1f7d2a77b79affb9215f285d5d0da
2021-09-04 00:44:28 -07:00
a91a278d60 Fix copy_transpose_valid condition for copy_same_type_transpose_ (#64425)
Summary:
Thanks to ngimel for the hint where the problem might be (https://github.com/pytorch/pytorch/issues/64358#issuecomment-910868849)!

I added a test that fails on master to verify the fix. The shape `(60, 60)` was chosen because of `MIN_SZ = 60 * 60` in `copy_transpose_valid`.
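
An illustrative sketch of the path under test (the exact failing configuration is in the linked issue):

```python
import torch

# Shape (60, 60) matches MIN_SZ = 60 * 60, the threshold in copy_transpose_valid.
src = torch.randn(60, 60).t()   # non-contiguous, transposed source
dst = torch.empty(60, 60)
dst.copy_(src)                  # may take the fast transpose-copy path
assert torch.equal(dst, src)
```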

Fixes https://github.com/pytorch/pytorch/issues/64358

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64425

Reviewed By: mruberry

Differential Revision: D30752725

Pulled By: ngimel

fbshipit-source-id: f40370ea8365c94e30f8e8a3dcab5f3b3462464a
2021-09-03 18:50:33 -07:00
e4ff14ad59 [CUDA graphs] Error if attempting to capture uncapturable nccl (#64440)
Summary:
NCCL < 2.9.6 is not capturable. Attempting to capture it can cause nasty behavior (for example, I've seen capture succeed but replay silently hang). PyTorch should preempt this with a friendlier error.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64440

Reviewed By: mruberry

Differential Revision: D30733884

Pulled By: ngimel

fbshipit-source-id: 5f2df3cf5cc0e5e68f49bf22a80d9f58064dc7ec
2021-09-03 13:23:07 -07:00
0e3b45eaef Fix logical typo in _compare_trilu_indices (#64468)
Summary:
I'm pretty sure that repeating the same call twice is meaningless; the intent was to call `tril`/`tril_indices` in the first case and `triu`/`triu_indices` in the other.
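
The two calls the test intended to compare do produce different results, so comparing either one against itself was vacuous:

```python
import torch

print(torch.tril_indices(3, 3))  # coordinates of the lower triangle
print(torch.triu_indices(3, 3))  # coordinates of the upper triangle
```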

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64468

Reviewed By: mruberry

Differential Revision: D30744978

Pulled By: malfet

fbshipit-source-id: 7cd36789a7ebf1cc263fb2d875e479c05e7588a4
2021-09-03 10:22:49 -07:00
6831d8e379 Support Union in TorchScript (#64234)
Summary:
This PR is created to replace the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. The replacement is needed due to a messy Sandcastle issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234

Reviewed By: gmagogsfm

Differential Revision: D30656444

Pulled By: ansley

fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
2021-09-03 06:12:24 -07:00
91b926fab3 Add fx2trt pass for removing duplicate output args (#64461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64461

Fx2TRT does not support duplicate nodes in the output args tuple.

This pass removes duplicate output args from the target subnets and fixes their uses in the top level module where the subnets are called. This pass must be called after acc split on the top-level net and subsequent calls to the acc trace on the subnets.

This pass will change both the subnets and top level module.

Test Plan:
Run:

```
buck run mode/opt -c python.package_style=inplace //caffe2/torch/fb/fx2trt/tests/passes/:test_remove_duplicate_output_args

```

Reviewed By: yinghai

Differential Revision: D30740499

fbshipit-source-id: 98459f7677980b21c7bffda918158001285572db
2021-09-02 23:04:12 -07:00
39aeb3bf63 Add fusion enabled apis (#64429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64429

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30740897

Pulled By: eellison

fbshipit-source-id: 446aa63b5d763f1cfffea62547db7294368e3438
2021-09-02 22:19:09 -07:00
7031fbdc63 update optimize_for_inference docs (#64428)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64428

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30740898

Pulled By: eellison

fbshipit-source-id: b94d2c3deb661a6ba048f19e8c1d5e1799667eeb
2021-09-02 22:17:58 -07:00
e1c3e5f830 [resubmit][FX] Prototype for guarding against mutable operations in tracing (#64467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64467

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30744870

Pulled By: jamesr66a

fbshipit-source-id: fc652f8b17748f90dbeb83fabf3bd5bb57d6ff1a
2021-09-02 21:13:21 -07:00
cd82bc1af9 Skips layer norm OpInfo on tbb platform (#64469)
Summary:
The OpInfo tests appear to be discovering a layer norm x tbb issue that requires investigation. Skipping tests on that platform for now to restore CI signal.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64469

Reviewed By: ngimel

Differential Revision: D30745746

Pulled By: mruberry

fbshipit-source-id: 282484cc00b867fac85b7df61430d64277da6421
2021-09-02 20:53:01 -07:00
c19bd05e84 THC: Cleanup dead code (#64441)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64441

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30735342

Pulled By: ngimel

fbshipit-source-id: 84ab36f7aec6b8cd7f1f34c19a58a382c06ad68d
2021-09-02 17:45:16 -07:00
db692ec0b3 Regenerate generated github workflows (#64465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64465

These were out of date and causing master failures

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D30744594

Pulled By: driazati

fbshipit-source-id: 09a21c3c5d9bc83b368d66cabbafd1ba83302dd3
2021-09-02 17:31:29 -07:00
e161872aab Revert D30732630: [quant] Enable jit tracing on quantizable LSTM
Test Plan: revert-hammer

Differential Revision:
D30732630 (116142143c)

Original commit changeset: 443e351ebb0e

fbshipit-source-id: 49001392f01366f3b1ccc31139f824c80b86cd40
2021-09-02 17:08:26 -07:00
046ed57a4d Revert D30055886: [quant] AO migration of the quantize.py
Test Plan: revert-hammer

Differential Revision:
D30055886 (44e3ed88c9)

Original commit changeset: 8ef7470f9fa6

fbshipit-source-id: c5bd3ead43a2d44b9e56872ec5bd7a195bdac725
2021-09-02 16:59:59 -07:00
4968d0b34f [POC] .github: Add event name to concurrency (#64402)
Summary:
This would ensure that manually/API triggered workflows would not cancel other triggered workflows. For example, the manually triggered periodic 11.1 linux job cancelled the scheduled one here, which we may not want:
![image](https://user-images.githubusercontent.com/31798555/131752175-1c99d56e-d344-46e1-b8ac-9c12bba0569a.png).

This would be helpful later as we use more dispatched workflows (e.g., for bisect functionality)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64402

Reviewed By: malfet

Differential Revision: D30734860

Pulled By: janeyx99

fbshipit-source-id: 220016716094666e9af836fcd716dd529cf23d8a
2021-09-02 16:24:05 -07:00
b12f34e8c2 update rpc tensorpipe logic for sparse tensors (#62960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62960

A bug was filed a few years ago about sending sparse tensors over RPC: #30807.

This PR updates the rpc/tensorpipe logic for CUDA sparse tensors. During serialization, the pickler.cpp implementation breaks a sparse tensor down into two tensors plus metadata. torch/csrc/distributed/rpc/tensorpipe_agent.cpp needs to be updated because it has no logic for sparse tensors: it pushes a single device for a sparse tensor, which is wrong because after serialization there are two tensors, and the second tensor would be left without a device and end up on the wrong target device. tensorpipe_utils.cpp needs to be updated because deserialization happens after the data is received on the target pipe; it takes the two tensors and metadata sent and rebuilds the sparse tensor. Since there are two tpDescriptors but only one tensor after deserialization, the logic is updated to verify that the sparse tensor is on the correct device using the first tpDescriptor.

This PR also updates ivalue.cpp and ivalue.h to support more paths for sparse COO tensors.

I tested these changes by adding sparse tests to rpc_test.py and dist_autograd_test.py.
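
A hedged usage sketch of what this enables (assumes an RPC agent is already initialized via `rpc.init_rpc` and that a peer named "worker1" exists):

```python
import torch
import torch.distributed.rpc as rpc

indices = torch.tensor([[0, 1], [2, 0]])
values = torch.tensor([3.0, 4.0])
s = torch.sparse_coo_tensor(indices, values, (3, 3))

# Sending a sparse COO tensor over RPC now serializes correctly.
result = rpc.rpc_sync("worker1", torch.add, args=(s, s))
```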

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30717285

Pulled By: gcramer23

fbshipit-source-id: daee9a56764550f56b131f9dd8e74e23113d6714
2021-09-02 16:16:19 -07:00
32a93c2424 Revert D30675780: [FX] Prototype for guarding against mutable operations in tracing
Test Plan: revert-hammer

Differential Revision:
D30675780 (795387477f)

Original commit changeset: b2116b51dcc8

fbshipit-source-id: d4f1173f4989556ea54974f4c2739ef85a705fae
2021-09-02 16:07:29 -07:00
116142143c [quant] Enable jit tracing on quantizable LSTM (#64438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64438

The quantizable LSTM didn't support jit tracing because it had several non-traceable paths. We sacrifice some of the user experience to enable tracing.

The main UX feature removed is a user-friendly message when trying to access the backward path in a bidirectional LSTM: when the bidirectional flag is `False`, we used to throw a nice error message if the user tried to access the backward weights. Now the message is the default one (the properties were removed).

Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`

Reviewed By: mtl67

Differential Revision: D30732630

fbshipit-source-id: 443e351ebb0e2b636c86dea9691b9bf42ffe618f
2021-09-02 15:59:20 -07:00
795387477f [FX] Prototype for guarding against mutable operations in tracing (#64295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64295

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30675780

Pulled By: jamesr66a

fbshipit-source-id: b2116b51dcc87357f0c84192c4c336680875e27a
2021-09-02 15:17:04 -07:00
3c79e0b314 .github: Migrate pytorch_linux_bionic_py_3_6_clang9 to GHA (#64218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64218

Relies on https://github.com/fairinternal/pytorch-gha-infra/pull/11

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra bdhirsh

Test Plan: Imported from OSS

Reviewed By: malfet, H-Huang, janeyx99

Differential Revision: D30651516

Pulled By: seemethere

fbshipit-source-id: e5843dfe84f096f2872d88f2e53e9408ad2fe399
2021-09-02 14:51:00 -07:00
257623da39 Switch Shuffler to use iter-local buffer (#64195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64195

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30642947

Pulled By: ejguan

fbshipit-source-id: d4b52479b4ae37ad693388b9cdb8eed83a136474
2021-09-02 13:40:28 -07:00
f555348aaa Disable CircleCI ROCm build (#64434)
Summary:
Per jithunnair-amd suggestion

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64434

Reviewed By: seemethere, janeyx99

Differential Revision: D30732289

Pulled By: malfet

fbshipit-source-id: 1932d0a7d1e648006f8030c8237b187d0709f688
2021-09-02 13:32:02 -07:00
4ce9c530d6 [DataPipe] removing filter's inheritance from map (#64404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64404

This PR removes `filter`'s inheritance from `map`. This allows `filter` to not have a `__len__` function, which is the behavior we want.

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30713120

Pulled By: NivekT

fbshipit-source-id: 4d5d07555297ee2bd4b49842c0d26cdc00638f6c
2021-09-02 13:09:47 -07:00
4f43480186 [DataPipe] adding/removing __len__ for different DataPipe (#64398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64398

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30710437

Pulled By: NivekT

fbshipit-source-id: 524eda43a2faa0db0c1a662bf9bb4283f0ade83c
2021-09-02 13:08:32 -07:00
3cd0a4ac15 Fix test_ind_worker_queue by setting max_num_worker based on system resource (#63779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63779

Fixes #63657

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30494185

Pulled By: ejguan

fbshipit-source-id: d1bd24299b25d589889604aaf18ad347bdff4df4
2021-09-02 12:36:56 -07:00
7d010539c9 ENH Adds test and docs for modules that already support no batch dims (#62729)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62729

Reviewed By: H-Huang

Differential Revision: D30669546

Pulled By: jbschlosser

fbshipit-source-id: c771c98c1fd9d28fa984b72893585c738c736505
2021-09-02 12:36:54 -07:00
d0cb26ba57 [DDP] Fix logging iterations (#64411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64411

These are not actually the training iterations, but are offset by how
frequently DDP stats collection actually runs (the default being
kDDPRuntimeLoggingSampleRate = 100). So with this change, they are actually
logged to Scuba at iterations 10, 10 * 100, 40 * 100, etc.

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30718274

fbshipit-source-id: 146bd2428753c93363bee37e487f40104fce3c18
2021-09-02 12:35:01 -07:00
22f3bcd164 .github: Move squid vars to common vars (#64436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64436

Moves the squid variables to our common jinja template so that when we
have to update them they're all in the same place.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D30732776

Pulled By: seemethere

fbshipit-source-id: 22e3757c4eec775baa8abbaac2ba2a0c69c2b2a9
2021-09-02 11:31:54 -07:00
c932afe39b .github: Move upload-artifact-s3 to common var (#64435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64435

Move upload-artifact-s3 to a common variable to be used amongst our
jinja templates, this should make it easier in the future to update
these images

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30732777

Pulled By: seemethere

fbshipit-source-id: 51cd485f5abae134c3c49dfa878e6303ba8e5f25
2021-09-02 11:31:52 -07:00
1519b6084f nn.functional.linear OpInfo (#61971)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61971

Test Plan: - wait for tests

Reviewed By: heitorschueroff

Differential Revision: D30013750

Pulled By: zou3519

fbshipit-source-id: ca41dbd98176c12e50ad1410a658f4b06fe99a1e
2021-09-02 11:31:50 -07:00
c0cdbb1cc5 Revert D30468409: Add fx2trt pass for removing duplicate output args
Test Plan: revert-hammer

Differential Revision:
D30468409 (6da7552a8e)

Original commit changeset: b4d91b76ab5d

fbshipit-source-id: e138dc425fe55ffe3585ea5fac4db476931bafed
2021-09-02 11:31:49 -07:00
9214450b7f [tensorexpr] Wrap error msgs with buildErrorMessages for internal asserts (#64409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64409

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30717786

Pulled By: huiguoo

fbshipit-source-id: a3b147d339ff4927f14efa24407cd3b63d80001d
2021-09-02 11:30:34 -07:00
6da7552a8e Add fx2trt pass for removing duplicate output args (#64433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64433

Fx2TRT does not support duplicate nodes in the output args tuple.

This pass removes duplicate output args from the target subnets and fixes their uses in the top level module where the subnets are called. This pass must be called after acc split on the top-level net and subsequent calls to the acc trace on the subnets.

This pass will change both the subnets and top level module.

Test Plan:
Run:

```
buck run mode/opt -c python.package_style=inplace //caffe2/torch/fb/fx2trt/tests/passes/:test_remove_duplicate_output_args

```

Reviewed By: 842974287

Differential Revision: D30468409

fbshipit-source-id: b4d91b76ab5d8a5275d68dd48d1327a44c22568e
2021-09-02 10:40:37 -07:00
aeafcde087 CI: Enable using labels to control GHA workflows (#64314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62852

Sets a global environment variable containing a list of PR labels. For this PR, the PR_LABELS variable looks like:
```
[
  "cla signed",
  "ciflow/default"
]
```
confirmed in a run: https://github.com/pytorch/pytorch/runs/3490072161?check_suite_focus=true

This information can be used in other workflow steps to control the logic. For example, if I want to force a build, I can label my PR with "force-build" and do something like the following in my build script:
```
if [[ "${PR_LABELS}" = *force-build* ]]; then
   python setup.py install
else
   #use cached wheel or something
fi
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64314

Reviewed By: driazati

Differential Revision: D30714570

Pulled By: janeyx99

fbshipit-source-id: 80b060ee32643ddd22eb7b8ec548579c7ccf6441
2021-09-02 10:34:44 -07:00
66ddc6ef9e Fixes and details to torchhub docs (#63783)
Summary:
This PR:

- adds a few details regarding the newly added `skip_validation` parameter https://github.com/pytorch/pytorch/pull/62139
- uses double-backticks instead of single-backticks since this is rst, not markdown.
- adds a few minor doc nits here and there

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63783

Reviewed By: zou3519

Differential Revision: D30696658

Pulled By: NicolasHug

fbshipit-source-id: 6f01c7eb3cfcd7e17e4c33c09d193054fa18ad36
2021-09-02 09:32:57 -07:00
50067c020a TST Adds __repr__ and str to module info (#63737)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935

This PR adds `test_repr` to `test_modules`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63737

Reviewed By: gchanan

Differential Revision: D30729642

Pulled By: jbschlosser

fbshipit-source-id: c11a28bc0739abd3ed40727389dd28ed4069edad
2021-09-02 09:32:55 -07:00
2c258d91cc Fix torch.istft length mismatch and window runtime error (#63469)
Summary:
The PR fixes two issues:
- See https://github.com/pytorch/pytorch/issues/62747 and https://github.com/pytorch/audio/issues/1409: the length mismatch when the given ``length`` parameter is longer than expected. Adds padding logic consistent with librosa.
- See https://github.com/pytorch/pytorch/issues/62323: the current implementation checks that the min value of window_envelop.abs() is greater than zero. In librosa they normalize the signal on the non-zero values by indexing, like:
```
approx_nonzero_indices = ifft_window_sum > util.tiny(ifft_window_sum)
y[approx_nonzero_indices] /= ifft_window_sum[approx_nonzero_indices]
```
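
A minimal round-trip sketch exercising the padded-length path (not from the PR):

```python
import torch

x = torch.randn(2, 4096)
spec = torch.stft(x, n_fft=400, return_complex=True)
y = torch.istft(spec, n_fft=400, length=4096)  # longer than the raw reconstruction; now zero-padded
assert y.shape[-1] == 4096
```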

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63469

Reviewed By: fmassa

Differential Revision: D30695827

Pulled By: nateanl

fbshipit-source-id: d034e53f0d65b3fd1dbd150c9c5acf3faf25a164
2021-09-02 09:31:47 -07:00
616fd9219d [Static Runtime] Add sign/abs/lop1p/mul fusion pass (#64209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64209

Add a new fusion pass that transforms the following pattern:
```
graph(%input):
    %0 : Tensor = aten::sign(%input)
    %1 : Tensor = aten::abs(%input)
    %2 : Tensor = aten::log1p(%1)
    %res : Tensor = aten::mul(%0, %2)
    return (%res)
```
Into a single op:
```
graph(%input):
    %res : Tensor = static_runtime::signed_log1p(%input)
    return (%res)
```

The intent is to reduce the number of passes over the tensor. However, enabling this pass actually causes a performance regression, probably due to a lack of vectorization in the fused implementation. Because of this issue, this diff **does not** enable this pass.

Followup: navahgar will add an NNC kernel which is faster than the unfused version and enable this pass. We still need this version as a fallback since the NNC kernel will not support all dtypes.
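
For reference, a Python equivalent of the unfused math (the actual fused kernel is a C++ Static Runtime op):

```python
import torch

def signed_log1p(x: torch.Tensor) -> torch.Tensor:
    return torch.sign(x) * torch.log1p(torch.abs(x))
```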

Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`

Test passed with new graph pass disabled and enabled.

Reviewed By: hlu1

Differential Revision: D30559929

fbshipit-source-id: e4e080cb2e6a705cfdde1fc98bee92b723f8132a
2021-09-02 08:31:40 -07:00
cd3be4675f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30710635

fbshipit-source-id: e8dae05a7e3a19d656067a4f102aab4a3c93ac42
2021-09-02 08:31:37 -07:00
f04e6594ed Fix broken caffe2 test: PlanExecutorTest.BlockingErrorPlan (#64401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64401

PlanExecutorTest.BlockingErrorPlan uses `ASSERT_DEATH` which internally performs a `fork()`. This can cause problems under certain configurations that use threads. This change updates this test to use the "threadsafe" style for GTest death tests in order to improve its quality in multithreaded environments.

Test Plan:
I confirmed that this change fixes the issue on my devvm with the following command:
```
buck test mode/dev //caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest.BlockingErrorPlan
```

Reviewed By: praihan

Differential Revision: D30709447

fbshipit-source-id: 12ffd9ad0371e2e5b43a9873c80568e5ab02d246
2021-09-02 08:30:29 -07:00
b737629ff0 simplify op name determination into a single forward pass (#64261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64261

Note that this does not preserve byte-for-byte compatibility with
existing names.

Test Plan:
* Rely on CI to catch gross errors.
* Merge after release cut to catch subtle issues.

Reviewed By: albanD

Differential Revision: D30700647

Pulled By: dagitses

fbshipit-source-id: 7b02f34b8fae3041240cc78fbc6bcae498c3acd4
2021-09-02 07:32:11 -07:00
b2c7c1dfcf fix copy.deepcopy on LinearPackedParams (#64367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64367

This is the same thing as https://github.com/pytorch/pytorch/pull/56154
but for quantized linear. It fixes the behavior of `copy.deepcopy` on
these modules. Before this PR, copied instances of `LinearPackedParams`
were not properly initialized, and inspecting them raised errors about
missing `_modules`. After this PR, inspecting and using the copies
works.
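
A minimal sketch of the fixed behavior (module names per the stable quantized API at the time):

```python
import copy
import torch

qlinear = torch.nn.quantized.Linear(4, 4)
qlinear_copy = copy.deepcopy(qlinear)   # previously left the copy half-initialized

x = torch.quantize_per_tensor(torch.randn(2, 4), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
out = qlinear_copy(x)                   # inspecting and using the copy now works
```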

Test Plan:
```
python test/test_quantization.py TestStaticQuantizedModule.test_linear_api
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30702667

fbshipit-source-id: 38c26d1e72663416eeb989985b77ffc2052c12b9
2021-09-02 06:30:42 -07:00
99b064fac4 [jit] shape propagation for prepack (#63585)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63585

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30428905

Pulled By: IvanKobzarev

fbshipit-source-id: c18f6605a69b2e000bdf14a23e637c5a1c2ec64c
2021-09-02 05:30:38 -07:00
cdb46f4c6e extract TestAutogradComplex into its own test file (#63400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63400

This is the first step to break up test_autograd.py for #63205.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30541499

Pulled By: dagitses

fbshipit-source-id: 8d9d32007938b9eade0e88f95a6a3190e7e2ef01
2021-09-02 04:34:35 -07:00
be5b05c1dc require that TARGET_DET_LIST is sorted (and sort it here) (#64102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64102

We sort this list so that we may add comments to indicate the absence
of a file right where that file would need to be put. This makes it
difficult to wrongly add such a file.

The sorting itself was done programmatically to ensure that no entries
were inadvertently removed.

I printed the sorted list with:

```
  for p in sorted(TARGET_DET_LIST):
    print(f'    "{p}",')
```

Then copied it back into the file.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30625076

Pulled By: dagitses

fbshipit-source-id: cf36fcb3e53e274b76d1f4aae83da1f53c03f9ed
2021-09-02 04:34:33 -07:00
aedd70fcfe Fix list() and help() torchhub functions for Windows (#63773)
Summary:
This PR fixes the help() and list() torchhub functions, which were probably failing on Windows since the `/` OS separator was hardcoded.

Before merging this I need to double-check whether the CI actually runs the corresponding tests on Windows or not.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63773

Reviewed By: zou3519

Differential Revision: D30695664

Pulled By: NicolasHug

fbshipit-source-id: fac328163fd05db804a8186ae28f22b3cc3a6404
2021-09-02 04:34:31 -07:00
030154e241 Remove outdated comment in hub.py (#63757)
Summary:
This PR removes an outdated comment about Python2 that was originally introduced in https://github.com/pytorch/pytorch/pull/25083/files. The code has changed since then, but the comment wasn't removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63757

Reviewed By: zou3519

Differential Revision: D30695656

Pulled By: NicolasHug

fbshipit-source-id: 431cf414588b9e5a1ad6acdae724ff5af1b16971
2021-09-02 04:34:29 -07:00
1c735768ed Update hub.load() signature to avoid polluting kwargs param (#63755)
Summary:
This PR addresses an old comment about Python2 EOL, directly putting some parameters in the function signature instead of in a `**kwargs` dict.

I believe the changes are fully backward compatible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63755

Reviewed By: zou3519

Differential Revision: D30695634

Pulled By: NicolasHug

fbshipit-source-id: 398f347c5a04bfb58e77e46773a869cb9d0eb225
2021-09-02 04:32:22 -07:00
6db8f7a709 Fix TRTModule not adding outputs in order (#64418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64418

In T99368564, we found that when running a TRT-lowered module, the output tensors are out of order compared to the output from the original, non-lowered module. It turns out that in `TRTModule.forward()`, we cannot rely on the `ICudaEngine` bindings' natural order indices to create the output tensors; rather, we should explicitly construct the output tensors from the bindings' names, in an order that we supply.
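
A hedged sketch of gathering outputs by binding name in a supplied order rather than relying on the engine's natural binding order (function and variable names are hypothetical):

```python
def gather_outputs(engine, binding_tensors, output_names):
    # output_names is the order we supply; engine is a tensorrt.ICudaEngine.
    outputs = []
    for name in output_names:
        idx = engine.get_binding_index(name)
        outputs.append(binding_tensors[idx])
    return tuple(outputs)
```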

Test Plan:
* Arc lint
* Run CI/sandcastle tests
* Run GPU lowering using commands and code changes in D30171741 and ensure we don't observe out-of-order outputs

Reviewed By: yinghai

Differential Revision: D30693545

fbshipit-source-id: 32a894ceeb148fcf4e8d279be3835c7d1f1aa2ba
2021-09-02 01:36:23 -07:00
76e187aa08 Port gather to structured kernel (#63312)
Summary:
Will add a description once this is ready for review.

cc: ysiraichi ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63312

Reviewed By: iramazanli

Differential Revision: D30597447

Pulled By: ezyang

fbshipit-source-id: d36e59835c2f4b38e286032dd2a1111a7e16b7e5
2021-09-02 01:36:21 -07:00
ee8a6c1d14 Replace std::unordered_map<c10::Device, c10::Device> with DeviceMap (#64393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64393

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D30708384

Pulled By: pbelevich

fbshipit-source-id: 1c565727e4f09cd9e560874dd90aa403470b4a97
2021-09-02 01:36:19 -07:00
8d5b95019d [PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; PR #63651 turns the flag on.
2. Add two unit tests to cover this type of operator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540

ghstack-source-id: 137211562

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```

Reviewed By: raziel, iseeyuan, tugsbayasgalan

Differential Revision: D30414156

fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
2021-09-02 01:36:16 -07:00
0addd75be9 Remove unnecessary resize_output (#64272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64272

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang, bdhirsh

Differential Revision: D30686941

Pulled By: ezyang

fbshipit-source-id: de60e6f1115648f8cf7daaa1e652594fe8b06742
2021-09-02 01:34:17 -07:00
69e1207084 Move graph util to fx2trt (#64064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64064

Move the original util from torch2trt to the fx2trt dir, since torch2trt is going to be deprecated. This is a follow-up diff to D30379124

Test Plan: manual

Reviewed By: yinghai, mikekgfb

Differential Revision: D30591687

fbshipit-source-id: ae0e59dfbc2d2e2aa4f3ccea7cff2291c7deb388
2021-09-01 22:34:11 -07:00
71e149834b Add a warning about DataLoader num_workers > 0 "memory leak" (#64337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64337

See https://github.com/pytorch/pytorch/issues/13246

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30690320

Pulled By: ezyang

fbshipit-source-id: 2751aca05a94e63d25162599f458855988516fad
2021-09-01 21:49:41 -07:00
d067f15622 [Dist CI] Move rest of distributed tests to their own CI job (#64253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64253

Follow-up to D30496178 (f4aff3a346) to move the rest of the distributed tests to their own jobs for Linux GHA.
ghstack-source-id: 137233785

Test Plan: CI

Reviewed By: walterddr

Differential Revision: D30662999

fbshipit-source-id: f7cfbc0d1223aca52120f17f9da987d70fda8de6
2021-09-01 21:43:41 -07:00
4d6314a16e [DDP] Log num threads (#64072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64072

Log gloo threads to DDP logging.
ghstack-source-id: 137119480

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30596083

fbshipit-source-id: 2b4f6e762cb5d850be6056bcc5922029a1af3c91
2021-09-01 18:36:15 -07:00
59c6ceb6a8 add documentation to shape inference algorithm (#64312)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64312

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30709254

Pulled By: migeed-z

fbshipit-source-id: 3297d26fe6727c5b9ca176625b1683d787f59659
2021-09-01 18:34:17 -07:00
778af56504 [DDP Comm Hook] Add debugging communication hooks to ddp_comm_hooks.rst (#64352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64352

as title
ghstack-source-id: 137246253

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30694089

fbshipit-source-id: a78110b11d59bb0718f43c99ede23f2fd8ab21d0
2021-09-01 17:37:19 -07:00
bf9d66586c [DDP Comm Hook] Create a noop hook for performance debugging (#64344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64344

As title.

Additionally, avoid using numpy array in test_ddp_hooks.py.
ghstack-source-id: 137170449

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks -- test_ddp_comm_hook_noop_hook

Reviewed By: rohan-varma

Differential Revision: D30693220

fbshipit-source-id: e17f0d1c6198863cf20a53566f586a6bff602522
2021-09-01 17:36:22 -07:00
baceea4426 [DDP] Add more logging iterations (#64071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64071

Adding more logging iterations to get additional data.
ghstack-source-id: 137119476

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30579367

fbshipit-source-id: 57195266ada5e5926f0d8eaf4fb4e01dc98924d7
2021-09-01 17:32:17 -07:00
59fcbd172b Fix incorrect DDP test (#64074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64074

Previous PR https://github.com/pytorch/pytorch/pull/63831 did not actually test the error in https://github.com/pytorch/pytorch/issues/63812. Introduce a test
directly from the repro that simulates it.
ghstack-source-id: 137171460

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30569719

fbshipit-source-id: fd61250ef6d291c093607663d91d6d2cb5574eb7
2021-09-01 16:34:06 -07:00
9b8f9d5a25 [c10d] Prefer use of torch_check (#63928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63928

Throwing std::invalid_argument results in not getting stack traces with
TORCH_SHOW_CPP_STACKTRACES=1, so instead prefer torch_check here.
ghstack-source-id: 137135328

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30533955

fbshipit-source-id: 33e5bf4f449e3043dec68da93f8022f6624d9675
2021-09-01 16:34:05 -07:00
5d80a48cef Add fast path for addmm when the inputs are conjugate (#59380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59380

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28898374

Pulled By: anjali411

fbshipit-source-id: eab0e64d37bb57c18b54cabb8e5c00666338ba04
2021-09-01 16:34:02 -07:00
a8f9aab840 [DDP Comm Hook] Add bf16 gradient compression to ddp_comm_hooks.rst (#64346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64346

as title
ghstack-source-id: 137170288

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30693513

fbshipit-source-id: 8c64b8404ff3b0322e1bbbd93f6ef051ea91307d
2021-09-01 16:34:00 -07:00
ed89937d2c [quant][graphmode][fx] Add fbgemm backend_config_dict (#64288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64288

This is just to set up the file structure and unblock experimentation.
The format of backend_config_dict will change in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: zou3519

Differential Revision: D30699457

fbshipit-source-id: 28211a4def05d34757850c045a36e311f54760fe
2021-09-01 16:32:43 -07:00
69f4401b7b Make datasets in ConcatDataset not need to be sized (#64114)
Summary:
`datasets` needs to be iterable, but also sized, because its length is checked; yet immediately afterwards it is converted to a list. By swapping the order of these two lines, it no longer needs to be sized.
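
A minimal sketch of the reordering (names approximate):

```python
def init_concat(datasets):
    datasets = list(datasets)  # materialize first; input now only needs to be iterable
    assert len(datasets) > 0, 'datasets should not be an empty iterable'
    return datasets
```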

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64114

Reviewed By: H-Huang

Differential Revision: D30641480

Pulled By: ejguan

fbshipit-source-id: 7e16548c2123afa65b83845f9929271fa07fe1e8
2021-09-01 15:32:50 -07:00
535526b95c Restore LayerNorm numerics test (#64385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64385

It was deleted in https://github.com/pytorch/pytorch/pull/63276.

The numerics test was meant to check LayerNorm behavior on large inputs,
but we deleted it without realizing that.

Test Plan: - wait for tests.

Reviewed By: ngimel

Differential Revision: D30702950

Pulled By: zou3519

fbshipit-source-id: a480e26c45ec38fb628938b70416cdb22d976a46
2021-09-01 15:32:49 -07:00
7ffcf15503 [quant][graphmode][api] Add backend_config_dict to prepare_fx api (#64135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64135

We want to start aligning the api with the design in https://github.com/pytorch/pytorch/wiki/Extending-PyTorch-Quantization-to-Custom-Backends

We plan to gradually move things from `prepare_custom_config_dict` and `convert_custom_config_dict`
to `backend_config_dict` and allow custom backend developer to define their own way of quantizing operators.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: zou3519

Differential Revision: D30699456

fbshipit-source-id: e3c068da8d3da2270f57719f7159cc71cafa8598
2021-09-01 15:32:47 -07:00
93bc03622e Silent rm error for sccache log file (#64388)
Summary:
Sample reporting from dr.ci

![image](https://user-images.githubusercontent.com/658840/131724645-75afa04f-7554-4674-8e7c-cf139c84d994.png)

The `rm` command is not actually running into problems; we just need to silence the console output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64388

Reviewed By: walterddr, malfet, seemethere

Differential Revision: D30704439

Pulled By: zhouzhuojie

fbshipit-source-id: ecd35531decf05b75cef30d08d46635f81112f67
2021-09-01 15:32:45 -07:00
9495674905 [xplat][metal] Add getters and setters for ivars in Conv2dOpContext (#57395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57395

As title
ghstack-source-id: 137223806

(Note: this ignores all push blocking failures!)

Test Plan:
### Lib Build
- `buck build caffe2:aten_metal_prepack`

### Integration Test
- `arc focus2 pp-ops -a ModelRunner`
- Click "Test Person/Hair Segmentation Model"

{F612831435}

- Image Classification Demo

{F614144868}

Reviewed By: xta0

Differential Revision: D28132020

fbshipit-source-id: 73560263a9d14e9ecfa39c69deb158a2ed8cb179
2021-09-01 15:31:12 -07:00
968d7ee46a [structured] Preserve computed elements from meta func to impl (#61746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61746

**Summary**
This commit introduces a new feature for structured kernels that allows
kernels to declare quantities as "precomputed" in
`native_functions.yaml`, compute them once in the `meta` function and
reuse them again in the `impl`. The names and types of these quantities
are used to generate code for a struct containing them that the `meta`
function must return. In the case of a handful of surveyed kernels
(`all`, `any`, `avg_pool2d`), these quantities that are used both in
the `meta` and `impl` have the same meaning as certain kernel arguments
and in fact supersede them. Accordingly, the correspondence between a
kernel argument and the precomputed elements that supersede it is also
captured in `native_functions.yaml`. This information is used to unpack
the struct returned by `meta` and pass its contents correctly to the
`impl` function.

The primary goal is to avoid recomputation and enhance the developer
experience (e.g., people can sometimes forget to compute these elements while
porting a kernel).

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D30407831

Pulled By: SplitInfinity

fbshipit-source-id: 00975525ea373721fe52d06f75cd4ac91f3dc556
2021-09-01 14:34:25 -07:00
4aad366111 [Static Runtime] Make per-op latency readable by FAI-PEP (#64315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64315

Add a new flag `generate_ai_pep_output` to `StaticRuntime::benchmark`. If set, produces per-op-kind average total latency in milliseconds in a JSON format recognized by [Facebook AI performance evaluation platform (FAI-PEP)](https://github.com/facebook/FAI-PEP).

This is useful for observing the impact of changes that make a big difference for a specific op, but do not affect the overall SR latency by more than a few percent.

Reviewed By: hlu1

Differential Revision: D30679352

fbshipit-source-id: c847fa6ea20774aaf1e7949b11db4421d1f70b7e
2021-09-01 14:34:22 -07:00
86c9654291 Update optimize_for_mobile to preserve node's debug information (#63106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63106

Propagate debug info to the re-written nodes in the graph.

Test Plan:
- Clone open source repo and build
- ``` python3 test/test_jit.py TestOptimizeForMobilePreserveDebugInfo ```
- Tests pass

Reviewed By: kimishpatel

Differential Revision: D28654659

fbshipit-source-id: 2d7c87f2fb95a3be53246375f35639bbd97c237e
2021-09-01 14:34:20 -07:00
15ff25d1fc Break up "@generated" string so Phabricator shows changes
Summary: Created from CodeHub with https://fburl.com/edit-in-codehub

Test Plan:
CI

Sandcastle run

Reviewed By: larryliu0820

Differential Revision: D30701781

fbshipit-source-id: 3acab8b65a327c4ec7da90bc855ecf02f801c40a
2021-09-01 14:34:18 -07:00
e322547fe6 Add forward AD support for custom Functions (#64061)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64061

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30640868

Pulled By: albanD

fbshipit-source-id: b0e6610430a879074d6d5306443772fc154b431f
2021-09-01 14:33:09 -07:00
25e2578967 Fix bytes_written and bytes_read (#64244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64244

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040

In operator cost inference functions, in many places we are using sizeof(x.data_type()). Since data_type() returns a 32 bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), we are basically always getting 4 for sizeof(x.data_type()) no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing to this bug.

We now use the size in bytes based on the actual data type.

Test Plan:
Added unit tests BatchMatMulMemCostTest:

buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest

Extended existing unit test test_columnwise_concat for different data types:

buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat

Reviewed By: CrazySherman

Differential Revision: D30656698

fbshipit-source-id: d42c0c9a0c5b0ddc5dba39e4994f1f85a5e618bf
2021-09-01 13:35:41 -07:00
03a58a2ba0 [Caffe2] Create fewer strings during argument fetching (#64285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64285

With C++14 heterogeneous ordered container lookup, it is no longer necessary to create a `std::string` in order to look up elements of a `CaffeMap` keyed by std::string. Accordingly, this diff reworks the argument-getting operator functions to avoid that in favor of `c10::string_view`.
ghstack-source-id: 137139818

Test Plan: buildsizebot iOS apps -- code size win. Fewer strings is probably marginally good for perf, but this only happens at setup time anyway.

Reviewed By: dzhulgakov

Differential Revision: D26826676

fbshipit-source-id: ee653b14dc2c528bae8c90f0fc6a7a419cbca1d6
2021-09-01 13:30:54 -07:00
468001600c Back out "Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling." (#64307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307

Original commit changeset: 0b2aa7c57d08

Restores original changes.
This diff changes the way operator profiling is done in lite predictor
benchmarking binary.
Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile
events and then generate operator level metrics from them.
Since KinetoEvents do not contain cpu clock time, we now report only wallclock
time.
This unifies the various profiling efforts that we have for benchmarking purposes. In
production we will still use the observer based mechanism, but the advantage of
using the kineto profiler is that we get a few other things for free, such as:
- chrome trace generation
- operator level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we could use a python post-processing script to parse the chrome
trace and generate output similar to torch.profiler. (To be done)

Furthermore, this removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info. They should be covered by test_mobile_profiler.cpp.

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and --print_module_info true (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30680354

fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
2021-09-01 13:29:35 -07:00
421d8f86b6 Add a record scope around autograd::engine::evaluate_function (#63619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63619

Adds a RECORD_FUNCTION scope around the function that is being evaluated as part
of backwards execution. This has been useful in picking up some operations
in the backwards pass that otherwise would not show up, for example custom
autograd functions implemented in C++.
ghstack-source-id: 137041723

Test Plan:
CI

benchmark:
buck run mode/opt //scripts/rvarm1/ddp:bench

Reviewed By: albanD

Differential Revision: D30439492

fbshipit-source-id: 955917770cdf2a2edb0303223ace710b668ba388
2021-09-01 12:32:30 -07:00
0b48d96895 [Bootcamp] Include both python unittest and parser parameters in --help and -h flag (#64297)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45945

Creates a new thread to run -h or --help with unittest.main if the help flag is present, and keeps the add_help default for parameters.

Includes both python unittest and parser parameters in the --help and -h flags, and will remain up to date since both messages are displayed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64297

Test Plan:
Imported from GitHub

`python test/test_spectral_ops.py --help`

Output:
```
% python test/test_spectral_ops.py --help
usage: test_spectral_ops.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  test_spectral_ops.py                           - run default set of tests
  test_spectral_ops.py MyTestSuite               - run suite 'MyTestSuite'
  test_spectral_ops.py MyTestCase.testSomething  - run MyTestCase.testSomething
  test_spectral_ops.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

usage: test_spectral_ops.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT]
                            [--test_bailouts] [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX]
                            [--run-parallel RUN_PARALLEL] [--import-slow-tests [IMPORT_SLOW_TESTS]]
                            [--import-disabled-tests [IMPORT_DISABLED_TESTS]]

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
  ```

Also ran some other tests to make sure they still worked, and ran other tests with the --help or -h flag.

Reviewed By: seemethere

Differential Revision: D30677776

Pulled By: PatrickKan

fbshipit-source-id: eb3d6e3fa677137ec703ec3a23808efb99acc896
2021-09-01 12:30:47 -07:00
c6505cc383 [FX] Fix python code generation for wrapped getattr() with default value (#64271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64271

Closes #60417

Modified emit_node() in fx/graph.py to generate a getattr() call with the default value when len(node.args) != 2, instead of accessing the attribute directly.
Added test_torch_fx_getattr() in test/test_fx.py.
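
A minimal sketch of the case being fixed (the attribute name `foo` is just an
example):

```python
import torch
import torch.fx as fx

graph = fx.Graph()
x = graph.placeholder("x")
# A call_function node targeting builtin getattr with a default value;
# codegen should now emit `getattr(x, 'foo', 0)` instead of `x.foo`.
attr = graph.call_function(getattr, (x, "foo", 0))
graph.output(attr)

gm = fx.GraphModule(torch.nn.Module(), graph)
print(gm.code)              # should contain getattr(x, 'foo', 0)
print(gm(torch.randn(2)))   # tensors have no 'foo' attribute, so prints 0
```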

Test Plan:
pytest test/test_fx.py

Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30671265

fbshipit-source-id: f2db9ea47e0cb247547e200684f715aab006c374
2021-09-01 11:30:27 -07:00
87d8ab6e50 [nnc] Updated generic error message with info about turning off the fuser (#64316)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64316

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30683942

Pulled By: navahgar

fbshipit-source-id: d86607563672213f99a1436dcf4f5dc28053b713
2021-09-01 10:31:50 -07:00
c4f3f6e62d Fixes reduction launch config (#64304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48573
See also https://github.com/pytorch/pytorch/pull/64194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64304

Reviewed By: janeyx99

Differential Revision: D30689600

Pulled By: ngimel

fbshipit-source-id: bf2103ca177fd3b6e27bc0324b81925234483a29
2021-09-01 10:30:40 -07:00
d5bfdd3dac OpInfo for nn.functional.layer_norm (#63276)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

Note:

* This PR also adds a reference test inspired by existing tests in `test_nn.py`.

cc: mruberry zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63276

Reviewed By: ejguan

Differential Revision: D30452483

Pulled By: zou3519

fbshipit-source-id: 2578d01ca34e031668a41bd284db60c31ae1fba8
2021-09-01 09:31:45 -07:00
d1f3d85fd8 fix GradBucket.is_last() logic (#63768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63768

Passed the number of buckets to the GradBucket constructor, to check whether the index is equal to num_buckets - 1 in the .is_last() function.
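
In pure-Python terms the fixed logic amounts to the following (a hypothetical
sketch; the real GradBucket is a C++ class exposed through torch.distributed):

```python
class GradBucketSketch:
    # Hypothetical Python rendering of the fix; names are illustrative only.
    def __init__(self, index: int, bucket_count: int):
        self.index = index
        self.bucket_count = bucket_count  # now passed to the constructor

    def is_last(self) -> bool:
        # Previously this could not be answered from the index alone.
        return self.index == self.bucket_count - 1
```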

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks

test output: https://www.internalfb.com/intern/testinfra/testconsole/testrun/8162774375985873/

Reviewed By: SciPioneer, mrshenli

Differential Revision: D30455913

fbshipit-source-id: 8c67ca69cbf191d6e189e09248407eb167bb24b6
2021-09-01 09:29:13 -07:00
92b31b59af Revert D29699456: [pytorch][PR] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA]
Test Plan: revert-hammer

Differential Revision:
D29699456 (ad4848565e)

Original commit changeset: 407ae53392ac

fbshipit-source-id: b6c70ba8bb28c0c38de47857030b69792a8470de
2021-09-01 07:32:24 -07:00
0c4e4e588e [FX] Rename reduce functions back to their old, public names (#64324)
Summary:
Unfortunately pickle serializes the names of these functions. Also put them under backward-compatibility enforcement.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64324

Test Plan: Local repro https://fb.workplace.com/groups/3440841732711443/permalink/4018921611570116/

Reviewed By: SplitInfinity, TailofJune

Differential Revision: D30684185

Pulled By: jamesr66a

fbshipit-source-id: 900701220155d15115cd0c07cf7774a2891bd04f
2021-08-31 22:36:11 -07:00
05ecaefbbf [Metal][GPU] Enable metal for simulators and fix test failures if possible (#64322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64322

As title
ghstack-source-id: 137143877

Test Plan:
- `aibench-cli mobile`
- Select iOS -> `y` -> `1` -> `n` -> "--metal_op_test"
- Select all iPhone 6 + iPhone 7 + iPhone 8 and an iPhone X or 11 or 12
```
Benchmark Submitted. Find more details at: https://our.intern.facebook.com/intern/aibench/details/318120612514604
Benchmark Status:
        D10AP-12.0.1: DONE
        N71mAP-14.3: DONE
DUMMY latency:
        D10AP-12.0.1: 4319.3
        N71mAP-14.3: 8868.51
I0831 16:06:27.210558 605277 ClientSingletonManager.cpp:99] Shutting down Manifold ClientSingletonManager
```

Reviewed By: xta0

Differential Revision: D30147163

fbshipit-source-id: 2de6bbd9bd525e32ca92b2845eb435800855edcc
2021-08-31 22:36:09 -07:00
24e50b8453 [CUDA graphs] hotfix for test_graph_ (#64339)
Summary:
Graphed workloads that try to capture a full backward pass must do warmup on a non-default stream. If warmup happens on the default stream, AccumulateGrad functions might tag themselves to run on the default stream, and therefore won't be capturable.

ngimel and I suspect some test_cuda.py tests run with the default stream as the ambient stream, which breaks `test_graph_grad_scaling` because `test_graph_grad_scaling` does warmup on the ambient stream _assuming_ the ambient stream is a non-default stream.

This PR explicitly sets a side stream for the warmup in `test_graph_grad_scaling`, which is what I should have done all along because it's what the new documentation recommends.
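
The recommended warmup pattern looks roughly like this (a minimal sketch; the
linear layer and input are toy stand-ins for the real workload):

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
inp = torch.randn(4, 8, device="cuda")

# Warm up on a non-default side stream so AccumulateGrad functions don't
# tag themselves to the default stream, which would break graph capture.
side_stream = torch.cuda.Stream()
side_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side_stream):
    for _ in range(3):
        model(inp).sum().backward()
torch.cuda.current_stream().wait_stream(side_stream)
```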

I pushed the PR branch straight to the main pytorch repo because we need to run ci-all on it, and I'm not sure what the requirements are these days.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64339

Reviewed By: mruberry

Differential Revision: D30690711

Pulled By: ngimel

fbshipit-source-id: 91ad75f46a11f311e25bc468ea184e22acdcc25a
2021-08-31 22:34:10 -07:00
479fc4e412 Remove outdated warning about RecursiveScriptModule not being copiable (#64085)
Summary:
RecursiveScriptModule has its customized `__copy__` and `__deepcopy__` defined. The warning/error that says it is not copyable is outdated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64085

Reviewed By: rohan-varma

Differential Revision: D30598623

Pulled By: gmagogsfm

fbshipit-source-id: 0701d8617f42d818bc7b88244caee4cd47fbe976
2021-08-31 21:31:32 -07:00
8337a3fb3f [TensorExpr] Wrap error messages with buildErrorMessage call. (#64330)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64330

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30687226

Pulled By: ZolotukhinM

fbshipit-source-id: ade1be2ad6847c6afbba60307ef854696821b4e3
2021-08-31 20:31:16 -07:00
a87808de93 Fix bug in ShardedTensorMetadata serde. (#63902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63902

The 'memory_format' field was not being serialized correctly and used
the same encoding for different fields.
ghstack-source-id: 137142406

Test Plan: waitforbuildbot

Reviewed By: bowangbj

Differential Revision: D30527324

fbshipit-source-id: f0f223e2d660ef6e4abae9649d9992acc36e1278
2021-08-31 20:31:14 -07:00
fa5676a41b Delete some dead code from RRefMessageBase (#64298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64298

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D30676702

Pulled By: pbelevich

fbshipit-source-id: 77dbc0f8064c3518376454ff573d45ed0274956b
2021-08-31 20:30:04 -07:00
6bb4b5d150 disallow empty named dims list to flatten(names, name) (#61953)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61137 by raising an error if an empty tuple is passed in for the names:
```
>>> torch.empty((2, 3), names=['a', 'b']).flatten((), 'abc')
RuntimeError: flatten(tensor, dims, out_dim): dims cannot be empty
```

or from the original issue:
```
>>> torch.empty((2, 3)).flatten((), 'abc')
RuntimeError: flatten(tensor, dims, out_dim): dims cannot be empty
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61953

Reviewed By: iramazanli

Differential Revision: D30574571

Pulled By: malfet

fbshipit-source-id: e606e84458a8dd66e5da6d0eb1a260f37b4ce91b
2021-08-31 19:32:30 -07:00
c59970db6b [caffe2][easy] Save heap allocation in ConcatOp (#63529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63529

Output() takes an IntArrayRef, so we can just use a std::initializer_list (stack-allocated array) instead of std::vector here.
ghstack-source-id: 137085908

Test Plan: existing CI

Reviewed By: mruberry

Differential Revision: D29687400

fbshipit-source-id: 9f2a7c6679f2552c098bb1bf7befaca18e0e5d4d
2021-08-31 18:33:32 -07:00
b23e4f6086 Convert mul to use opmath_gpu_kernel_with_scalars (#64019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64019

Note that previously the functor operated on scalar_t and
this modifies it to operate on opmath_t, but this is not
a problem as half precision was implemented by performing the
compute in float anyway.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30575282

Pulled By: ezyang

fbshipit-source-id: cc6900ef996e755740afe48f9cb4d0366858dd47
2021-08-31 18:33:30 -07:00
0733582087 Use the correct overloaded name to skip boxed autograd not implemented kernel registration (#64182)
Summary:
Some internal use_count tests are failing for `dequantize_self` because we only compare the skip list with the base name `dequantize`, when we should be comparing with the full name, including the overload.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64182

Reviewed By: albanD

Differential Revision: D30639909

Pulled By: soulitzer

fbshipit-source-id: d4d22dd1a5c8f7180251ce7739830764cce6f151
2021-08-31 18:33:28 -07:00
09e610e36d [Static Runtime] Out version for softmax (#64243)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64243

Test Plan:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
...
V0830 16:35:22.524479 613839 impl.cpp:1410] Switch to out variant for node: %5 : Tensor = aten::softmax(%a.1, %dim.1, %dtype.1)
...
[       OK ] StaticRuntime.IndividualOps_Softmax (803 ms)
```

Reviewed By: hlu1

Differential Revision: D30656149

fbshipit-source-id: 115b7b4a75448fd6a5c526808080ca9a4251302c
2021-08-31 18:33:26 -07:00
0b9cdeb295 .circleci: Remove already migrated CUDA configs (#64231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64231

This migrates over the CUDA 11.1 and CUDA 10.2 configs that we had
previously migrated to GHA

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D30683811

Pulled By: seemethere

fbshipit-source-id: 71b0761461557d871c26eb02f665a2e4d9b1d9fb
2021-08-31 18:33:24 -07:00
23da90ab84 .github: Consolidate linux setup / teardown (#64229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64229

Consolidates linux setup / teardown into easy to use jinja2 macros

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30683810

Pulled By: seemethere

fbshipit-source-id: 2578630df3e212fb79392a699090553baef44cc2
2021-08-31 18:31:48 -07:00
5ecb966e0f Add ciflow-tracking issue to pytorch-probot (#64125)
Summary:
Doesn't do anything yet...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64125

Reviewed By: zhouzhuojie

Differential Revision: D30620283

Pulled By: malfet

fbshipit-source-id: 91869d35c1b70a55e32261d2c32fb0136ec33960
2021-08-31 17:38:34 -07:00
9e25634833 [TensorExpr] Move declaration of buildErrorMessage to exception.h (#64301)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64301

Test Plan: Imported from OSS

Reviewed By: navahgar, huiguoo

Differential Revision: D30678215

Pulled By: ZolotukhinM

fbshipit-source-id: 599c83b3890450a0fb6526815f037eec9563661c
2021-08-31 17:37:29 -07:00
44fcb00a56 Fix redundant class definition in GraphModule singleton constructor (#64274)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64274

Reviewed By: jamesr66a

Differential Revision: D30675970

Pulled By: jayleverett

fbshipit-source-id: e74ef2a28013f0fa7c58d14f38e66cfe48d26b74
2021-08-31 17:34:14 -07:00
c2da103fe6 Discover new tests in run_tests.py (#64246)
Summary:
Introduce a `discover_tests` function that globs for all Python files
starting with `test_` in the test folder, excluding subfolders which are
executed differently.

Fixes https://github.com/pytorch/pytorch/issues/64178

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64246

Reviewed By: walterddr, seemethere

Differential Revision: D30661652

Pulled By: malfet

fbshipit-source-id: a52e78ec717b6846add267579dd8d9ae75326bf9
2021-08-31 17:32:55 -07:00
0457a85d45 Revert D30543236: Add python mode
Test Plan: revert-hammer

Differential Revision:
D30543236 (4bd03b0242)

Original commit changeset: ef5444d96a5a

fbshipit-source-id: b0042ac2c22765fa11d6d00bf751f6a4489eb6d8
2021-08-31 15:28:33 -07:00
6c8cb9bd76 [DataPipe] export fork, mux, demux for public usage (#64279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64279

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30671971

Pulled By: NivekT

fbshipit-source-id: 056ac12ef7183b254d1eec341145594639e47ef6
2021-08-31 14:34:30 -07:00
491bf7cb74 [DataPipe] adding description, __len__, tests for mux() (#64224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64224

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30651551

Pulled By: NivekT

fbshipit-source-id: f8af98ba71a592900b992a8077432062ec57bb48
2021-08-31 14:34:28 -07:00
9a0456939b Try the forked checkout action with retry (#64120)
Summary:

The main difference is:
ffc6f93ad4

We can test multiple times in this PR to see if it works; I will make the `retry` number configurable if it's usable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64120

Reviewed By: malfet

Differential Revision: D30656099

Pulled By: zhouzhuojie

fbshipit-source-id: a89932196bb0c44e412a34664ed6a061b02ef92e
2021-08-31 14:34:26 -07:00
13484084a6 fix syntax error in bfloat16 PR (#64122)
Summary:
Fixes a prior syntax error from the bfloat16 PR. cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64122

Reviewed By: H-Huang

Differential Revision: D30643596

Pulled By: ngimel

fbshipit-source-id: 0a2d5a40fb6dc7339cd03112e57ef0e1bf8a000e
2021-08-31 14:33:12 -07:00
8d08b103be [CUDA graphs] Prototype API and documentation (#63269)
Summary:
RFC: https://github.com/pytorch/pytorch/issues/61880

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63269

Reviewed By: mruberry

Differential Revision: D30596643

Pulled By: ngimel

fbshipit-source-id: b1f8061406364b667e2c2d4d30fbce1f0d8456be
2021-08-31 13:34:23 -07:00
1c2b5e59ae Remove ref to test_distributed_fork (#64197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64197

Removes this line as test is gone.
ghstack-source-id: 136986275

Test Plan: CI

Reviewed By: walterddr

Differential Revision: D30642929

fbshipit-source-id: a0c7dfdfb35a4a7f7ec1b881dbea53d85136012c
2021-08-31 13:31:27 -07:00
555171a273 .circleci: Remove migrated jobs, move docs builds (#64222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64222

Removes both backwards_compat as well as docs_test from the general
gcc5.4 config and moves the docs build from being run on every PR to
only being run on master.

We can remove docs builds when we migrate the docs push job (including
all secrets associated with that)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30650953

Pulled By: seemethere

fbshipit-source-id: ac11da6a551a6c81f3dc1d47fd81846cbfe9975a
2021-08-31 13:30:13 -07:00
347ef69529 [ao][docs] Clarify operator support for quantization (#63270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63270

Add table to quantization main page showing supported modules
for static and dynamic quantization.
ghstack-source-id: 137087204

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30658654

fbshipit-source-id: a82c998e1db6370596d5b0ca4c7cc96c1c90f30e
2021-08-31 12:32:47 -07:00
3a46edb8d8 ns for fx: make layer types more readable (#64270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64270

Before this PR, layer types were populated by doing
`str(module_instance)` and `str(function)`. This resulted
in moderately readable strings for modules, and poorly readable
strings for functions.

This PR switches the logic to use `torch.typename` utility instead.
The results are significantly more readable.

Example function type:

```
# before
'<built-in method linear of PyCapsule object at 0x7fe9b20ce7b0>'

# after
'torch._ops.quantized.PyCapsule.linear'
```

Example module type:

```
# before
"<class 'torch.nn.quantized.modules.conv.Conv2d'>"

# after
'torch.nn.quantized.modules.conv.Conv2d'
```

Test Plan:
Manually inspect NS results for modules and functions, verify they are
more readable.

Imported from OSS

Differential Revision: D30669545

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 60959e5cafa0a4992b083bf99f5d8260f9acdac0
2021-08-31 12:31:34 -07:00
845bc89811 [fx2trt] Add acc_ops.sign and converter for it (#63876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63876

Add `acc_ops.sign` which maps from `torch.sign`.

Add a plugin (not support dynamic shape currently) for `acc_ops.sign`. The plugin calls `at::sign` directly.

Test Plan: buck test mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 caffe2/torch/fb/fx2trt:test_unary_ops

Reviewed By: yinghai

Differential Revision: D30518081

fbshipit-source-id: a0b9e6c30deac0b04b8cb09a162579e229985330
2021-08-31 11:31:53 -07:00
83e28a7d28 Use stacklevel for floordiv deprecation warnings (#64034)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60548

`Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback.

Repro:
```
import torch
x = torch.tensor(0)
x // 1
```

Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide:
```
UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:571.)
  return torch.floor_divide(self, other)
```

After this change, the warning instead cites the user's offending line of Python code:
```
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  x // 1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034

Reviewed By: mruberry

Differential Revision: D30658010

Pulled By: saketh-are

fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a
2021-08-31 11:27:56 -07:00
b9275a4003 [ao][docs] Add description of qconfig and qengine to quantization page (#63582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63582

Current quantization docs do not define qconfig and qengine. Added text to define these concepts before they are used.
ghstack-source-id: 137051719

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30658656

fbshipit-source-id: a45a0fcdf685ca1c3f5c3506337246a430f8f506
2021-08-31 10:33:07 -07:00
ca8dd296ee Add OpInfo for nn.functional.cosine_similarity (#62959)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

Notes:

* Some redundant tests from `test_nn.py` have been removed. I'm unsure whether the precision checks can be removed as well.
* Broadcasting is also checked in the OpInfo for `cosine_similarity`.

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62959

Reviewed By: heitorschueroff

Differential Revision: D30520176

Pulled By: zou3519

fbshipit-source-id: 14e902eb4bcce875edab28a1669a2ea021052b9b
2021-08-31 10:31:36 -07:00
0ef8760bf6 [DataPipe] implementing __len__ for fork (no valid length for demux) (#64215)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64215

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30648672

Pulled By: NivekT

fbshipit-source-id: 4780f2f6a79ae15a4009092475e7d92f96dd09a2
2021-08-31 08:32:31 -07:00
0deb7a0bc0 [DataPipe] implementing demux() (#63650)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63650

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30493944

Pulled By: NivekT

fbshipit-source-id: 0aa06dee8c7fb1744975b8f6a0694b90c11ef80d
2021-08-31 08:32:29 -07:00
eee054e6ea [DataPipe] implementing fork() (#63649)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63649

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30493945

Pulled By: NivekT

fbshipit-source-id: 40db7d4134facd266d86bc0dc2edf2729c4e5842
2021-08-31 08:32:27 -07:00
67cb131458 Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling.
Test Plan: revert-hammer

Differential Revision:
D30327514 (bc9277dca3)

Original commit changeset: 3bb2f2daaaed

fbshipit-source-id: 0b2aa7c57d08de77c9aaa75e546a7d0938610f64
2021-08-31 08:30:36 -07:00
3c15822f5f [Static Runtime] Implement aten::nonzero out variant (#64126)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64126

Test Plan:
Confirm out variant is called:

```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: mikeiovine

Differential Revision: D30617729

fbshipit-source-id: 752749638c8f467815efa57021cb3de5c728ab1b
2021-08-31 00:51:15 -07:00
a3d6dae319 Automated submodule update: FBGEMM (#64213)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9d69998df6

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64213

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30647878

fbshipit-source-id: b903b39441b4e28dda7eab226ac874e2227e750a
2021-08-30 21:33:17 -07:00
bc9277dca3 [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. (#63367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367

This diff changes the way operator profiling is done in lite predictor
benchmarking binary.
Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile
events and then generate operator level metrics from them.
Since KinetoEvents do not contain cpu clock time, we now report only wallclock
time.
This unifies the various profiling efforts that we have for benchmarking purposes. In
production we will still use the observer based mechanism, but the advantage of
using the kineto profiler is that we get a few other things for free, such as:
- chrome trace generation
- operator level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we could use a python post-processing script to parse the chrome
trace and generate output similar to torch.profiler. (To be done)

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30327514

fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
2021-08-30 20:54:51 -07:00
7ca4728e6d Compile BatchLinearAlgebra without nvcc (#64146)
Summary:
These files only use cuda libraries interfaces, so don't actually need to be compiled with nvcc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64146

Reviewed By: ezyang

Differential Revision: D30633189

Pulled By: ngimel

fbshipit-source-id: c9d0ae5259a10cb49332d31f0da89ad758736ea8
2021-08-30 20:18:21 -07:00
e7fb35021a [nnc] Enable fusion of bfloat16 ops (#64196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64196

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30643864

Pulled By: bertmaher

fbshipit-source-id: e95edeaf7089464d713ea1d1f951743d3e5f61c5
2021-08-30 20:09:36 -07:00
538647fe1f [WIP][FX] BC guarantees for 1.10 (#63888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63888

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30523133

Pulled By: jamesr66a

fbshipit-source-id: b04cc0d842a74862f42ecba98b757310cd2ec7b0
2021-08-30 19:56:46 -07:00
09dfaa0339 add operation list for AutocastCPU (#63534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63534

In this PR:
* We have changed the default dtype of `AutocastCPU` from `float16` to `bfloat16`, as discussed in https://github.com/pytorch/pytorch/pull/61002 (see the sketch below).
* We also update the list of operations which need casting to `lower_precision_fp` or `float32`.
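
A minimal sketch of the behavior change (assuming the torch.cpu.amp.autocast
entry point):

```python
import torch

a = torch.randn(8, 8)
b = torch.randn(8, 8)
# With this change the default autocast dtype on CPU is bfloat16.
with torch.cpu.amp.autocast():
    c = torch.mm(a, b)
print(c.dtype)  # expected: torch.bfloat16, as mm is cast to lower_precision_fp
```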

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30644914

Pulled By: ezyang

fbshipit-source-id: 8b93485ba452b3759611e3f0ac88e920fe495ac1
2021-08-30 19:30:33 -07:00
93f1090267 Update contribution_guide.rst (#64142)
Summary:
Grammatical update.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64142

Reviewed By: mruberry

Differential Revision: D30639394

Pulled By: ezyang

fbshipit-source-id: cf1a4dfbd8e34b0772f1b09f5d820278e8ef8574
2021-08-30 19:26:59 -07:00
6b85c99ce5 Avoid an unnecessary list creation in DataChunk (#64111)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64111

Reviewed By: mruberry

Differential Revision: D30639383

Pulled By: ezyang

fbshipit-source-id: 96b243307413c99a67d55d862a71937e1ef210f4
2021-08-30 19:25:42 -07:00
c7c711bfb8 Add optional tensor arguments to (#63967)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63435

Adds optional tensor arguments to the torch function handling checks. The only one I didn't do this for in the functional file was `multi_head_attention_forward`, since that already took care of some optional tensor arguments but not others, so it seemed like the arguments were specifically chosen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63967

Reviewed By: albanD

Differential Revision: D30640441

Pulled By: ezyang

fbshipit-source-id: 5ef9554d2fb6c14779f8f45542ab435fb49e5d0f
2021-08-30 19:21:26 -07:00
cb7cf823b3 add BFloat16 support for fold and unfold on CPU (#62880)
Summary:
Add BFloat16 support for fold and unfold operators on CPU.
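
A minimal sketch exercising the new dtype support:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, dtype=torch.bfloat16)
patches = F.unfold(x, kernel_size=2, stride=2)                    # (1, 12, 16)
y = F.fold(patches, output_size=(8, 8), kernel_size=2, stride=2)
print(patches.dtype, y.shape)  # torch.bfloat16 torch.Size([1, 3, 8, 8])
```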

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62880

Reviewed By: iramazanli

Differential Revision: D30576387

Pulled By: zou3519

fbshipit-source-id: c48f6e56702bfea34448db1b3a1634c49c5d8ec8
2021-08-30 19:14:10 -07:00
ffc2612087 Add acc_gpu_kernel_with_scalars and port add to use it (#63884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63884

See https://dev-discuss.pytorch.org/t/cuda-loops-case-study-code-generation-vs-templates/302
for explanation of what's going on here.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30545296

Pulled By: ezyang

fbshipit-source-id: f0da52153ae63599fe1d57e90e73f50ca2116939
2021-08-30 19:10:16 -07:00
a49907f984 Modify inline doc for DataPipe (#64221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64221

List of tasks in this PR
- [x]  Add inline doc for DataPipe
- [x] Improve the inline doc
- [x] Expose DataPipe to `datapipes.iter` (`UnBatcher`). Note: `Forker`, `Demux`, and `Mux` are exposed in another PR authored by Kevin
- [x] Add correct typing to DataPipe
- [x] Unify the argument to `datapipe` rather than `source_datapipe`

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30650541

Pulled By: ejguan

fbshipit-source-id: c09d1b9742b8097d8e645c15947cef80c876877b
2021-08-30 18:45:46 -07:00
af85bc5ffd Replace group_by_key by group_by IterDataPipe (#64220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64220

Remove `ByKeyGrouperIterDataPipe` due to duplicated functionality.
Fix a bug in `GrouperIterDataPipe` using the existing test.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30650542

Pulled By: ejguan

fbshipit-source-id: 666b4d28282fb4f49f3ff101b8d08be16a50d836
2021-08-30 18:45:44 -07:00
4bd03b0242 Add python mode (#63496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63496

This PR adds a (private) enable_python_mode context manager
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.

Example usage:
```
with enable_python_mode(LoggingTensor):
    z = torch.empty([])
    assert isinstance(z, LoggingTensor)
```
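
For context, the kind of class enable_python_mode accepts looks roughly like
this (a minimal sketch, not the actual LoggingTensor used in the tests):

```python
import torch

class LoggingTensorSketch(torch.Tensor):
    # A minimal sketch of a __torch_dispatch__ type: log each op,
    # then redispatch to the regular implementation.
    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        print("called:", func)
        return func(*args, **(kwargs or {}))
```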

There are quite a few changes that were made to support this.

First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.

Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.

To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.

Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.

There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.

Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.

Test Plan: - new tests

Reviewed By: malfet, albanD

Differential Revision: D30543236

Pulled By: zou3519

fbshipit-source-id: ef5444d96a5a957d1657b7e37dce80f9a497d452
2021-08-30 18:44:35 -07:00
ebc0aacf83 [nnc] Fix half2float conversion and re-enable float16 (#64199)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64199

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30643865

Pulled By: bertmaher

fbshipit-source-id: 9de6adca53bd08839328cbaf6364f7de9550264b
2021-08-30 18:37:55 -07:00
1f16c22dc8 [Static Runtime] Implement aten::cumsum out variant (#64159)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64159

Test Plan:
Confirm out variant is called for both versions:

```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: mikeiovine

Differential Revision: D30622819

fbshipit-source-id: a2c8c7f969dae5f507718fb3d513e1fb4f026736
2021-08-30 16:18:22 -07:00
5401159b8f OpInfo for nn.functional.interpolate (#61956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61956

Each mode goes through a different implementation so they are listed as
different variants.

Test Plan: - run tests

Reviewed By: malfet

Differential Revision: D30013751

Pulled By: zou3519

fbshipit-source-id: 4253b40b55667d7486ef2d98b441c13d807ab292
2021-08-30 16:00:43 -07:00
a7ae73a238 BUG Fixes regression for nllloss gradcheck (#64203)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64163

This PR includes the fix and the opinfo from https://github.com/pytorch/pytorch/pull/63854/ for non-regression testing.

cc albanD mruberry jbschlosser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64203

Reviewed By: albanD

Differential Revision: D30647522

Pulled By: jbschlosser

fbshipit-source-id: 2974d299763505908fa93532aca2bd5d5b71f2e9
2021-08-30 15:13:09 -07:00
ad4848565e Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980)
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.

`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.
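
A minimal sketch of the newly supported dtypes (requires a CUDA 11+ build, per
the above):

```python
import torch

a = torch.randn(64, 64, device="cuda", dtype=torch.half).to_sparse()
b = torch.randn(64, 64, device="cuda", dtype=torch.half).to_sparse()
c = torch.sparse.mm(a, b)    # COO x COO matmul, now supported for half
print(c.dtype, c.is_sparse)  # torch.float16 True
```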

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980

Reviewed By: ngimel

Differential Revision: D29699456

Pulled By: cpuhrsch

fbshipit-source-id: 407ae53392acb2f92396a62a57cbaeb0fe6e950b
2021-08-30 15:06:25 -07:00
c3464e78a4 Revert D30561459: Fix bytes_written and bytes_read
Test Plan: revert-hammer

Differential Revision:
D30561459 (e98173ff34)

Original commit changeset: 976fa5167097

fbshipit-source-id: 43f4c234ca400820fe6db5b4f37a25e14dc4b0dd
2021-08-30 14:59:54 -07:00
e4fd2ab59c Back out "Added reference tests to ReductionOpInfo" (#64183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64183

Original commit changeset: 6a1f82ac2819

Test Plan: CI

Reviewed By: soulitzer

Differential Revision: D30639835

fbshipit-source-id: e238043c6fbd0453317a9ed219e348298f98aaca
2021-08-30 14:48:10 -07:00
8f88f797db [quant][graphmode][fx] Add reference quantized conv module (#63828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63828

Added a reference quantized conv module for the custom backend flow; the reference quantized module will
have the following code:
```
        w(float) -- quant - dequant \
        x(float) ------------- F.conv2d ---
```
In the full model, we will see
```
        w(float) -- quant - *dequant \
        x -- quant --- *dequant --  *F.conv2d --- *quant - dequant
```
and the backend should be able to fuse the ops with `*` into a quantized conv2d

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_linear_reference

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30504749

fbshipit-source-id: e1d8c43a0e0d6d9ea2375b8ca59a9c0f455514fb
2021-08-30 14:23:17 -07:00
65050ec924 Back out "[JIT] Add aten::slice optimization"
Summary:
Original commit changeset: d12ee39f6828
build-break
overriding_review_checks_triggers_an_audit_and_retroactive_review
Oncall Short Name: dskhudia

Test Plan: Local run succeeds

Differential Revision: D30633990

fbshipit-source-id: 91cf7cc0ad7e47d919347c2a1527688e062e0c62
2021-08-30 14:05:04 -07:00
09e53c0cfe .github: Adding configuration for backwards_compat (#64204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64204

Adds backwards_compat to our existing test matrix for github actions

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30646764

Pulled By: seemethere

fbshipit-source-id: f0da6027e29fab03aff058cb13466fae5dcf3678
2021-08-30 13:59:00 -07:00
9035a1cb4d .github: Adding configuration for docs_test (#64201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64201

Adds docs_test to our existing test matrix for github actions

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30646765

Pulled By: seemethere

fbshipit-source-id: 946adae01ff1f1f7ebe626e408e161b77b19a011
2021-08-30 13:57:20 -07:00
85df73658c Make name() part of IMethod interface (#63995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63995

JIT methods already have name() in their interface, and Py methods have names in their implementation.  I'm adding this for a particular case where someone tried to use name() on a JIT method that we're replacing with an IMethod.

Test Plan: add case to imethod API test

Reviewed By: suo

Differential Revision: D30559401

fbshipit-source-id: 76236721f5cd9a9d9d488ddba12bfdd01d679a2c
2021-08-30 13:31:55 -07:00
b9933f08b9 Fix type annotation in tools/nightly.py (#64202)
Summary:
`tempfile.TemporaryDirectory` is a generic only in Python 3.9 and above.

Work around this by wrapping the type annotation in quotes.

Fixes https://github.com/pytorch/pytorch/issues/64017

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64202

Reviewed By: janeyx99

Differential Revision: D30644215

Pulled By: malfet

fbshipit-source-id: 3c16240b9fa899bd4d572c1732a7d87d3dd0fbd5
2021-08-30 13:27:43 -07:00
f3e329cbec Implements the orthogonal parametrization (#62089)
Summary:
Implements an orthogonal / unitary parametrisation.

It passes the tests and I have trained a couple of models with this implementation, so I believe it should be somewhat correct. Now, the implementation is very subtle. I'm tagging nikitaved and IvanYashchuk as reviewers in case they have comments or see some room for optimisation of the code, in particular of the `forward` function.
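
Usage looks roughly like this (a minimal sketch using the
torch.nn.utils.parametrizations.orthogonal entry point added here):

```python
import torch
from torch import nn
from torch.nn.utils import parametrizations

layer = nn.Linear(5, 5)
parametrizations.orthogonal(layer, name="weight")  # register the parametrization
Q = layer.weight                                   # recomputed on access
print(torch.allclose(Q.T @ Q, torch.eye(5), atol=1e-5))  # True: Q is orthogonal
```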

Fixes https://github.com/pytorch/pytorch/issues/42243

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62089

Reviewed By: ezyang

Differential Revision: D30639063

Pulled By: albanD

fbshipit-source-id: 988664f333ac7a75ce71ba44c8d77b986dff2fe6
2021-08-30 13:12:07 -07:00
e98173ff34 Fix bytes_written and bytes_read (#64040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040

In operator cost inference functions, in many places we are using sizeof(x.data_type()). Since data_type() returns a 32 bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), we are basically always getting 4 for sizeof(x.data_type()) no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing to this bug.

We now use the size in bytes based on the actual data type.

Test Plan:
Added unit tests BatchMatMulMemCostTest:

buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest

Extended existing unit test test_columnwise_concat for different data types:

buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat

Differential Revision: D30561459

fbshipit-source-id: 976fa5167097a35af548498480001aafd7851d93
2021-08-30 12:57:31 -07:00
eafe33c995 remove componentwise comparison of complex values in torch.testing.assert_close (#63841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63841

Closes #61906.

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30633526

Pulled By: mruberry

fbshipit-source-id: ddb5d61838cd1e12d19d0093799e827344382cdc
2021-08-30 12:38:44 -07:00
401bbb2aa0 remove componentwise comparison of complex values in TestCase.assertEqual (#63572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63572

Addresses #61906. The issue will be fixed later in the stack when `torch.testing.assert_close` gets the same treatment.

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30633527

Pulled By: mruberry

fbshipit-source-id: c2002a4998a7a75cb2ab83f87190bde43a9d4f7c
2021-08-30 12:36:45 -07:00
a8ffe81b2c Bring back old algorithm for sorting on small number of segments (#64127)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63456
The code was copy-pasted from the previous commit without modification.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64127

Reviewed By: mruberry

Differential Revision: D30632090

Pulled By: ngimel

fbshipit-source-id: 58bbdd9b0423f01d4e65e2ec925ad9a3f88efc9b
2021-08-30 12:30:50 -07:00
d37636901e [Doc] make_tensor to torch.testing module (#63925)
Summary:
This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs.
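
For reference, a minimal usage sketch (assuming the signature at the time of
this PR, where device and dtype are positional):

```python
import torch
from torch.testing import make_tensor

# Random float32 CPU tensor with values drawn from [0, 1).
t = make_tensor((2, 3), torch.device("cpu"), torch.float32, low=0, high=1)
print(t.shape, t.dtype)
```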

TODOs:

* [x] Add examples

cc: pmeier mruberry brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925

Reviewed By: ngimel

Differential Revision: D30633487

Pulled By: mruberry

fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af
2021-08-30 12:25:40 -07:00
5b0dfd0f8a Fix bad use of channels last kernel in sync batch norm backward (#64100)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64039

There are two distinct problems here.
1. If `grad_output` is channels last but the input is not, then the input would be read as if it were channels last, i.e. the wrong values were being read.
2. `use_channels_last_kernels` doesn't guarantee that `suggest_memory_format` will actually return channels last, so use `empty_like` instead so the strides always match.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64100

Reviewed By: mruberry

Differential Revision: D30622127

Pulled By: ngimel

fbshipit-source-id: e28cc57215596817f1432fcdd6c49d69acfedcf2
2021-08-30 12:16:30 -07:00
ac99d63f83 [jit] Make operation call accept Stack& instead Stack* (#63414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414

Misuse of raw pointer in here where stack is never nullable.
ghstack-source-id: 136938318

Test Plan:
compiles.

Imported from OSS

Reviewed By: ejguan

Differential Revision: D30375410

fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
2021-08-30 11:49:20 -07:00
93d2e5090f Improve performance of index_select by avoiding item (#63008)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/61788

From a CUDA perspective: item already pulls all Tensor content onto the host (albeit one-by-one), which incurs very expensive memory transfers. This way we'll do it all at once.
From a CPU perspective: item has a lot of overhead as a native function in comparison to simply using a pointer.

Overall there are still lots of performance gains to be had, but this is a small change that should take us into a more usable landscape. This doesn't land a separate benchmark, but I postulate that's not necessary to decide on the benefit of this change (we'll also see if it shows up indirectly); a dedicated benchmark is still a good follow-up item.
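
A toy sketch contrasting the per-element .item() pattern with a single
index_select call (the hot path fixed here is internal; this only illustrates
the cost model):

```python
import torch

src = torch.randn(10_000, device="cuda")
idx = torch.randint(0, 10_000, (1_000,), device="cuda")

# Each .item() call copies one scalar to the host and synchronizes.
slow = torch.stack([src[i.item()] for i in idx])
# index_select moves everything in one kernel launch.
fast = src.index_select(0, idx)
print(torch.equal(slow, fast))  # True
```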

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63008

Reviewed By: zou3519

Differential Revision: D30211160

Pulled By: cpuhrsch

fbshipit-source-id: 70b752be5df51afc66b5aa1c77135d1205520cdd
2021-08-30 09:50:41 -07:00
e24c3644d8 [Static Runtime] aten::cat out version when it is not being replaced by prim::VarConcat (#64157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64157

The UseVariadicCat optimization is not applied to aten::cat if the list input to the op cannot be moved to the position before the op (https://fburl.com/diffusion/l6kweimu). For these cases we need an out version for Static Runtime.

Test Plan:
Confirm out variant is called:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: d1jang

Differential Revision: D30598574

fbshipit-source-id: 74cfa8291dc8b5df4aef58adfb1ab2a16f10d90a
2021-08-30 09:42:38 -07:00
16ecdbbaa2 [PyTorch] Fix missing move in unpickler (#63974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63974

Saw some time spent in this during model loading; there's no reason not to move here.
ghstack-source-id: 136760979

Test Plan: Re-profile model loading on devserver; IValue copy ctor time has gone down

Reviewed By: dhruvbird

Differential Revision: D30548923

fbshipit-source-id: 42000f2e18582762b43353cca10ae094833de3b3
2021-08-30 09:38:55 -07:00
9777887f0e [PyTorch] Reduce copies/refcount bumps in BytecodeDeserializer::parseMethods (#63961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63961

Saw a report that this function was slow and was doing unexplained vector copies. First pass to remove a bunch of copying.
ghstack-source-id: 136760976

Test Plan:
Pixel 3
before: https://our.intern.facebook.com/intern/aibench/details/461850118893980
after: https://www.internalfb.com/intern/aibench/details/48965886029524

MilanBoard failed to return data from simpleperf

Reviewed By: dhruvbird

Differential Revision: D30544551

fbshipit-source-id: 0e2b5471a10c0803d52c923e6fb5625f5542b99d
2021-08-30 09:37:10 -07:00
dc4fd3bdda [MicroBench] Added a micro benchmark for a signed log1p kernel. (#64032)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64032

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30579198

Pulled By: navahgar

fbshipit-source-id: a53d68225fba768b26491d14b535f8f2dcf50c0e
2021-08-30 09:27:51 -07:00
f79df24859 Automated submodule update: FBGEMM (#64149)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: f6dfed87a1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64149

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30632209

fbshipit-source-id: aa1cebaf50169c3a93dbcb994fa47e29d6b6a0d7
2021-08-30 08:30:57 -07:00
82174330d0 [DataLoader2] Adding Messages, Protocols, Loop wrappers (#63882)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63882

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30627452

Pulled By: VitalyFedyunin

fbshipit-source-id: 561ea2df07f3572e04401171946154024126387b
2021-08-30 07:57:20 -07:00
7701ea48be remove one more distributed test (#64108)
Summary:
Follow-up to https://github.com/pytorch/pytorch/issues/62896: one more place where the distributed test should be removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64108

Reviewed By: janeyx99, soulitzer

Differential Revision: D30614062

Pulled By: walterddr

fbshipit-source-id: 6576415dc2d481d65419da19c5aa0afc37a86cff
2021-08-30 07:51:11 -07:00
093a12aaa9 [nnc] Updated internal asserts to include more detailed error messages (#64118)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64118

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30616944

Pulled By: navahgar

fbshipit-source-id: 35289696cc0e7faa01599304243b86f0febc6daf
2021-08-30 04:40:51 -07:00
a836d83957 [nnc] Fixed warning due to implicit parameter conversion (#64117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64117

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30616945

Pulled By: navahgar

fbshipit-source-id: eaf69232ac4a684ab5f97a54a514971655f86ef3
2021-08-30 04:39:34 -07:00
d3bcba5f85 ENH Adds label_smoothing to cross entropy loss (#63122)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/7455

Partially resolves pytorch/vision#4281
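
A minimal usage sketch of the new argument (shapes and the smoothing value are illustrative):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)        # (batch, classes)
target = torch.randint(10, (8,))   # class indices

loss = F.cross_entropy(logits, target, label_smoothing=0.1)

# Equivalent module form:
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
loss = criterion(logits, target)
```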

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63122

Reviewed By: iramazanli

Differential Revision: D30586076

Pulled By: jbschlosser

fbshipit-source-id: 06afc3aa1f8b9edb07fe9ed68c58968ad1926924
2021-08-29 23:33:04 -07:00
8af1407eab [Static Runtime] Out version for torch.linalg.norm (#64070)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64070

Test Plan:
Confirm out variant is called for both versions:

```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: d1jang

Differential Revision: D30595816

fbshipit-source-id: e88d88d4fc698774e83a98efce66b8fa4e281563
2021-08-29 21:00:11 -07:00
44e3ed88c9 [quant] AO migration of the quantize.py (#64086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64086

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.

This migrates the `quantize.py` from torch.quantization to `torch.ao.quantization`.

At this point both locations will be supported. Eventually torch.quantization will be deprecated.

Test Plan: `buck test mode/opt //caffe2/test:quantization`

Reviewed By: jerryzh168, raghuramank100

Differential Revision: D30055886

fbshipit-source-id: 8ef7470f9fa640c0042bef5bb843e7a05ecd0b9f
2021-08-29 20:30:01 -07:00
29ad84f252 Removes beta warning from the special module documentation (#64148)
Summary:
Updates documentation per feature review. torch.special is now stable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64148

Reviewed By: ngimel

Differential Revision: D30632049

Pulled By: mruberry

fbshipit-source-id: 8f6148ec7737e7b3a90644eeca23eb217eda513d
2021-08-29 19:38:46 -07:00
c5ed31e4a7 add channel last support for MaxUnpool2d (#49984)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49984

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26007051

Pulled By: VitalyFedyunin

fbshipit-source-id: 6c54751ade4092e03c1651aaa60380f7d6e92f6b
2021-08-29 18:37:10 -07:00
9db56531f7 Revert D30620966: [pytorch][PR] Move Parallel[Native|TBB] to GHA
Test Plan: revert-hammer

Differential Revision:
D30620966 (223f886032)

Original commit changeset: 9a23e4b3e168

fbshipit-source-id: b9248d377b9a7b850dfb3f10f3350fbc9855acfe
2021-08-29 15:51:27 -07:00
710a2e933f [DOC] Add doc for maybe_wrap_dim (#63161)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63161

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30629451

Pulled By: tugsbayasgalan

fbshipit-source-id: b03f030f197e10393a8ff223b240d23c30858028
2021-08-29 14:19:28 -07:00
7ebdbf82dc add support for sending cpu sparse tensors over rpc (#62794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62794

This PR updates JIT serialization to support pickling sparse COO tensors.
It also updates message.cpp to support sparse COO tensors.
A bug was filed a few years ago https://github.com/pytorch/pytorch/issues/30807.

I tested the fix by adding sparse tensor tests to rpc_test.py and dist_autograd_test.py.
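
For illustration, a minimal hedged sketch of sending a sparse COO tensor over RPC (the worker names and RPC initialization are assumed to be set up elsewhere):

```python
import torch
import torch.distributed.rpc as rpc

def double(t):
    return t + t

# Assumes rpc.init_rpc(...) has already been called on "worker0"/"worker1".
i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([3.0, 4.0])
sparse = torch.sparse_coo_tensor(i, v, (2, 2))

result = rpc.rpc_sync("worker1", double, args=(sparse,))
```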

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23 gmagogsfm

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30608848

Pulled By: gcramer23

fbshipit-source-id: 629ba8e4a3d8365875a709c9b87447c7a71204fb
2021-08-29 11:35:00 -07:00
52d7dd7398 [DOC] improve docstring for Optimizer.state_dict (#63153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63153

Fixes: https://github.com/pytorch/pytorch/issues/60121
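
For reference, a minimal sketch of the structure the improved docstring describes (values illustrative):

```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(1, 2)).sum().backward()
opt.step()

sd = opt.state_dict()
print(sd.keys())                        # dict_keys(['state', 'param_groups'])
# 'param_groups' refers to parameters by integer id, not by tensor:
print(sd["param_groups"][0]["params"])  # e.g. [0, 1]
```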

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30629462

Pulled By: tugsbayasgalan

fbshipit-source-id: a9160e02ac53bb1a6219879747d73aae9ebe4d2f
2021-08-29 10:20:58 -07:00
371c6612b3 Automated submodule update: FBGEMM (#64141)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9939bac9de

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64141

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30629417

fbshipit-source-id: 1b1ad3d4caff925f798b86b358ab193554c9b8e0
2021-08-29 09:58:04 -07:00
2e6221a232 [nnc] Make 64-bit dimensions work (#64077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077

We were assuming kernel dimensions fit in 32 bits (the old fuser made
this assumption too), but we should be able to support 64.
ghstack-source-id: 136933272

Test Plan: unit tests; new IR level test with huge sizes

Reviewed By: ZolotukhinM

Differential Revision: D30596689

fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94
2021-08-28 19:59:47 -07:00
405c15516c Parse int64 sizes/strides (#64076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64076

We were parsing sizes into int32s, so if you had a tensor with more
than 2^32 elements, you couldn't represent it.
ghstack-source-id: 136933273

Test Plan: parseIR with size of 4e9

Reviewed By: ZolotukhinM

Differential Revision: D30521116

fbshipit-source-id: 1e28e462cba52d648e0e2acb4e234d86aae25a3e
2021-08-28 19:58:34 -07:00
4f969db325 [nnc] Fix batchnorm implementation (#64112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64112

Fixes #64062

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D30622897

Pulled By: bertmaher

fbshipit-source-id: 7d7c6131aa786e61fa1d0a517288396a0bdb1d22
2021-08-28 19:20:35 -07:00
aefa2f3e64 To add RMSProp algorithm documentation (#63721)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms with links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of RMSProp to the documentation. For more details, we refer to the paper https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

<img width="464" alt="RMSProp" src="https://user-images.githubusercontent.com/73658284/131179226-3fb6fe5a-5301-4948-afbe-f38bf57f24ff.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63721

Reviewed By: albanD

Differential Revision: D30612426

Pulled By: iramazanli

fbshipit-source-id: c3ac630a9658d1282866b53c86023ac10cf95398
2021-08-28 15:55:56 -07:00
8b6266fe4f Automated submodule update: FBGEMM (#64129)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: f14e794814

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64129

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30621549

fbshipit-source-id: 34c109e75c96a261bf370f7a06dbb8b9004860ab
2021-08-28 11:56:17 -07:00
223f886032 Move Parallel[Native|TBB] to GHA (#64123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64123

Reviewed By: driazati

Differential Revision: D30620966

Pulled By: malfet

fbshipit-source-id: 9a23e4b3e16870f77bf18df4370cd468603d592d
2021-08-28 11:50:30 -07:00
d0c63e857d Enhancement for smart serialization for out schemas (#63096)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63096

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30415255

Pulled By: tugsbayasgalan

fbshipit-source-id: eb40440a3b46258394d035479f5fc4a4baa12bcc
2021-08-28 11:46:27 -07:00
f4496528e3 [Light] Fix error message (#64010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64010

Fixing typos in an error message

Test Plan:
Error message before fix:
Lite Interpreter verson number does not match. The model version must be between 3 and 5But the model version is 6

Error message after fix:
Lite Interpreter version number does not match. The model version must be between 3 and 5 but the model version is 6

Reviewed By: larryliu0820

Differential Revision: D30568367

fbshipit-source-id: 205f3278ee8dcf38579dbb828580a9e986ccacc1
2021-08-27 22:54:38 -07:00
0d0605eaa9 [quant][graphmode][fx] Add reference quantized linear module (#63627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63627

Added reference quantized linear module for the custom backend flow, the reference quantized module will
have the following code:
```
        w(float) -- quant - dequant \
        x(float) ------------- F.linear ---
```
In the full model, we will see
```
        w(float) -- quant - *dequant \
        x -- quant --- *dequant --  *F.linear --- *quant - dequant
```
and the backend should be able to fuse the ops with `*` into a quantized linear

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_linear_reference

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30504750

fbshipit-source-id: 5729921745c2b6a0fb344efc3689f3b170e89500
2021-08-27 22:53:24 -07:00
a3a7a67048 [iOS][GPU] Consolidate array and non-array kernel for hardswish (#63369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63369

ghstack-source-id: 136918152

(Note: this ignores all push blocking failures!)

Test Plan:
- `buck test pp-macos`
- Op tests in PyTorchPlayground app
- Run mobilenetv3 test

https://pxl.cl/1Ncls

Reviewed By: xta0

Differential Revision: D30354454

fbshipit-source-id: 88bf4f8b5871e63170161b3f3e44f99b8a3086c6
2021-08-27 19:31:08 -07:00
9ccb9299e0 To add Nesterov Adam algorithm description to documentation (#63793)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms with links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the Nesterov Adam algorithm to the documentation. For more details, we refer to the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ

<img width="439" alt="NAdam" src="https://user-images.githubusercontent.com/73658284/131185124-e81b2edf-33d9-4a9d-a7bf-f7e5eea47d7c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63793

Reviewed By: NivekT

Differential Revision: D30617057

Pulled By: iramazanli

fbshipit-source-id: cd2054b0d9b6883878be74576e86e307f32f1435
2021-08-27 19:29:34 -07:00
07c5cb8c48 [Static Runtime] Optimize memory planner initialization (#64101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64101

Checking `getOutOfPlaceOperation(n)` is a very expensive operation, especially in multithreaded environments, due to a lock acquisition when the NNC cache is queried. This slows down the memory planner initialization time, and by extension, the latency for the first static runtime inference.

There are two optimizations in this diff:
* Cache the result of `p_node->has_out_variant()` to avoid the call to `getOutOfPlaceOperation`. This speeds up calls to `canReuseInputOutputs`, which in turn speeds up `isOptimizableContainerType`
* Precompute all `isOptimizableContainerType` during static runtime initialization to avoid a pass over all of each node's inputs.

Test Plan: All unit tests pass: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: movefast1990

Differential Revision: D30595579

fbshipit-source-id: 70aaa7af9589c739c672788bf662f711731864f2
2021-08-27 17:40:43 -07:00
2d75ab0c8f [TensorExpr] Update tutorial. (#64109)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64109

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30614050

Pulled By: ZolotukhinM

fbshipit-source-id: e8f9bd9ef2483e6eafbc0bd5394d311cd694c7b2
2021-08-27 16:19:29 -07:00
3abbcf079d .github: Add cpp_docs job to current gcc5 workflow (#64044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64044

Adds the cpp_docs job to the current workflow, also modifies the scripts
surrounding building docs so that they can be powered through
environment variables with sane defaults rather than having to have
passed arguments.

Ideally should not break current jobs running in circleci but those
should eventually be turned off anyways.

Coincides with work from:
* https://github.com/seemethere/upload-artifact-s3/pull/1
* https://github.com/seemethere/upload-artifact-s3/pull/2

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30610010

Pulled By: seemethere

fbshipit-source-id: f67adeb1bd422bb9e24e0f1ec0098cf9c648f283
2021-08-27 16:06:12 -07:00
6ccb74b837 Update codegen to use boxed kernel (#63459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63459

 - Replaces the usual registration when "requires_derivative" is True (i.e., we still need a grad_fn) but `fn.info` is `None` (TODO: maybe also make sure the number of differentiable inputs > 0 to match requires_derivative).
 - Adds some (temporary?) fixes to some sparse functions. See: https://github.com/pytorch/pytorch/issues/63549
 - To remove the codegen that generates the NotImplemented node (though that should only be one line): there are some ops listed under `RESET_GRAD_ACCUMULATOR` that have an extra function call. We would need to make this list of ops available to C++, which would mean either codegen-ing a list of strings or moving RESET_GRAD_ACCUMULATOR to C++ land. We could do this in a future PR if necessary.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518571

Pulled By: soulitzer

fbshipit-source-id: 99a35cbced46292d1b4e51594ae4d534c2caf8b6
2021-08-27 15:01:50 -07:00
90a6498a12 Add autograd not implemented boxed fallback (#63458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63458

See description and discussion from https://github.com/pytorch/pytorch/pull/62450

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518572

Pulled By: soulitzer

fbshipit-source-id: 3b1504d49abb84560ae17077f0dec335749c9882
2021-08-27 15:00:28 -07:00
8406dba65a Removing references to ProcessGroupAgent in comments (#64051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64051

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30587076

Pulled By: jaceyca

fbshipit-source-id: 414cb95faad0b4da0eaf2956c0668af057f93574
2021-08-27 14:47:37 -07:00
bdde898d9c Add README to datapipes (#63982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63982

Add a README to `datapipes` for developers. This can serve as a replacement for https://github.com/pytorch/pytorch/blob/master/torch/utils/data/datapipes_tutorial_dev_loaders.ipynb

After this PR is landed, the README.md will be added to PyTorch Wiki

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30554198

Pulled By: ejguan

fbshipit-source-id: 6091aae8ef915c7c1f00fbf45619c86c9558d308
2021-08-27 14:17:08 -07:00
358c46f99e Implement leaky relu op
Summary: Implemented leaky relu op as per: https://www.internalfb.com/tasks/?t=97492679

Test Plan:
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"

all tests pass, including new ones

Reviewed By: SS-JIA

Differential Revision: D30186225

fbshipit-source-id: fdb1f8f7b3a28b5504581822185c0475dcd53a3e
2021-08-27 13:52:49 -07:00
18cb3fc910 [FX] Validate data type of target on Node Construction (#64050)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64050

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30585535

Pulled By: yqhu

fbshipit-source-id: 96778a87e75f510b4ef42f0e5cf76b35b7b2f331
2021-08-27 13:40:57 -07:00
ff4569ae29 Sparse CUDA: rename files *.cu -> *.cpp (#63894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63894

This PR introduces a few code structure changes. There is no need to use
.cu extension for pure c++ code without cuda. Moved
`s_addmm_out_csr_sparse_dense_cuda_worker` to a separate cpp file from
cu file.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30548771

Pulled By: cpuhrsch

fbshipit-source-id: 6f12d36e7e506d2fdbd57ef33eb73192177cd904
2021-08-27 13:22:54 -07:00
8fc1064b7f [PyTorch] Reduce code size of register_prim_ops.cpp (#61494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61494

Creating a constexpr array and then looping over it is much cheaper than emitting a function call per item.
ghstack-source-id: 136639302

Test Plan:
fitsships

Buildsizebot some mobile apps to check size impact.

Reviewed By: dhruvbird, iseeyuan

Differential Revision: D29646977

fbshipit-source-id: 6144999f6acfc4e5dcd659845859702051344d88
2021-08-27 12:56:35 -07:00
6a76ee04de Adding alltoall_single collective to collective quantization API (#63154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63154

The collective quantization API now supports alltoall, alltoall_single, and allscatter. The test is also included.
ghstack-source-id: 136856877

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed/algorithms/quantization:DistQuantizationTests_nccl -- test_all_to_all_single_bfp16

Reviewed By: wanchaol

Differential Revision: D30255251

fbshipit-source-id: 856f4fa12de104689a03a0c8dc9e3ecfd41cad29
2021-08-27 12:46:31 -07:00
04108592a3 New TLS to disable forward mode AD (#63117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63117

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388097

Pulled By: albanD

fbshipit-source-id: f1bc777064645db1ff848bdd64af95bffb530984
2021-08-27 11:59:24 -07:00
6257f5b168 [pruner] add README to repo (#64099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64099

adding readme to pruner in OSS
ghstack-source-id: 136867516

Test Plan: should not affect behavior

Reviewed By: z-a-f

Differential Revision: D30608045

fbshipit-source-id: 3e9899a853395b2e91e8a69a5d2ca5f3c2acc646
2021-08-27 11:52:59 -07:00
101a626330 Improve distributed.get_rank() API docstring (#63296)
Summary:
See discussion in https://pytorch.slack.com/archives/CBHSWPNM7/p1628792389008600

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63296

Reviewed By: cbalioglu

Differential Revision: D30332042

Pulled By: mrshenli

fbshipit-source-id: 3a642fda2e106fd35b67709ed2adb60e408854c2
2021-08-27 11:34:55 -07:00
196fd3ee7a Modules note v2 (#63963)
Summary:
This PR expands the [note on modules](https://pytorch.org/docs/stable/notes/modules.html) with additional info for 1.10.

It adds the following:
* Examples of using hooks
* Examples of using apply()
* Examples for ParameterList / ParameterDict
* register_parameter() / register_buffer() usage
* Discussion of train() / eval() modes
* Distributed training overview / links
* TorchScript overview / links
* Quantization overview / links
* FX overview / links
* Parametrization overview / link to tutorial

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63963

Reviewed By: albanD

Differential Revision: D30606604

Pulled By: jbschlosser

fbshipit-source-id: c1030b19162bcb5fe7364bcdc981a2eb6d6e89b4
2021-08-27 11:30:18 -07:00
19c1b45f25 Detect out argument in the schema (#62755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62755

After this change, the out argument can be checked by calling is_out()

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30415256

Pulled By: tugsbayasgalan

fbshipit-source-id: b2e1fa46bab7c813aaede1f44149081ef2df566d
2021-08-27 11:20:33 -07:00
9f1f22b9bc [Static Runtime] Add out variant of quantized::embedding_bag_byte_prepack (#64081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64081

This change adds an out variant of `quantized::embedding_bag_byte_prepack`.

Test Plan:
- Added `ShapeInferenceTest.QEmbeddingBagByteUnpack`.

- Observed

```
V0824 13:38:49.723708 1322143 impl.cpp:1394] Switch to out variant for node: %2 : Tensor = quantized::embedding_bag_byte_prepack(%input)
```

Reviewed By: hlu1

Differential Revision: D30504216

fbshipit-source-id: 1d9d428e77a15bcc7da373d65e7ffabaf9c6caf2
2021-08-27 10:53:23 -07:00
6ab3a21098 fix resize bug (#61166)
Summary:
I think the original intent here was for this code path to take effect only in the align_corners case (because with output_size = 1 the divisor would be 0), but it also affects the non-align_corners case. For example:

```python
import numpy as np
import torch

# float32 is used here so that bilinear interpolation is supported.
input = torch.tensor(
        np.arange(1, 5, dtype=np.float32).reshape((1, 1, 2, 2)))
m = torch.nn.Upsample(scale_factor=0.5, mode="bilinear")
out = m(input)
```

The expected result is [[[[2.5]]]], but PyTorch returns [[[[1.0]]]], which differs from OpenCV and PIL; this PR fixes that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61166

Reviewed By: malfet

Differential Revision: D30543178

Pulled By: heitorschueroff

fbshipit-source-id: 21a4035483981986b0ae4a401ef0efbc565ccaf1
2021-08-27 10:49:31 -07:00
538c30a713 [caffe2] fixes to allow stricter compilation flag (#64016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64016

In order to increase the strictness of compilation for some targets depending on caffe2, we need to fix some errors uncovered when raising such flags.

This change introduces the required override tokens for virtual destructors

Test Plan: CI. Moreover, targets depending on caffe2 that use clang strict warnings now compile

Reviewed By: kalman5

Differential Revision: D30541714

fbshipit-source-id: 564af31b4a9df3536d7d6f43ad29e1d0c7040551
2021-08-27 10:38:37 -07:00
eca87f729d Added reference tests to ReductionOpInfo (#62900)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62900

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30408815

Pulled By: heitorschueroff

fbshipit-source-id: 6a1f82ac281920ff7405a42f46ccd796e60af9d6
2021-08-27 10:32:16 -07:00
babd449978 [JIT] Add aten::slice optimization (#63049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63049

Given a graph produced from a function like this:
```
def foo():
    li = [1, 2, 3, 4, 5, 6]
    return li[0:2]
```
This pass produces a graph like this:
```
def foo():
    li = [1, 2]
    return li
```

These changes are mostly adapted from https://github.com/pytorch/pytorch/pull/62297/

Test Plan: `buck test //caffe2/test:jit -- TestPeephole`

Reviewed By: eellison

Differential Revision: D30231044

fbshipit-source-id: d12ee39f68289a574f533041a5adb38b2f000dd5
2021-08-27 10:12:45 -07:00
3abb606091 Add doc for nn.MultiMarginLoss (shape, example) (#63760)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63760

Reviewed By: malfet

Differential Revision: D30541581

Pulled By: jbschlosser

fbshipit-source-id: 99560641e614296645eb0e51999513f57dfcfa98
2021-08-27 09:51:05 -07:00
a9983ac09c Refactor structured set_output in Register{DispatchKey}.cpp (#62188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62188

These parts of the `set_output` code are identical for all operators in the
kernel registration files. So, this moves them from being copied into every
class to two helper functions at the top of the file.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29962045

Pulled By: albanD

fbshipit-source-id: 753b8aac755f3c91b77ffa2c30a89ac91a84b7c4
2021-08-27 09:38:27 -07:00
f922b58b5f [bazel] GPU-support: add @local_config_cuda and @cuda (#63604)
Summary:
## Context

We take the first step at tackling the GPU-bazel support by adding bazel external workspaces `local_config_cuda` and `cuda`, where the first one has some hardcoded values and lists of files, and the second one provides a nicer, high-level wrapper that maps into the already expected by pytorch bazel targets that are guarded with `if_cuda` macro.

The prefix `local_config_` signifies the fact that we are breaking the bazel hermeticity philosophy by explicitly relying on the CUDA installation that is present on the machine.

## Testing

Notice an important scenario that is unlocked by this change: compilation of cpp code that depends on cuda libraries (i.e. cuda.h and so on).

Before:
```
sergei.vorobev@cs-sv7xn77uoy-gpu-1628706590:~/src/pytorch4$ bazelisk build --define=cuda=true //:c10
ERROR: /home/sergei.vorobev/src/pytorch4/tools/config/BUILD:12:1: no such package 'tools/toolchain': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /home/sergei.vorobev/src/pytorch4/tools/toolchain and referenced by '//tools/config:cuda_enabled_and_capable'
ERROR: While resolving configuration keys for //:c10: Analysis failed
ERROR: Analysis of target '//:c10' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.259s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded, 2 targets configured)
```

After:
```
sergei.vorobev@cs-sv7xn77uoy-gpu-1628706590:~/src/pytorch4$ bazelisk build --define=cuda=true //:c10
INFO: Analyzed target //:c10 (6 packages loaded, 246 targets configured).
INFO: Found 1 target...
Target //:c10 up-to-date:
  bazel-bin/libc10.lo
  bazel-bin/libc10.so
INFO: Elapsed time: 0.617s, Critical Path: 0.04s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
```

The `//:c10` target is a good testing one for this, because it has such cases where the [glob is different](075024b9a3/BUILD.bazel (L76-L81)), based on whether we compile for CUDA or not.

## What is out of scope of this PR

This PR is a first in a series of providing the comprehensive GPU bazel build support. Namely, we don't tackle the [cu_library](11a40ad915/tools/rules/cu.bzl (L2)) implementation here. This would be a separate large chunk of work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63604

Reviewed By: soulitzer

Differential Revision: D30442083

Pulled By: malfet

fbshipit-source-id: b2a8e4f7e5a25a69b960a82d9e36ba568eb64595
2021-08-27 09:33:42 -07:00
22d38bd10d [OSS] Enable Metal in PyTorch MacOS nightly builds (#63718)
Summary:
Build on https://github.com/pytorch/pytorch/pull/63825

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63718

Test Plan:
1.Add `ci/binaries` label to PR, so the CI will build those nightly builds

2.Make sure the following CI jobs build with `USE_PYTORCH_METAL_EXPORT` option is `ON`:
```
ci/circleci: binary_macos_arm64_conda_3_8_cpu_nightly_build
ci/circleci: binary_macos_arm64_conda_3_9_cpu_nightly_build
ci/circleci: binary_macos_arm64_wheel_3_8_cpu_nightly_build
ci/circleci: binary_macos_arm64_wheel_3_9_cpu_nightly_build
ci/circleci: binary_macos_conda_3_6_cpu_nightly_build
ci/circleci: binary_macos_conda_3_7_cpu_nightly_build
ci/circleci: binary_macos_conda_3_8_cpu_nightly_build
ci/circleci: binary_macos_conda_3_9_cpu_nightly_build
ci/circleci: binary_macos_libtorch_3_7_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_6_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_7_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_8_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_9_cpu_nightly_build
```

3.Test `conda` and `wheel` builds locally on [HelloWorld-Metal](https://github.com/pytorch/ios-demo-app/tree/master/HelloWorld-Metal) demo with [(Prototype) Use iOS GPU in PyTorch](https://pytorch.org/tutorials/prototype/ios_gpu_workflow.html)

(1) conda
```
conda install https://15667941-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/pytorch-1.10.0.dev20210826-py3.8_0.tar.bz2
```
(2) wheel
```
pip3 install https://15598647-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/torch-1.10.0.dev20210824-cp38-none-macosx_10_9_x86_64.whl
```

Reviewed By: xta0

Differential Revision: D30593167

Pulled By: hanton

fbshipit-source-id: 471da204e94b29c11301c857c50501307a5f0785
2021-08-27 09:25:05 -07:00
a43e7a51d7 Adds return type annotation for fork_rng function (#63724)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63723

Since it's a generator function, the return type annotation should be `Generator`.
![image](https://user-images.githubusercontent.com/47299190/130318830-29ef9529-0daa-463c-90b2-1b11f63ade8a.png)
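
A minimal usage sketch of the annotated context manager:

```python
import torch

torch.manual_seed(0)
with torch.random.fork_rng():
    torch.manual_seed(123)
    inside = torch.rand(3)  # drawn from the temporary RNG state
outside = torch.rand(3)     # global RNG state restored on exit
```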

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63724

Reviewed By: iramazanli

Differential Revision: D30543098

Pulled By: heitorschueroff

fbshipit-source-id: ebdd34749defe1e26c899146786a0357ab4b4b9b
2021-08-27 09:03:40 -07:00
ad8eddbd80 More robust check of whether a class is defined in torch (#64083)
Summary:
This would prevent bugs (see the sketch below) for classes that
1) are defined in a module whose name happens to start with `torch`, say `torchvision`
2) are defined in torch but referenced via an import alias like `import torch as th`
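
A hedged sketch of the distinction, using a hypothetical `defined_in_torch` helper (the PR's actual check may differ):

```python
def defined_in_torch(cls) -> bool:
    module = cls.__module__
    # Naive prefix check: wrongly matches "torchvision", "torchaudio", ...
    #   module.startswith("torch")
    # More robust: accept "torch" itself or a proper "torch." submodule;
    # __module__ is also unaffected by aliases like `import torch as th`.
    return module == "torch" or module.startswith("torch.")
```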

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64083

Reviewed By: soulitzer

Differential Revision: D30598369

Pulled By: gmagogsfm

fbshipit-source-id: 9d3a7135737b2339c9bd32195e4e69a9c07549d4
2021-08-27 08:55:35 -07:00
f2c47cf4db [Static Runtime] Out version for fmod (#64046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64046

Test Plan:
Confirm out variant is used:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1

V0826 23:31:30.321382 193428 impl.cpp:1395] Switch to out variant for node: %4 : Tensor = aten::fmod(%a.1, %b.1)
```

Reviewed By: mikeiovine

Differential Revision: D30581228

fbshipit-source-id: dfab9a16ff8afd40b29338037769f938f154bf74
2021-08-27 03:05:06 -07:00
c90b3cb1da [Static Runtime] Manage temporary Tensors for aten::layer_norm (#64078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64078

This change converts `aten::layer_norm -> output Tensor` to `static_runtime::layer_norm -> (output Tensor, tmp1 Tensor, tmp2 Tensor)` so that the `tmp1` and `tmp2` Tensors are managed by the static runtime.

Currently the out-variant of `aten::layer_norm` creates two temporary Tensors inside it:
```
    at::Tensor mean = create_empty_from({M}, *X);
    at::Tensor rstd = create_empty_from({M}, *X);
```
that the static runtime misses an opportunity to manage.

This change puts them into (unused) output Tensors of a new placeholder op `static_runtime::layer_norm` so that the static runtime can manage them, since the static runtime currently chooses to manage only output tensors.

Test Plan:
- Enhanced `StaticRuntime.LayerNorm` to ensure that `static_runtime::layer_norm` gets activated.

- Confirmed that the new op gets activated during testing:

```
V0825 12:51:50.017890 2265227 impl.cpp:1396] Switch to out variant for node: %8 : Tensor, %9 : Tensor, %10 : Tensor = static_runtime::layer_norm(%input.1, %normalized_shape.1, %4, %4, %5, %3)

```

Reviewed By: hlu1

Differential Revision: D30486475

fbshipit-source-id: 5121c44ab58c2d8a954aa0bbd9dfeb7468347a2d
2021-08-27 02:44:43 -07:00
3c3bba4169 [Static Runtime] Use F14FastMap/F14FastSet (#63999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63999

Use folly::F14FastMap/F14FastSet instead of std::unordered_map/unordered_set in the Static Runtime code base. folly::F14FastMap/F14FastSet implements the same APIs as std::unordered_map/unordered_set but faster. For details see https://github.com/facebook/folly/blob/master/folly/container/F14.md

Reviewed By: d1jang

Differential Revision: D30566149

fbshipit-source-id: 20a7fa2519e4dde96fb3fc61ef6c92bf6d759383
2021-08-27 01:40:41 -07:00
3f1c809470 [static runtime] port c2 argmin kernel (#63632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63632

Local benchmarking with 1 input repeated for 10k iters on the 290331537_4 local net. Reduces argmin runtime by about 80% and local net execution time by about 0.71-0.77 ms.

Before:
```
I0826 17:25:53.972786 1104614 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 7.37599. Iters per second: 135.57
```
```
Static runtime ms per iter: 8.22086. Iters per second: 121.642
Time per node type:
        4.13527 ms.    50.9157%. fb::sigrid_transforms_torch_bind (1 nodes, out variant)
       0.868506 ms.    10.6935%. aten::argmin (1 nodes, out variant)
...
```

After:
```
I0826 17:17:54.165174 1064079 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.66724. Iters per second: 149.987
```
```
Static runtime ms per iter: 7.68172. Iters per second: 130.179
Time per node type:
         4.1452 ms.    54.0612%. fb::sigrid_transforms_torch_bind (1 nodes, out variant)
       0.656778 ms.    8.56562%. fb::quantized_linear (8 nodes)
       0.488229 ms.    6.36741%. static_runtime::to_copy (827 nodes, out variant)
       0.372678 ms.    4.86042%. aten::argmin (1 nodes, out variant)
...Time per node type:
        3.39387 ms.    53.5467%. fb::sigrid_transforms_torch_bind (1 nodes, out variant)
       0.636216 ms.    10.0379%. fb::quantized_linear (8 nodes, out variant)
       0.410535 ms.    6.47721%. fb::clip_ranges_to_gather_to_offsets (304 nodes, out variant)
       0.212721 ms.     3.3562%. fb::clip_ranges_gather_sigrid_hash_precompute_v3 (157 nodes, out variant)
       0.173736 ms.    2.74111%. aten::matmul (1 nodes, out variant)
       0.150514 ms.    2.37474%. aten::argmin (1 nodes, out variant)
```
P447422384

Test Plan:
Test with local replayer sending traffic to `ansha_perf_test_0819.test`, and compare outputs to jit interpreter.

Start compute tier:
```
RUN_UUID=ansha_perf_test_0819.test.storage JOB_EXPIRE_TIME=864000 MODEL_ID=290331537_4 PREDICTOR_TAG= PREDICTOR_VERSION=405 PREDICTOR_TYPE=CPU ADDITIONAL_FLAGS="--enable_disagg_file_split=true --enable_adx=false --load_remote_file_locally=true --pytorch_predictor_static_runtime_whitelist_by_id=290331537" GFLAGS_CONFIG_PATH=sigrid/predictor/gflags/predictor_gflags_ads_perf_cpu_pyper SMC_TIER_NAME=sigrid.predictor.perf.ansha_per_test_0819.test.storage CLUSTER=tsp_rva ENTITLEMENT_NAME=ads_ranking_infra_test_t6 PREDICTOR_LOCAL_DIRECTORY= ICET_CONFIG_PATH= NNPI_COMPILATION_CONFIG_FILE= NUM_TASKS=1 NNPI_NUM_WORKERS=0 tw job start /data/users/ansha/fbsource/fbcode/tupperware/config/admarket/sigrid/predictor/predictor_perf_canary.tw
```

Start nnpi tier:
```
RUN_UUID=ansha_perf_test_0819.test JOB_EXPIRE_TIME=247200 MODEL_ID=290331537_4 PREDICTOR_TAG= PREDICTOR_VERSION=343 PREDICTOR_TYPE=NNPI_TWSHARED ADDITIONAL_FLAGS="--torch_glow_min_fusion_group_size=30 --pytorch_storage_tier_replayer_sr_connection_options=overall_timeout:1000000,processing_timeout:1000000 --predictor_storage_smc_tier=sigrid.predictor.perf.ansha_perf_test_0819.test.storage --pytorch_predictor_static_runtime_whitelist_by_id=290331537" GFLAGS_CONFIG_PATH=sigrid/predictor/gflags/predictor_gflags_ads_perf_glow_nnpi_pyper_v1 SMC_TIER_NAME=sigrid.predictor.perf.ansha_perf_test_0819.test CLUSTER=tsp_rva ENTITLEMENT_NAME=ads_ranking_infra_test_t17 PREDICTOR_LOCAL_DIRECTORY= ICET_CONFIG_PATH= NNPI_COMPILATION_CONFIG_FILE= NUM_TASKS=1 NNPI_NUM_WORKERS=0 tw job start /data/users/ansha/fbsource/fbcode/tupperware/config/admarket/sigrid/predictor/predictor_perf_canary.tw
```

```buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- StaticRuntime.IndividualOps_Argmin --print-passing-details```

Compared outputs to jit interpreter to check for no differences greater than 1e-3 (with nnc on) https://www.internalfb.com/intern/diff/view-version/136824794/

Reviewed By: hlu1

Differential Revision: D30445635

fbshipit-source-id: 048de8867ac72f764132295d1ebfa843cde2fa27
2021-08-26 23:19:19 -07:00
294db0603f [quant] Add support for linear_relu fusion for FP16 dynamic quant (#63826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63826

Support the conversion of the intrinsic LinearReLU module to the dynamically quantized LinearReLU module.
Verify that the support works for both linear-module and functional-linear fusion (see the sketch below).
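
A hedged sketch of the intended flow (the fusion list and module mapping here are assumptions; the real convert path may differ):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU()).eval()

# Fuse Linear + ReLU into an intrinsic LinearReLU module, then apply
# dynamic quantization; with dtype=torch.float16 the fused module should
# be swapped for its dynamically quantized FP16 counterpart.
fused = torch.quantization.fuse_modules(model, [["0", "1"]])
quantized = torch.quantization.quantize_dynamic(
    fused, {nn.intrinsic.LinearReLU}, dtype=torch.float16
)
out = quantized(torch.randn(2, 16))
```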

Test Plan:
python test/test_quantization.py test_dynamic_with_fusion

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30503513

fbshipit-source-id: 70446797e9670dfef7341cba2047183d6f88b70f
2021-08-26 21:12:06 -07:00
cec44aa574 [quant] Add op support for linear_relu_dynamic_fp16 (#63824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63824

Add a fused operator implementation that will work with the quantization fusion APIs.
Once the FBGEMM FP16 kernel supports relu fusion natively, we can remove the addition from the PyTorch operator.

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30503514

fbshipit-source-id: 6bf3bd53f47ffaa3f1d178eaad8cc980a7f5258a
2021-08-26 21:12:04 -07:00
975f4ccad6 [quant] support linear_relu_dynamic for qnnpack backend (#63820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63820

Adds support in the operator to call the relu operator directly if relu fusion is enabled.
Once QNNPACK natively supports relu fusion in linear_dynamic, this can be removed.

Test Plan:
python test/test_quantization.py TestDynamicQuantizedLinear.test_qlinear

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30502813

fbshipit-source-id: 3352ee5f73e482b6d1941f389d720a461b84ba23
2021-08-26 21:12:02 -07:00
c7027f19ef [quant][fx] Add support for dynamic linear + relu fusion (INT8) (#63799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63799

Add a new module that can be used for a module swap with the nni.LinearReLU module in the convert function.
Currently supports INT8 only (since the FP16 op doesn't have relu fusion yet).

Fixes #55393

Test Plan:
python test/test_quantization.py test_dynamic_fusion

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30502812

fbshipit-source-id: 3668e4f001a0626d469e17ac323acf582ee28a51
2021-08-26 21:10:46 -07:00
63c90ec3bf [torch/deploy] add torch.distributed to build (#63918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63918

Previously we were building with `USE_DISTRIBUTED` off, because c10d was built as a separate library for historical reasons. Since then, lw has merged the c10d build into libtorch, so this is fairly easy to turn on.

Differential Revision:
D30492442

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D30492442/)!

Test Plan: added a unit test

Reviewed By: wconstab

Pulled By: suo

fbshipit-source-id: 843b8fcf349a72a7f6fcbd1fcc8961268690fb8c
2021-08-26 20:58:44 -07:00
65e6194aeb Introduce the torchrun entrypoint (#64049)
Summary:
This PR introduces a new `torchrun` entrypoint that simply "points" to `python -m torch.distributed.run`. It is shorter and less error-prone to type and gives a nicer syntax than a rather cryptic `python -m ...` command line. Along with the new entrypoint the documentation is also updated and places where `torch.distributed.run` are mentioned are replaced with `torchrun`.
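
For example (the script name is a placeholder):

```
# before
python -m torch.distributed.run --nnodes=1 --nproc_per_node=4 train.py

# after, equivalent
torchrun --nnodes=1 --nproc_per_node=4 train.py
```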

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64049

Reviewed By: cbalioglu

Differential Revision: D30584041

Pulled By: kiukchung

fbshipit-source-id: d99db3b5d12e7bf9676bab70e680d4b88031ae2d
2021-08-26 20:17:48 -07:00
510d2ece81 Merge script and _script_pdt API (#62420)
Summary:
Merge the `torch.jit.script` and `torch.jit._script_pdt` APIs. This PR merges profile-directed typing with the script API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62420

Reviewed By: iramazanli

Differential Revision: D30579015

Pulled By: nikithamalgifb

fbshipit-source-id: 99ba6839d235d61b2dd0144b466b2063a53ccece
2021-08-26 18:58:19 -07:00
0e8c3c51d9 port glu to use structured kernel approach (#61800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61800

resubmitting because the [last one](https://github.com/pytorch/pytorch/pull/61433) was unrecoverable due to making changes incorrectly in the stack

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29812492

Pulled By: makslevental

fbshipit-source-id: c3dfeacd1e00a526e24fbaab02dad48069d690ef
2021-08-26 18:01:28 -07:00
a5f35ac7cd Run through failures on trunk (#64063)
Summary:
This PR runs all the tests on trunk instead of stopping on first failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64063

Reviewed By: malfet, seemethere

Differential Revision: D30592020

Pulled By: janeyx99

fbshipit-source-id: 318b225cdf918a98f73e752d1cc0227d9227f36c
2021-08-26 17:38:19 -07:00
0c9dce90ed [pytorch] add per_sample_weights support for embedding_bag_4bit_rowwise_offsets (#63605)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63605

Reviewed By: houseroad

Differential Revision: D30434664

fbshipit-source-id: eb4cbae3c705f9dec5c073a56f0f23daee353bc1
2021-08-26 17:31:45 -07:00
81764d1153 document that torch.triangular_solve has optional out= parameter (#63253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63253

Fixes #57955

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30312134

Pulled By: dagitses

fbshipit-source-id: 4f484620f5754f4324a99bbac1ff783c64cee6b8
2021-08-26 17:28:17 -07:00
ed573a8e08 Enable test_api IMethodTest in OSS (#63345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63345

This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.
3. Generated torch::deploy examples when building torch_deploy library.

Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.*

Reviewed By: ngimel

Differential Revision: D30346257

Pulled By: alanwaketan

fbshipit-source-id: 932ae7d45790dfb6e00c51893933a054a0fad86d
2021-08-26 16:50:52 -07:00
0bd8d0951d [Static Runtime] Remove unnecessary fb::equally_split nodes (#64022)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64022

Test Plan: - Added unittest `StaticRuntime.RemoveEquallySplitListUnpack`.

Reviewed By: hlu1

Differential Revision: D30472189

fbshipit-source-id: 36040b0146f4be9d0d0fda293f7205f43aad0b87
2021-08-26 16:29:43 -07:00
dfa35ab3e7 [pytorch][quant][oss] Support 2-bit embedding_bag op "embedding_bag_2bit_rowwise_offsets" (#63658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63658

Support 2-bit embedding_bag op "embedding_bag_2bit_rowwise_offsets"

Reviewed By: jingsh, supriyar

Differential Revision: D30454994

fbshipit-source-id: 7aa7bfe405c2ffff639d5658a35181036e162dc9
2021-08-26 16:09:35 -07:00
92a154aa29 Move variabletype functions around (#63330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63330

 - This is in preparation for templated/boxed autograd-not-implemented fallback
 - Make sure VariableTypeUtils does not depend on generated code
 - Lift `isFwGradDefined` into `autograd/functions/utils.cpp` so it's available to mobile builds
 - Removes `using namespace at` from VariableTypeUtils; previously we needed this for the templated version. It's no longer strictly necessary, but it is still a good change to avoid name conflicts if this header is included elsewhere in the future.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518573

Pulled By: soulitzer

fbshipit-source-id: a0fb904baafc9713de609fffec4b813f6cfcc000
2021-08-26 16:02:39 -07:00
49353e319c More sharded_tensor creation ops: harded_tensor.zeros, sharded_tensor.full, sharded_tensor.rand (#63732)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63732

Test Plan:
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py  --v

$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestCreateTensorFromParams --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorChunked --v

Imported from OSS

Differential Revision:
D30472621

Reviewed By: pritamdamania87

Pulled By: bowangbj

fbshipit-source-id: fd8ebf9b815fdc292ad1aad521f9f4f454163d0e
2021-08-26 16:01:38 -07:00
49b782b2cb Add shard number to print_test_stats.py upload name (#64055)
Summary:
Now that the render test results job is gone, each shard on GHA is uploading a JSON test stats report. To ensure differentiation, this PR includes the shard number in the report name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64055

Reviewed By: iramazanli

Differential Revision: D30586869

Pulled By: janeyx99

fbshipit-source-id: fd19f347131deec51486bb0795e4e13ac19bc71a
2021-08-26 15:43:29 -07:00
085278f8b1 Derivatives of relu (#63027) (#63089)
Summary:
Optimizes the relu and leaky_relu derivatives to reduce the VRAM needed for the backward passes

Fixes https://github.com/pytorch/pytorch/issues/63027

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63089

Reviewed By: iramazanli

Differential Revision: D30582049

Pulled By: albanD

fbshipit-source-id: a9481fe8c10cbfe2db485e28ce80cabfef501eb8
2021-08-26 15:33:25 -07:00
7861dba7f6 Automated submodule update: FBGEMM (#62879)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: ce54703857

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62879

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30154801

fbshipit-source-id: b2ce185da6f6cadf5128f82b15097d9e13e9e6a0
2021-08-26 15:20:06 -07:00
aeec177833 [JIT] UseVariadicOp takes list_idx parameter (#63915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63915

Previously, this function only worked for variadic op substitutions of the form `op(list, args) -> variadic_op(list_1, ..., list_n, args)`. This change allows for transformations of the form `op(args_0, list, args_1) -> variadic_op(args_0, list_1, ..., list_n, args_1)`.

Test Plan:
`buck test caffe2/test/cpp/jit:jit -- Stack Concat`

(tests exercising `list_idx != 0` will be added further up in this diff stack)

Reviewed By: navahgar

Differential Revision: D30529729

fbshipit-source-id: 568080679c3b40bdaedee56bef2e8a5ce7985d2f
2021-08-26 14:10:35 -07:00
d8d8e4902a [torch/elastic] Pretty print the failure message captured by @record (#64036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64036

This PR slightly revises the implementation of the internal `_format_failure()` method in order to pretty print the error message captured in a subprocess by the `record` annotation.

With this PR a failure log is formatted as below:

```
Root Cause:
[0]:
  time: 2021-08-26_17:12:07
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 8045)
  error_file: /tmp/torchelastic_6cj9eppm/6d9d844a-6ce4-4838-93ed-1639a9525b00_rec9kuv3/attempt_0/0/error.json
  msg:
    {
      "message": "ValueError: Test",
      "extraInfo": {
        "py_callstack": [
          "  File \"/data/home/balioglu/fail.py\", line 7, in <module>\n    main()\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n    error_handler.record_exception(e)\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n    _write_error(e, self._get_error_file_path())\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n    \"py_callstack\": traceback.format_stack(),\n"
        ],
        "timestamp": "1629997927"
      }
    }
```

in contrast to the old formatting:

```
Root Cause:
[0]:
  time: 2021-08-26_17:15:50
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 9417)
  error_file: /tmp/torchelastic_22pwarnq/19f22638-848c-4b8f-8379-677f34fc44e7_u43o9vs7/attempt_0/0/error.json
  msg: "{'message': 'ValueError: Test', 'extraInfo': {'py_callstack': 'Traceback (most recent call last):\n  File "/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 351, in wrapper\n    return f(*args, **kwargs)\n  File "/data/home/balioglu/fail.py", line 5, in main\n    raise ValueError("BALIOGLU")\nValueError: BALIOGLU\n', 'timestamp': '1629998150'}}"
```
ghstack-source-id: 136761768

Test Plan: Run the existing unit tests.

Reviewed By: kiukchung

Differential Revision: D30579025

fbshipit-source-id: 37df0b7c7ec9b620355766122986c2c77e8495ae
2021-08-26 13:56:46 -07:00
5a12cb611f To add Chained Scheduler to the list of PyTorch schedulers. (#63491)
Summary:
In this PR we are introducing ChainedScheduler which initially proposed in the discussion https://github.com/pytorch/pytorch/pull/26423#discussion_r329976246 .

The idea is to provide a user-friendly chaining method for schedulers, especially for cases where many of them are involved and we want a clean, easy-to-read interface. This method will be even more crucial once composite schedulers and schedulers for different types of parameters are involved.

The immediate application of ChainedScheduler is expected to be in the TorchVision library, to combine WarmUpLR and MultiStepLR https://github.com/pytorch/vision/blob/master/references/video_classification/scheduler.py#L5 . However, this method can be expected to apply to many other use cases as well.

### Example
The usage is as simple as below:

```python
sched=ChainedScheduler([ExponentialLR(self.opt, gamma=0.9),
                        WarmUpLR(self.opt, warmup_factor=0.2, warmup_iters=4, warmup_method="constant"),
                        StepLR(self.opt, gamma=0.1, step_size=3)])
```

Then calling
```python
sched.step()
```
would trigger the step function of all three schedulers consecutively

Partially resolves https://github.com/pytorch/vision/issues/4281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63491

Reviewed By: datumbox, mruberry

Differential Revision: D30576180

Pulled By: iramazanli

fbshipit-source-id: b43f0749f55faab25079641b7d91c21a891a87e4
2021-08-26 13:30:21 -07:00
7cfbc85821 [fx_acc] [fx2trt] add acc op mapper for argmin and converter for topk (#63823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63823

Add mapper for `torch.argmin` which maps it to `acc_ops.flatten` (optional) + `acc_ops.topk` + `acc_ops.getitem` + `acc_ops.squeeze` (optional). This diff doesn't allow mapping if `dim=None && keepdim=True` in `torch.argmin`.

Add fx2trt converter for `acc_ops.topk`.

Test Plan:
buck test mode/opt glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_argmin
buck run mode/opt caffe2/torch/fb/fx2trt:test_topk

Reviewed By: jfix71

Differential Revision: D30501771

fbshipit-source-id: 0babc45e69bac5e61ff0b9b4dfb98940398e3e57
2021-08-26 13:16:22 -07:00
cbfec02007 [Static Runtime] Add native op for aten::expand_as (#64024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64024

`aten::expand_as` creates a view of the input tensor. This change adds its native op implementation for the static runtime.
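
For reference, a small sketch of the op's view semantics:

```python
import torch

x = torch.randn(3, 1)
y = torch.randn(3, 4)
v = x.expand_as(y)  # view with shape (3, 4); no data is copied
assert v.shape == y.shape
```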

Test Plan: - Added `StaticRuntime.IndividualOps_ExpandAs`

Reviewed By: hlu1

Differential Revision: D30546851

fbshipit-source-id: e53483048af890bc41b6192a1ab0c5ba0ee2bdc0
2021-08-26 13:05:53 -07:00
95d0b3199b Back out "[ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280)" (#64004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63904

Fixes T98808160

Test Plan: T98808160

Reviewed By: msaroufim

Differential Revision: D30527450

fbshipit-source-id: 6262901a78ca929cecda1cf740893139aa26f1b4
2021-08-26 12:49:42 -07:00
c5cc185b6d Allow uncompiled strings as input to checkScriptRaisesRegex (#63901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63901

cc gmagogsfm

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30579472

Pulled By: ansley

fbshipit-source-id: 59ee09c1f25278d4f6e51f626588251bd095c6ea
2021-08-26 12:17:07 -07:00
48c57b9b2e Leverage TensorPipe's automatic SHM address selection (#63028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63028

Until now, TensorPipe required PyTorch to come up with and provide a unique identifier to use as the address for the UNIX domain socket used in the SHM transport. However, the Linux kernel can automatically assign an available address (like it does with IP ports), and TensorPipe now supports this, so we can remove that now-unnecessary PyTorch logic.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30220352

fbshipit-source-id: 78e8a6ef5916b2a72df26cdc9cd367b9d083e821
2021-08-26 12:15:53 -07:00
ad47fb8858 Rename IterableAsDataPipe to IterableWrapper (#63981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63981

Rename `IterableAsDataPipe` to `IterableWrapper` based on our naming convention `Op-er`

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30554197

Pulled By: ejguan

fbshipit-source-id: c2eacb20df5645d83ca165d6a1591f7e4791990f
2021-08-26 10:23:25 -07:00
0f6b524665 [NNC] Add C++ codegen backend to NNC (#62869)
Summary:
Adds a C++ codegen backend to NNC to generate C++ for CPU instead of generating LLVM IR.
Tensors are represented as blobs of float. Vector operations are devectorized/unrolled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62869

Test Plan:
https://github.com/pytorch/pytorch/tree/mvz-nnc-aot-prototype makes it able to AOT compile the whole MobileNetV3 model into binary code through LLVM codegen in NNC.

I forked that branch to https://github.com/cheng-chang/pytorch/tree/cc-aot-cpp, merged this PR into it, and modified `fancy_compile` to compile MobileNetV3 into C++ through

```
import torch

m = torch.jit.load('mobnet.pt')
m.eval()
f = torch.jit.freeze(m)
torch._C._fancy_compile(f.graph, [1, 3, 224, 224])
```

The generated C++ file `mobnet.cc` can be found at https://gist.github.com/cheng-chang/e2830cc6920b39204ebf368035b2bcec.

I manually compiled the generated C++ through `g++ -o mobnet -std=c++14 -L./build/lib -ltorch_cpu -ltorch mobnet.cc`, and it succeeded.

Reviewed By: ZolotukhinM

Differential Revision: D30149482

Pulled By: cheng-chang

fbshipit-source-id: e77b189f0353e37cd309423a48a513e668d07675
2021-08-26 09:56:37 -07:00
6d31ba6ddc [nnc] Sanitized the names of constants in the input graph. (#63990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63923

The input graph can contain constants whose names contain special characters. So, all names of constants in the input graph need to be sanitized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990

Reviewed By: ZolotukhinM

Differential Revision: D30558432

Pulled By: navahgar

fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f
2021-08-26 09:52:02 -07:00
ba5f1b1076 [nnc] Fix dtype promotion involving scalars (#64002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64002

Fixes https://github.com/pytorch/vision/issues/4315

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30566979

Pulled By: bertmaher

fbshipit-source-id: eaa98b9534a926be7fcd337d46c5a0acb3243179
2021-08-26 09:43:15 -07:00
1354ee417a run_test.py: add option to run only core tests (#63976)
Summary:
This is in response to a feature request from some folks in the core team to have a local command that would only run relevant "core" tests. The idea is to give developers a local smoke-test option to run before making a PR, in order to verify their changes did not break core functionality. These smoke tests are not meant to be short, but rather relevant.

This PR enables that by allowing developers to run `python test/run_test.py --core` or `python test/run_test.py -core` in order to run the CORE_TEST_LIST, which is currently test_nn.py, test_torch.py, and test_ops.py.

I am not the best person to judge what should be considered "core", so please comment which tests should be included and/or excluded from the CORE_TEST_LIST!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63976

Test Plan:
```
(pytorch) janeyx@janeyx-mbp test % python run_test.py --core -v
Selected tests: test_nn, test_ops, test_torch
Running test_nn ... [2021-08-25 14:48:28.865078]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_nn.py', '-v'] ... [2021-08-25 14:48:28.865123]
test_to (__main__.PackedSequenceTest) ... ok
test_to_memory_format (__main__.PackedSequenceTest) ... ok
```

Reviewed By: walterddr

Differential Revision: D30575560

Pulled By: janeyx99

fbshipit-source-id: 3f151982c1e315e50e60cb0d818adaea34556a04
2021-08-26 09:29:57 -07:00
fbe7133b58 [Static Runtime] Disable out variant of aten::clone (#63980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63980

The out variant implementation of `aten::clone` causes a crash, which needs further investigation. This change disables it until the problem gets fixed.

Note that `inline_cvr` doesn't use `aten::clone` as of now, so no perf implication: https://www.internalfb.com/phabricator/paste/view/P446858755?lines=121

Test Plan: N/A

Reviewed By: hlu1

Differential Revision: D30544149

fbshipit-source-id: facb334d67473f622b36862fbdb2633358556fdf
2021-08-26 08:10:13 -07:00
7ccc4b5cc8 [CI] move distributed test into its own CI job (#62896)
Summary:
Moving distributed to its own job.

- [x] ensure there should be a distributed test job for every default test job matrix (on GHA)
- [x] ensure that circleci jobs works for distributed as well
- [x] waiting for test distributed to have its own run_test.py launch options, see https://github.com/pytorch/pytorch/issues/63147

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62896

Reviewed By: seemethere

Differential Revision: D30230856

Pulled By: walterddr

fbshipit-source-id: 0cad620f6cd9e56c727c105458d76539a5ae976f
2021-08-26 08:02:20 -07:00
733755f72c remove special grad_mode tls handling (#63116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63116

This PR removes the special flag to disable grad mode tracking on the ThreadLocalState and replaces it with an explicit setter that users can use.
This allows us to reduce the complexity of ThreadLocalState.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388098

Pulled By: albanD

fbshipit-source-id: 85641b3d711179fb78ff6a41ed077548dc821a2f
2021-08-26 07:51:30 -07:00
950f7c0237 Added API tests to ReductionOpInfo and ported amax/amin/nansum tests (#62899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62899

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30408816

Pulled By: heitorschueroff

fbshipit-source-id: 6cb0aa7fa7edba93549ef873baa2fb8a003bd91d
2021-08-26 07:18:43 -07:00
10da1fc3f8 Deify opmath_t into its own header, align with accscalar_t (#63986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63986

Fixes #63985

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30555996

Pulled By: ezyang

fbshipit-source-id: b6e4d56a5658ed028ffc105cc4b479faa6882b65
2021-08-26 06:59:46 -07:00
774ae0851d [OpInfo] Added ReductionOpInfo subclass of OpInfo and ported sum test (#62737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62737

ReductionOpInfo is a specialization of OpInfo for reduction operators. For now, it is designed to work with reductions that return a single tensor and that reduce all elements along one or more dimensions to a single value. In particular this excludes operators such as `max` and `min` that return multiple tensors and `quantile` that can return multiple values.

fixes https://github.com/pytorch/pytorch/issues/49746

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30406568

Pulled By: heitorschueroff

fbshipit-source-id: 218b1da1902f67bcf4c3681e2a0f0029a25d51f1
2021-08-26 06:06:38 -07:00
c02eda8166 Update TensorPipe submodule
Summary: The bot failed to do it.

Test Plan: D30542677

Reviewed By: beauby

Differential Revision: D30573500

fbshipit-source-id: 50abd6fc415cead0a6b6d9290fa0e5f97d0e4989
2021-08-26 05:44:38 -07:00
61d88cdd1c use const auto& as type for grad alias (#63949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63949

This is an extension of the discussion in
https://github.com/pytorch/pytorch/pull/63040#discussion_r687793027.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30546789

Pulled By: dagitses

fbshipit-source-id: 3046aff4f129d5492d73dfb67717a824e16ffee8
2021-08-26 04:44:03 -07:00
5757d03145 Add logging for _MinimizerBase
Summary: Add logging so we know which nodes are currently being visited

Test Plan: lint & SC tests

Reviewed By: 842974287

Differential Revision: D30509865

fbshipit-source-id: 09e77e44c97c825242e0b24f90463b50f3ca19c6
2021-08-26 00:52:58 -07:00
a6f767ed3d Fix issue re: DDP and create_graph=True (#63831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63831

Closes https://github.com/pytorch/pytorch/issues/63812

`at::mul_out` is not supported when `grad` itself requires grad, as is the case when computing higher-order derivatives.

In this case, fall back to a mul + copy instead of mul_out.
ghstack-source-id: 136614644
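
A small repro of the underlying limitation, with the fallback sketched at the end (illustrative Python, not the PR's C++):

```
import torch

grad = torch.randn(3, requires_grad=True)
scale = torch.tensor(0.5)
out = torch.empty(3)

try:
    torch.mul(grad, scale, out=out)  # out= ops reject inputs that require grad
except RuntimeError:
    out.copy_(grad * scale)          # fallback: out-of-place mul, then a copy
```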

Test Plan: UT

Reviewed By: SciPioneer

Differential Revision: D30505573

fbshipit-source-id: 83532b6207b3d80116fcc4dff0e5520d73b3454f
2021-08-25 23:50:25 -07:00
3b284ab024 Adding BFP16 quantization/dequantization support to OSS (#63059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63059

Adds support for the BFP16 quantization method to OSS. Currently only CPU is supported.
ghstack-source-id: 136639528

Test Plan: Imported from OSS

Reviewed By: wanchaol

Differential Revision: D30194538

fbshipit-source-id: ac248567ad8028457c2a91b77ef2ce81709fce53
2021-08-25 23:41:34 -07:00
9d95d48567 (torch.distributed) Add torch.distributed.is_torchelastic_launched() util method + make init_method=tcp:// compatible with torchelastic (#63910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63910

Addresses the current issue that `init_method=tcp://` is not compatible with `torch.distributed.run` and `torch.distributed.launch`. When running with a training script that initializes the process group with `init_method=tcp://localhost:$port` as such:

```
$ python -u -m torch.distributed.run --max_restarts 0 --nproc_per_node 1 --nnodes 1 --master_addr $(hostname) --master_port 6000 ~/tmp/test.py
```

An `Address in use` error is raised since the training script tries to create a TCPStore on port 6000, which is already taken since the elastic agent is already running a TCPStore on that port.

For details see: https://github.com/pytorch/pytorch/issues/63874.

This change does a couple of things:

1. Adds an `is_torchelastic_launched()` check function that users can use in training scripts to see whether the script is launched via torchelastic.
2. Updates the `torch.distributed` docs page to include the new `is_torchelastic_launched()` function.
3. Makes `init_method=tcp://` torchelastic-compatible by modifying `_tcp_rendezvous_handler` in `torch.distributed.rendezvous` (this is NOT the elastic rendezvous; it is the old rendezvous module, which is slotted for deprecation in future releases) to check `is_torchelastic_launched()` AND `torchelastic_use_agent_store()`, and if so, only create TCPStore clients (no daemons, not even for rank 0).
4. Adds a bunch of unittests to cover the different code paths.

NOTE: the issue mentions that we should fail fast with an assertion on `init_method!=env://` when `is_torchelastic_launched()` is `True`. There are three registered init_methods in pytorch: env://, tcp://, file://. Since this diff makes tcp:// compatible with torchelastic, and I've validated that file:// is compatible with torchelastic as well, there is no need to add assertions. I did update the docs to point out that env:// is the RECOMMENDED init_method. We should probably deprecate the other init_methods in the future, but that is out of scope for this issue.
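
A sketch of how a training script might use the new check (backend, host, and port are placeholders):

```
import torch.distributed as dist

if dist.is_torchelastic_launched():
    # The elastic agent already runs the store; just attach via env://.
    dist.init_process_group(backend="gloo", init_method="env://")
else:
    dist.init_process_group(
        backend="gloo",
        init_method="tcp://localhost:29500",
        rank=0,
        world_size=1,
    )
```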

Test Plan: Unittests.

Reviewed By: cbalioglu

Differential Revision: D30529984

fbshipit-source-id: 267aea6d4dad73eb14a2680ac921f210ff547cc5
2021-08-25 22:57:43 -07:00
b629ea4620 Update persons_of_interest.rst (#63907)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63907

Reviewed By: jspisak

Differential Revision: D30534972

Pulled By: dzhulgakov

fbshipit-source-id: ba726fc53e292a362c387cc8b5f7776ca2a2544c
2021-08-25 22:50:54 -07:00
b1154cc774 enable equal_nan for complex values in isclose (#63571)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63571
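
A quick illustration of the enabled behavior, using NaN in both components to stay independent of tie-breaking details (previously `equal_nan=True` was not supported for complex inputs):

```
import torch

a = torch.tensor([complex(float("nan"), float("nan"))])
b = torch.tensor([complex(float("nan"), float("nan"))])

print(torch.isclose(a, b))                  # tensor([False])
print(torch.isclose(a, b, equal_nan=True))  # tensor([True])
```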

Test Plan: Imported from OSS

Reviewed By: malfet, ngimel

Differential Revision: D30560127

Pulled By: mruberry

fbshipit-source-id: 8958121ca24e7c139d869607903aebbe87bc0740
2021-08-25 22:05:49 -07:00
49c8fbc92f Clean up related to type refinements (#62444)
Summary:
Creates a helper function to refine the types into a TorchScript-compatible format in the MonkeyType config for profile-directed typing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62444

Reviewed By: malfet

Differential Revision: D30548159

Pulled By: nikithamalgifb

fbshipit-source-id: 7c09ce5f5e043d069313b87112837d7e226ade1f
2021-08-25 21:53:00 -07:00
80a61142e4 inference for algebraic expressions (#63822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63822

Infers algebraic expressions and adds this to our symbolic inferencer. It works for conv2D and can be extended to other operations.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30518469

Pulled By: migeed-z

fbshipit-source-id: b92dfa40b2d834a535177da42b851701b8f7178c
2021-08-25 20:47:23 -07:00
124ae597fb [quant] Fixing the conversion of the quantizable RNN (#63879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63879

Quantizable RNN had a bug where `from_observed` was an instance method instead of a class method, which caused `tq.convert` to fail. This fixes the issue by making `from_observed` a classmethod.

The tests were passing before because the unittests were not using the custom module path, but a conventional `from_float`, which is also supported.
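
The shape of the fix, as an illustrative stub (not the actual `torch.nn.quantizable` code):

```
class LSTM:
    # Before the fix this was `def from_observed(self, observed)`, an instance
    # method that the custom-module convert path could not call on the class.
    @classmethod
    def from_observed(cls, observed):
        converted = cls()
        # ... transfer observed weights/qparams onto `converted` here ...
        return converted
```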

Test Plan:
`buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`

```
buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm
Parsing buck files: finished in 0.5 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 9.2 sec (100%) 12622/12622 jobs, 2/12622 updated
  Total time: 9.7 sec
More details at https://www.internalfb.com/intern/buck/build/0d87b987-649f-4d06-b0e2-97b5077
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: cb99305f-65c9-438b-a99f-a0a2a3089778
Trace available for this run at /tmp/tpx-20210824-115652.540356/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5066549645030046
    ✓ ListingSuccess: caffe2/test:quantization - main (12.550)
    ✓ Pass: caffe2/test:quantization - test_custom_module_lstm (quantization.core.test_quantized_op.TestQuantizedOps) (174.867)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5066549645030046
```

Reviewed By: jerryzh168, mtl67

Differential Revision: D30520473

fbshipit-source-id: bc5d0b5bb079fd146e2614dd42526fc7d4d4f3c6
2021-08-25 20:39:02 -07:00
2ea2711501 Make frozen symbol name customizable in torch deploy. (#63817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63817

ghstack-source-id: 136699671

Test Plan: eyes

Reviewed By: wconstab

Differential Revision: D29571559

fbshipit-source-id: 8e3caa4932ef8d7c8559f264f0e9bb5474ad2237
2021-08-25 20:10:35 -07:00
f4bc28990f Compute cuda reduction buffer size in elements (#63969)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/63885

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63969

Reviewed By: mruberry

Differential Revision: D30549423

Pulled By: ngimel

fbshipit-source-id: b16d25030d44ced789c125a333d72b02a8f45067
2021-08-25 18:18:37 -07:00
01b8162d00 Back out "Revert D30384746: [fx2trt] Add a test for quantized resnet18" (#63973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63973

Original commit changeset: b93235323e22

Test Plan: buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test

Reviewed By: 842974287

Differential Revision: D30546036

fbshipit-source-id: 2c8302456f072d04da00cf9ad97aa8304bc5e43e
2021-08-25 17:52:22 -07:00
57d4c6cf42 replace self.assertTrue(torch.allclose(..)) with self.assertEqual(…) (#63637)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63565
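
The mechanical rewrite, sketched on a toy test. PyTorch's `TestCase.assertEqual` compares tensors with dtype-appropriate tolerances and prints richer failure messages than a bare boolean assert:

```
import torch
from torch.testing._internal.common_utils import TestCase

class ExampleTest(TestCase):
    def test_add(self):
        actual = torch.tensor([1.0]) + torch.tensor([2.0])
        expected = torch.tensor([3.0])
        # Before: self.assertTrue(torch.allclose(actual, expected))
        self.assertEqual(actual, expected)
```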

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63637

Reviewed By: malfet

Differential Revision: D30541266

Pulled By: mruberry

fbshipit-source-id: ab461949782c6908a589ea098fcfcf5c3e081ee6
2021-08-25 16:47:40 -07:00
1be1c901aa Remove render_test_results job (#63877)
Summary:
This removes the `render_test_results` job we had before, which had been causing some confusion among devs when it failed and isn't really necessary now that we can render test results directly on the PR HUD.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63877

Reviewed By: walterddr, janeyx99

Differential Revision: D30546705

Pulled By: driazati

fbshipit-source-id: 55fdafdb6f80924d941ffc15ee10787cb54f34a1
2021-08-25 15:55:55 -07:00
ba0e6a1e03 [EASY] Update the clang-tidy error message (#63370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63370

As shown by this CI run, the actual thing that is incorrect is the prompt.
https://github.com/pytorch/pytorch/actions/runs/1137298261

The CI runs the below command instead of the original command.
The original command errors out when importing another file on line 1.
Trying to fix the code to work with the original command causes the CI to error out.

We should actually ask the user to run
`python3 -m tools.linter.install.clang_tidy`

Test Plan: Imported from OSS

Reviewed By: janeyx99, heitorschueroff

Differential Revision: D30530216

Pulled By: Gamrix

fbshipit-source-id: 2a2b8d539dcc2839e4000c13e82c207fa89bfc9f
2021-08-25 15:30:13 -07:00
44ede71751 Shard python_torch_functions.cpp (#62187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62187

This file can take 3 minutes on its own to compile and, after
python_functions.cpp, is the second-largest contributor to the compile time of
`libtorch_python` on a 32-core threadripper. This change splits it into 3 files
that take around 1 minute each to compile.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D29962048

Pulled By: albanD

fbshipit-source-id: 99016d75912bff483fe21b130cef43a6882f8c0e
2021-08-25 15:10:43 -07:00
730ce29baf Add note on ifdefing based on CUDA_VERSION for ROCm path (#62850)
Summary:
CUDA_VERSION and HIP_VERSION follow very unrelated versioning schemes, so it does not make sense to use CUDA_VERSION to determine the ROCm path. This note explicitly addresses it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62850

Reviewed By: mruberry

Differential Revision: D30547562

Pulled By: malfet

fbshipit-source-id: 02990fa66a88466c2330ab85f446b25b78545150
2021-08-25 15:02:03 -07:00
b5b9ce146f Small fixes to the Contributing.txt (#63385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63385

Correcting a mistake for the pytorch uninstall, and
adding an extra note for Darwin.

Test Plan: Imported from OSS

Reviewed By: janeyx99, heitorschueroff

Differential Revision: D30530234

fbshipit-source-id: e0f88a1725eeadabfb4b28c1da11e369ee878ab4
2021-08-25 14:50:37 -07:00
52ebe7e14e Back out "Temporary fix for remote gpu execution issue" (#63983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63983

Test for fixes in D30545351. it should resolve the remote execution flag being populated incorrectly issue.

Test Plan: CI

Reviewed By: malfet, seemethere

Differential Revision: D30549443

fbshipit-source-id: b3895909f5cd654ba163b77950872b332fbad3fe
2021-08-25 14:37:01 -07:00
5b548f6f64 Shape Propagation Pass: Fix AdaptiveAveragePooling2d (#63629)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63629

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30461727

Pulled By: priyaramani

fbshipit-source-id: 3873d1d636f79185680b82de06174d8de288c941
2021-08-25 13:13:41 -07:00
ab5cf5a1eb Move existing target determinator to tools (#63809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63809

This moves out the modulefinder determinator to `tools/testing` since it is supposed to be CI-only. This also simplifies run_test.py a little bit.

Test Plan: Imported from OSS

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D30497438

Pulled By: driazati

fbshipit-source-id: 1d203037af5af6a20c1e7812da935e7cbb5cd82f
2021-08-25 13:03:53 -07:00
7edeead796 Add a comment on the potential implicit type up-casting (#63905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63905

as title
ghstack-source-id: 136590703

Test Plan: N/A

Reviewed By: mrshenli

Differential Revision: D30527929

fbshipit-source-id: 69402bbfa87cfd8fc166ce313cde9736ee072589
2021-08-25 12:47:45 -07:00
b0782f0f32 add BFloat16 support for bernoulli and Dropout on CPU (#56372)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56372

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D28836792

Pulled By: VitalyFedyunin

fbshipit-source-id: ede951d172a59276e11383fd767778ab959b5a6b
2021-08-25 12:01:27 -07:00
7299565768 Update torch.distributed.run OMP_NUM_THREADS message to log.warning (#63953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63953

Closes #61138

Test:
`python -m torch.distributed.run --nproc_per_node 2 test.py`
Still outputs message

`LOGLEVEL=ERROR python -m torch.distributed.run --nproc_per_node 2 test.py`
Does not output message anymore

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30542997

Pulled By: H-Huang

fbshipit-source-id: e7da30dcda51516abf4e56f1f510132e44397027
2021-08-25 11:55:06 -07:00
3d4aabfc48 Fix ciflow/all label generation (#63954)
Summary:
The `ciflow/all` label is automatically added, but it needs to be added before we call `gen_root_job_condition`.

- fix the order of adding `ciflow/all`
- refactor all the string into global constants

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63954

Reviewed By: malfet

Differential Revision: D30545596

Pulled By: zhouzhuojie

fbshipit-source-id: 83ab668f0234488afb855a72e3ebd4503f7f1a78
2021-08-25 11:32:32 -07:00
67d8e7b659 Reformat run_test.py (#63808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63808

`black run_test.py`

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D30497437

Pulled By: driazati

fbshipit-source-id: 41b29b73f41fa4bb15fce5eaa69f8efe614e02f7
2021-08-25 11:27:18 -07:00
64d605bab8 [Static Runtime] Added caching for the NNC code generated for Logit. (#63840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63840

Added NNC generated code for Logit to the cache.

```
Logit NNC Benchmark      Time (ns)
                         w/o cache    w/ cache
logit_nnc_sleef/64       543          536
logit_nnc_sleef/512      3517         3465
logit_nnc_sleef/8192     88483        85881
logit_nnc_sleef/32768    337016       323090
logit_nnc_fast/64        167          163
logit_nnc_fast/512       866          817
logit_nnc_fast/8192      13069        12801
logit_nnc_fast/32768     53429        52530
logit_nnc_vml/64         164          151
logit_nnc_vml/512        783          769
logit_nnc_vml/8192       11563        11674
logit_nnc_vml/32768      46720        46452
```

Test Plan: Unit tests and inline_cvr model.

Reviewed By: hlu1

Differential Revision: D30405424

fbshipit-source-id: 938b1b74758e2612ae151bac890c5f8ebbc42d50
2021-08-25 11:19:58 -07:00
dde07cad6f [Static Runtime] Added a variable for clamp in the NNC code for Logit. (#63839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63839

Replaced the use of a constant for clamp in the NNC code for Logit
with a variable. This makes it easier to enable caching for Logit.

There is no performance difference with this change, as shown in the micro-benchmarks below.

```
Logit NNC Benchmark      Time (ns)
                         const-clamp    var-clamp
logit_nnc_sleef/64       550            543
logit_nnc_sleef/512      3514           3517
logit_nnc_sleef/8192     85537          82900
logit_nnc_sleef/32768    347635         337016
logit_nnc_fast/64        173            167
logit_nnc_fast/512       829            866
logit_nnc_fast/8192      13286          13069
logit_nnc_fast/32768     51116          53429
logit_nnc_vml/64         146            164
logit_nnc_vml/512        773            783
logit_nnc_vml/8192       11556          11563
logit_nnc_vml/32768      44815          46720
```

Test Plan: SR unit tests and the inline_cvr model.

Reviewed By: bertmaher

Differential Revision: D30405466

fbshipit-source-id: adb891fdae5746439931ce5f43165291fec08f52
2021-08-25 11:19:55 -07:00
a2399a76e1 [Static Runtime] Moved NNC operator definitions to separate files. (#63838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63838

Refactored NNC operator definitions code into separate files.

Made `TEWrapper` a class with a fixed set of methods and added separate definitions for them based on `TORCH_ENABLE_LLVM` to keep the same functionality as before.

Test Plan: Build and ran Static Runtime tests.

Reviewed By: hlu1

Differential Revision: D30405467

fbshipit-source-id: 606ef852bb820d5e23a0f8af1bf5dc122e90bceb
2021-08-25 11:18:32 -07:00
8a22d4fa5c [Reland] Replacing the p.data access in utils with tensor.set_. Passes both test_post_localSGD_optimizer_parity and test_periodic_model_averager tests (#63895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63895

When updating the model parameter, updating `parameter.data` is no longer recommended, because this `data` field will be deprecated in the future.

The replacement is `tensor.set_`.
ghstack-source-id: 136593433
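
A minimal sketch of the replacement on plain tensors (the PR applies this inside the model-averaging utils):

```
import torch

t = torch.zeros(3)
new_values = torch.ones(3)

# Discouraged: t.data = new_values  -- the .data field is slated for deprecation.
t.set_(new_values)   # t now shares new_values' storage
print(t)             # tensor([1., 1., 1.])
```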

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: SciPioneer

Differential Revision: D30526178

fbshipit-source-id: a1ac0ec3665d8623edd5bf94f01c1132daff5c00
2021-08-25 11:12:55 -07:00
ab954cb0d1 clean up engine.cpp thread state (#63115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63115

This actually changes:
- callbacks now run with proper grad mode even in worker threads
- graphtask's Future callbacks now run with proper TLS when erroring
  out from a worker thread

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388100

Pulled By: albanD

fbshipit-source-id: 7ae9c461c2f0040548dd9e1e314f25e8da0c2e67
2021-08-25 11:08:43 -07:00
c06dfd7c26 [fx2trt] Check input device in TRTModule (#63893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63893

Add a check to ensure all the inputs are on cuda device.

Test Plan: CI

Reviewed By: kflu, houseroad

Differential Revision: D30525265

fbshipit-source-id: 6e50b70fd535defc1f802d51e8bb991b2dd73741
2021-08-25 10:25:34 -07:00
6324d98e9e bf16 Error message cleanup as well as addition of is_bf16_supported (#63798)
Summary:
ngimel
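
A sketch of the intended use of the new helper:

```
import torch

if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    x = torch.zeros(8, device="cuda", dtype=torch.bfloat16)
else:
    x = torch.zeros(8)  # fall back to float32 where bf16 isn't supported
```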

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63798

Reviewed By: heitorschueroff

Differential Revision: D30526187

Pulled By: ngimel

fbshipit-source-id: c484aec14638097c96c720095d3491249b6b2d14
2021-08-25 09:59:59 -07:00
eebac46282 [pruner] add getter for pruned outputs in base pruner (#63520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63520

Rather than having to call `module.parametrizations.weight[0].pruned_outputs` each time we need to access the set of pruned indices, we add a getter `get_module_pruned_outputs` which takes the module as an argument and returns the set.

This is used for testing.
ghstack-source-id: 136561130

Test Plan:
` buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1N4gK

Reviewed By: z-a-f

Differential Revision: D30374558

fbshipit-source-id: e38dfee0879cadde52b942e899a3d8d7151ee493
2021-08-25 09:57:29 -07:00
83b132b112 [pruner] add support for pruning BatchNorm2d (#63519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63519

If the pruner should be pruning biases along with weights, then if the model has BatchNorm2d following pruned Conv2d layers, then the corresponding channels of the BatchNorm must also be pruned.

Specifically, they need to be zeroed out rather than fully removed, since in eager mode the dimensions between layers need to be preserved.

To do this, we add a pruning parametrization called `ZeroesParametrization` which zeroes out pruned channels, rather than removing them.

The user must provide in the config, a tuple of the Conv2d and BatchNorm layers that go together. The `prepare` method will add the tuple to the `module_groups`; then it will add a PruningParametrization to the Conv2d layer, and a ZeroesParametrization to BatchNorm, and then set their pruned sets to be the same set. That way, during `step`, both masks are updated with the same pruned indices.
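
An illustrative zeroing parametrization in the spirit of `ZeroesParametrization`, built on the public `torch.nn.utils.parametrize` API (class and variable names here are assumptions, not the pruner's actual code):

```
import torch
from torch import nn
from torch.nn.utils import parametrize

class ZeroChannels(nn.Module):
    def __init__(self, pruned):
        super().__init__()
        self.pruned = sorted(pruned)

    def forward(self, w):
        mask = torch.ones_like(w)
        mask[self.pruned] = 0  # zero out pruned channels, keep dimensions
        return w * mask

bn = nn.BatchNorm2d(4)
parametrize.register_parametrization(bn, "weight", ZeroChannels({1, 3}))
print(bn.weight)  # channels 1 and 3 read as zero; the shape is unchanged
```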

ghstack-source-id: 136562278

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1N1P6

Reviewed By: z-a-f

Differential Revision: D30349855

fbshipit-source-id: 3199d3688d5a70963f9b32d7a8fdac3962ae6a65
2021-08-25 09:56:19 -07:00
c1dfd58715 Minor OptionalTensorRef updates (#63611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63611

A few minor updates to `OptionalTensorRef`:
1. use `Tensor`'s `unsafe_borrow_t` constructor, which avoids an unnecessary `nullptr` check.
2. copy constructor cannot defer to the `const Tensor&` constructor because it checks the tensor is
defined, and so would fail for disengaged optionals.
3. use copy-swap idiom to avoid issues with self-assignment. `x = x` should be a no-op, but the old
version would clear `x`.
4. Add pointer-like access for consistency with `optional` and `MaybeOwned`

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30484704

Pulled By: ezyang

fbshipit-source-id: 738f4bd22359eaecd0a519a04e89a4b44d92da5b
2021-08-25 09:37:02 -07:00
5ab356ffe6 Update CMake minimum version to 3.10 (#63660)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63660

Test Plan: Imported from OSS

Reviewed By: janeyx99, mruberry

Differential Revision: D30543878

fbshipit-source-id: a7d938807653f39727f2cc7d7ca167200567b6a0
2021-08-25 09:25:43 -07:00
34ed16ffef Temporary fix for remote gpu execution issue (#63899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63899

See: T99020845

Test Plan: sandcastle

Reviewed By: heitorschueroff

Differential Revision: D30527384

fbshipit-source-id: ce9933e5e181322c02d4ed17f3fdaabe4c5ba29e
2021-08-25 09:14:03 -07:00
01c35115d8 Fix bug in check_empty_containers (#63492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63492

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30402749

Pulled By: ansley

fbshipit-source-id: 7de533355fe91ca4f45b2bafc3bfb205a028c1ed
2021-08-25 09:05:08 -07:00
8c897d254d Swap CUDA 11.1 and 11.3 in CI to make 11.1 periodic (#63900)
Summary:
Preparing for supporting 11.3 in the next release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63900

Reviewed By: malfet

Differential Revision: D30541437

Pulled By: janeyx99

fbshipit-source-id: a7297da7f7818a4291b1c321d62d76fc2c0f1f90
2021-08-25 09:01:26 -07:00
3926fdbaa4 [skip ci] Add generated comment to ruleset json (#63896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63896

Reviewed By: heitorschueroff

Differential Revision: D30529820

Pulled By: zhouzhuojie

fbshipit-source-id: 7529803af23ea36a7bcb673cd399da80da8e3feb
2021-08-25 08:53:33 -07:00
87a661c79f Revert D30526034: [pytorch][PR] compute reduction intermediate buffer size in elements
Test Plan: revert-hammer

Differential Revision:
D30526034 (e69a1398cb)

Original commit changeset: 0aca7f887974

fbshipit-source-id: a22472723818d6fe0c11a6e134080df1ac408038
2021-08-25 07:17:22 -07:00
839eaa2e91 Revert D30384746: [fx2trt] Add a test for quantized resnet18
Test Plan: revert-hammer

Differential Revision:
D30384746 (10dfa58eba)

Original commit changeset: 1a8638777116

fbshipit-source-id: b93235323e229b391f5456f6e3543988062dd0d4
2021-08-25 00:43:06 -07:00
10dfa58eba [fx2trt] Add a test for quantized resnet18 (#63446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63446

Add a test for quantized resnet18 running in TensorRT

Test Plan: buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test

Reviewed By: 842974287

Differential Revision: D30384746

fbshipit-source-id: 1a863877711618cd23d887694269ed9e44ee606c
2021-08-24 21:34:23 -07:00
0301c3bc01 [quant][graphmode][fx] Make maxpool and flatten produce the reference pattern (#63501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63501

Currently some of the ops are considered to work with both float and quantized input,
so we may have patterns like "quant - some_op - dequant". This might not work well with the backend,
and we may consider changing everything to produce "quant - dequant - some_op - quant - dequant" instead
in the future. This PR fixes it for maxpool and flatten only, to unblock resnet benchmarking on TensorRT.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: mruberry

Differential Revision: D30402788

fbshipit-source-id: 892c5ff6552775070e2c1453f65846590fb12735
2021-08-24 21:31:01 -07:00
d388a1a5df [TensorExpr] LLVMCodegen: Use addFnAttr instead of addAttribute which was deleted. (#63886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63886

cc gmagogsfm

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30523135

Pulled By: ZolotukhinM

fbshipit-source-id: 62e125f917b2a0153eb30879d93cf956587a05e0
2021-08-24 21:23:06 -07:00
c8527bc398 [qunat][graphmode][fx] Add a separate lower_to_native_backend function for relu (#62861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62861

This PR adds a lower_to_native_backend function to lower a quantized reference model
to a model that uses fbgemm/qnnpack ops. We'll gradually add support and remove
the fbgemm/qnnpack specific handling in quantization_patterns.py

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30165828

fbshipit-source-id: de1149cd7e7c1840c17c251cd4d35004afd015b7
2021-08-24 21:07:03 -07:00
e69a1398cb compute reduction intermediate buffer size in elements (#63885)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63869
`iter` strides are in bytes, and we were additionally multiplying the size computed using those strides by `sizeof(arg_t)`. Computing `output_memory_size` in elements is enough.
This doesn't fix the real underlying problem of allocating a large intermediate tensor, but it makes this tensor smaller, typically by a factor of 4.
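
The bookkeeping error as back-of-the-envelope arithmetic (a sketch; a `sizeof(arg_t)` of 4 corresponds to a float accumulator and the factor of 4 above):

```
num_outputs = 1024
sizeof_arg_t = 4                                     # e.g. float accumulator

size_from_byte_strides = num_outputs * sizeof_arg_t  # strides already count bytes
buggy_bytes = size_from_byte_strides * sizeof_arg_t  # scaled by sizeof twice
fixed_bytes = num_outputs * sizeof_arg_t             # count elements, scale once

assert buggy_bytes == 4 * fixed_bytes
```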

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63885

Reviewed By: mruberry

Differential Revision: D30526034

Pulled By: ngimel

fbshipit-source-id: 0aca7f887974b7776e380463bbd82d32a5786ee8
2021-08-24 19:39:21 -07:00
ba126df614 TST Adds more modules into common module tests (#62999)
Summary:
This PR moves some modules into `common_modules` to see what it looks like.

While migrating some no-batch modules into `common_modules`, I noticed that `desc` is not used for the name. This means we cannot use `-k` to filter tests. This PR moves the sample generation into `_parametrize_test` and passes the already-generated `module_input` into users of `modules(modules_db)`.

I can see this is a little different from opsinfo and would be happy to revert to the original implementation of `modules`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62999

Reviewed By: heitorschueroff

Differential Revision: D30522737

Pulled By: jbschlosser

fbshipit-source-id: 7ed1aeb3753fc97a4ad6f1a3c789727c78e1bc73
2021-08-24 19:16:32 -07:00
544af391b5 Allow arbitrary objects in state_dicts (#62976)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62094

Introduces functionality for adding arbitrary objects to module state_dicts. To take advantage of this, the following functions can be defined on a module:
* `get_extra_state(self) -> dict` - Returns a dict defining any extra state this module wants to save
* `set_extra_state(self, state)` - Subsumes the given state within the module

Under the hood, a sub-dictionary is stored in the state_dict under the key `_extra_state` for each module that requires extra state, as sketched below.
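
A minimal sketch of the new hooks in use:

```
import torch
from torch import nn

class Counter(nn.Module):
    def __init__(self):
        super().__init__()
        self.calls = 0

    def forward(self, x):
        self.calls += 1
        return x

    def get_extra_state(self):
        return {"calls": self.calls}

    def set_extra_state(self, state):
        self.calls = state["calls"]

m = Counter()
m(torch.zeros(1))
sd = m.state_dict()        # includes an '_extra_state' entry for Counter
fresh = Counter()
fresh.load_state_dict(sd)  # restores fresh.calls == 1
```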

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62976

Reviewed By: heitorschueroff

Differential Revision: D30518657

Pulled By: jbschlosser

fbshipit-source-id: 5fb35ab8e3d36f35e3e96dcd4498f8c917d1f386
2021-08-24 19:06:14 -07:00
58ef99bd5a TST Adds pickle testing for ModuleInfo (#63736)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935

This PR adds `test_pickle` to `test_modules`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63736

Reviewed By: heitorschueroff

Differential Revision: D30522462

Pulled By: jbschlosser

fbshipit-source-id: a03b66ea0d81c6d0845c4fddf0ddc3714bbf0ab1
2021-08-24 19:04:46 -07:00
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
1787b905c4 Don't switch executors mid test (#63830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63830

It's really not safe to change the executor out from under models that may have
already been partially compiled.
ghstack-source-id: 136526228

Test Plan:
```
DEBUG=1 CFLAGS="-fsanitize=address" CXXFLAGS="-fsanitize=address" USE_LLVM=$(realpath ../llvm-project/install) CMAKE_PREFIX_PATH=$CONDA_PREFIX python setup.py install
LD_PRELOAD=/lib64/libasan.so.5 numactl -C3 pytest -v --cov --cov-report xml:test/coverage.xml --cov-append onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset11 -s
```

Reviewed By: desertfire

Differential Revision: D30504489

fbshipit-source-id: 188581cb53f0cf5bd3442d1e9d46e8c0c7e124f8
2021-08-24 18:56:53 -07:00
543130511a [nnc] Disable erf and erfc (#63775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63775

These introduce small accuracy differences that cause some internal
tests to fail, and it's not worth fixing the tests right now because they're
slower than the ATen ops anyways.
ghstack-source-id: 136526229

Test Plan:
```
buck test mode/dev //aml/eccv/mcm/training:tests -- --exact 'aml/eccv/mcm/training:tests - test_build_torch_script_model (aml.eccv.mcm.training.tests.publish_helper_tests.TransformerPredictorPublishHelperTests)'
```

Reviewed By: navahgar

Differential Revision: D30484557

fbshipit-source-id: 095a9c810539a499105b76e1d96843dbc61b0079
2021-08-24 18:55:45 -07:00
d454c9e76e Migrate THCTensor_copyIgnoringOverlaps to ATen (#63505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63505

This isn't a public operator, just a helper function used in CUDA_tensor_apply.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441305

Pulled By: ngimel

fbshipit-source-id: 84fabc701cbd8479e02d80f373a3dd62d70df2ce
2021-08-24 18:50:28 -07:00
5b28e3c183 [quant][graphmode][fx] Add reference option support for binary ops (#62698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62698

We also removed the special handling in match_utils for binary ops

Test Plan:
python test/test_quantize.py TestQuantizeFx
python test/test_quantize.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30093781

fbshipit-source-id: 58cc972de8211a80dd4d111e25dc4ad36057933f
2021-08-24 18:22:11 -07:00
6fa646ad54 [StaticRuntime] Fix bug in HasInplaceOp (#63842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63842

Reviewed By: mikeiovine

Differential Revision: D30506914

fbshipit-source-id: b2e358cfb991dacdb295b61bbc37beb36b73b852
2021-08-24 17:07:45 -07:00
956c8fa01e Microbenchmarking matrix mult (einsum, torch.mult, torch.mm) (#63654)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63654

Test Plan:
```
> buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:matrix_mult_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B4_M5_N3_K2_cpu
# Input: B: 4, M: 5, N: 3, K: 2, device: cpu
Forward Execution Time (us) : 27.970

# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B32_M25_N20_K30_cpu
# Input: B: 32, M: 25, N: 20, K: 30, device: cpu
Forward Execution Time (us) : 41.830

# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B128_M100_N120_K110_cpu
# Input: B: 128, M: 100, N: 120, K: 110, device: cpu
Forward Execution Time (us) : 499.114

# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B4_M5_N3_K2_cpu
# Input: B: 4, M: 5, N: 3, K: 2, device: cpu
Forward Execution Time (us) : 6.268

# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B32_M25_N20_K30_cpu
# Input: B: 32, M: 25, N: 20, K: 30, device: cpu
Forward Execution Time (us) : 12.676

# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B128_M100_N120_K110_cpu
# Input: B: 128, M: 100, N: 120, K: 110, device: cpu
Forward Execution Time (us) : 438.219

# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B4_M5_N3_cpu
# Input: B: 4, M: 5, N: 3, device: cpu
Forward Execution Time (us) : 7.657

# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B32_M25_N20_cpu
# Input: B: 32, M: 25, N: 20, device: cpu
Forward Execution Time (us) : 18.523

# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B100_M90_N110_cpu
# Input: B: 100, M: 90, N: 110, device: cpu
Forward Execution Time (us) : 55.103

# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B4_M5_N3_cpu
# Input: B: 4, M: 5, N: 3, device: cpu
Forward Execution Time (us) : 2.501

# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B32_M25_N20_cpu
# Input: B: 32, M: 25, N: 20, device: cpu
Forward Execution Time (us) : 10.589

# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B100_M90_N110_cpu
# Input: B: 100, M: 90, N: 110, device: cpu
Forward Execution Time (us) : 50.102
```

Reviewed By: ajyu

Differential Revision: D30455179

fbshipit-source-id: 9f2d92b2d2b860f41a8e59be2cc086d75b587f7b
2021-08-24 16:26:26 -07:00
6d58c83007 Turn off layer norm in jit symbolic differentiation (#63816)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63816

Test Plan:
Confirmed this can rescue the NE:

https://www.internalfb.com/mast/job/torchx_xdwang-SparseNNApplication_72cf593d

Reviewed By: ngimel

Differential Revision: D30498746

fbshipit-source-id: 4a387f32ee2f70685de6104459c7f21bfbddc187
2021-08-24 15:47:13 -07:00
41ffec07ce Add a common autograd TLS state (#63860)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63860

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30513253

Pulled By: albanD

fbshipit-source-id: 97d76ed54dfbdf4ba3fc7051ce3b9bb636cefb4b
2021-08-24 15:34:06 -07:00
865d127a66 .github: Enable with-ssh for Windows (#63440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63440

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30521460

Pulled By: seemethere

fbshipit-source-id: e987e170e73fb4f9d9f024bed0e58404ed206848
2021-08-24 14:14:27 -07:00
4e37a015c7 [FX] Fix _replicate_for_data_parallel (#63821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63821

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D30502115

Pulled By: jamesr66a

fbshipit-source-id: 0f004f95def6e1ba21ccbeab40cb0a739a0ad20c
2021-08-24 13:48:15 -07:00
5be17ec1fc Do not modify saved variables in-place for spectral norm during power iteration (#62293)
Summary:
Interestingly enough, the original code did have a mechanism that aims to prevent this very issue, but it performs a clone AFTER modifying u and v in-place.
This wouldn't work though because we can later use the cloned u and v in operations that save for backward, and the next time we execute forward, we modify the same cloned u and v in-place.
So if the idea is that we want to avoid modifying a saved variable in-place, we should clone it BEFORE the in-place operation.
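
A minimal repro of the hazard being fixed, on generic tensors rather than the spectral-norm code itself:

```
import torch

w = torch.randn(3, requires_grad=True)
v = torch.randn(3)

out = w * v       # mul saves `v` for the backward pass
v.div_(v.norm())  # in-place update mutates the saved tensor

try:
    out.sum().backward()
except RuntimeError as e:
    # "...modified by an inplace operation"; cloning `v` BEFORE the
    # in-place update avoids corrupting the saved value.
    print(e)
```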

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62293

Reviewed By: bdhirsh

Differential Revision: D30489750

Pulled By: soulitzer

fbshipit-source-id: cbe8dea885aef97adda8481f7a822e5bd91f7889
2021-08-24 13:08:59 -07:00
4a0776100e Migrate legacy lstsq from THC to ATen (CUDA) (#63504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63504

Closes gh-24592

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441304

Pulled By: ngimel

fbshipit-source-id: ec176596f54bc084af48a73d1dbb0dcb82fec593
2021-08-24 12:47:16 -07:00
699c764d2e Revert D30513613: Removing tensor.data usage in utils with tensor set_ method
Test Plan: revert-hammer

Differential Revision:
D30513613 (d08a36f831)

Original commit changeset: 402efb9c30fa

fbshipit-source-id: 911c66a9852de77dc5274b5fb373258c0c97739a
2021-08-24 12:20:37 -07:00
835dac0869 Merge common fields from TensorInitParams and ShardedTensorMetadata into TensorProperties (#63731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63731
1) Follow-up to the [last comment on PR 63378](https://github.com/pytorch/pytorch/pull/63378#discussion_r693143053)
2) Also updated the caller side (usage of ShardedTensorMetadata) in fbcode

Ref: [landing workflow 3](https://www.internalfb.com/intern/wiki/PyTorch/PyTorchDev/Workflow/Landing/#landing-your-prs-from-gi-1)

Test Plan:
Imported from OSS

OSS: (pytorch).. $ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v
FB:  fbcode $ buck test mode/dev //aiplatform/modelstore/checkpointing/pyper/tests:checkpoint_utils_test

Reviewed By: wanchaol, heitorschueroff

Differential Revision: D30472281

fbshipit-source-id: 727fb0e7f10eab4eb7a10476194e9008f2ac1fb5
2021-08-24 11:49:06 -07:00
d08a36f831 Removing tensor.data usage in utils with tensor set_ method (#63867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63867

When updating the model parameter, updating `parameter.data` is no longer recommended, because this `data` field will be deprecated in the future.

The replacement is `tensor.set_`.

ghstack-source-id: 136531233

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager

Reviewed By: SciPioneer

Differential Revision: D30513613

fbshipit-source-id: 402efb9c30fafc3f285bebc631639f656ceae585
2021-08-24 11:20:44 -07:00
73431449b3 update readme and contributing.md (#63843)
Summary:
1. In fact, Visual Studio isn't supported as a CMake generator
2. I was asked many times why there's an error like 'Could NOT find OpenMP'
3. Add the newly added Best Practices link in contributing.md

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63843

Reviewed By: seemethere, heitorschueroff

Differential Revision: D30514095

Pulled By: janeyx99

fbshipit-source-id: 76715a1d8c049122546e5a7778cafe54e4dfd5d6
2021-08-24 10:52:11 -07:00
e6dc7bc61b Subprocess encoding fixes for cpp extension (#63756)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63584

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63756

Reviewed By: bdhirsh

Differential Revision: D30485046

Pulled By: ezyang

fbshipit-source-id: 4f0ac383da4e8843e2a602dceae85f389d7434ee
2021-08-24 10:46:11 -07:00
14d4723abd add bf16 support for bucketize (#55588)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55588

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28836796

Pulled By: VitalyFedyunin

fbshipit-source-id: c9ae5b969c30a45473533be5f29bb497f8da5143
2021-08-24 10:31:42 -07:00
1256dcd509 [pruner] modify base pruner to prune bias by default (#63202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63202

By default, the pruner will also prune biases, such that the whole output channel is removed. The user can manually set `also_prune_bias` to False when calling `prepare` if they don't want the bias to be pruned.
ghstack-source-id: 136466671

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MV32

modify `fusion_tests` according to API change
`buck test mode/opt //scripts/kazhou:fusion_tests`

https://pxl.cl/1NbKz

Reviewed By: z-a-f

Differential Revision: D30294494

fbshipit-source-id: c84655648bee0035559195ca855b98fb7edaa134
2021-08-24 10:25:45 -07:00
16ba20507a [pruner] amend base pruner API to match base sparsifier (#63178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63178

Update base pruner API to match base sparsifier API as defined in D28970960 / PR58955

Changes include:
- `enable_mask_update = True` in `__init__`
- `prepare` takes model and config instead of constructor
- convert functionality renamed to `squash_mask`, `convert` method call now raises Error
- `activation_handles` and `bias_handles` initialized in `_prepare` instead of the constructor
ghstack-source-id: 136467595

Test Plan:
Function names updates according to changes

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MTgH

TODO will need to modify `fbcode/scripts/kazhou/fusion_tests.py` to use new API

Reviewed By: z-a-f

Differential Revision: D30287179

fbshipit-source-id: d4727bea1873b500f2d4bb784db26d532bf26cce
2021-08-24 10:25:43 -07:00
5dee15401c [pruner] refactor ActivationReconstruction forward hooks (#63158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63158

Combined functionality for `ActivationReconstruction` for both Linear and Conv2d in one class. The only difference between the old classes was the size and indexing of the reconstructed tensor -- that logic can be generalized by iterating over the size of `output`.
ghstack-source-id: 136467465

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MSSv

Reviewed By: raghuramank100

Differential Revision: D30282765

fbshipit-source-id: 08a1e4e0650511019fff85cf52b41dd818b0c7f8
2021-08-24 10:24:29 -07:00
7774a4e95b [Static Runtime] Implement prim::VarStack out variant (#63579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63579

Provide a static runtime out variant implementation for the new op introduced in D30426232 (1385f9fb12).

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_VarStack`

Reviewed By: navahgar

Differential Revision: D30410525

fbshipit-source-id: bc59a3d8ad23e3d94561ec2dca9cc20687dbadf8
2021-08-24 09:44:29 -07:00
227cb268bc [Reland] Embedding thrust->cub migration (#63806)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63806

Reviewed By: bdhirsh

Differential Revision: D30498255

Pulled By: ngimel

fbshipit-source-id: 78b7085a92a168cf0163f53dcb712bac922f5235
2021-08-24 09:30:32 -07:00
94d621584a optimize BFloat16 elemwise operators CPU: sigmoid, sigmoid_backward, tanh_backward, addcmul, addcdiv (#55221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55221

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28836797

Pulled By: VitalyFedyunin

fbshipit-source-id: 6b79098c902ffe65d228668118ef36fb49bab800
2021-08-24 08:56:17 -07:00
33a163d886 Enable BFloat16 LeakyReLU and RReLU in CPU path (#61514)
Summary:
Enable and optimize BFloat16 LeakyReLU and RReLU in CPU path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61514

Reviewed By: ejguan

Differential Revision: D30257612

Pulled By: VitalyFedyunin

fbshipit-source-id: 8cc0d1faacd02dcc9827af724a86d95b6952748f
2021-08-24 08:34:56 -07:00
2ca2761f3c ENH Adds no_batch_dim for NLLLoss (#62651)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62651

Reviewed By: VitalyFedyunin

Differential Revision: D30303340

Pulled By: jbschlosser

fbshipit-source-id: 7ab478cf63bf6cd1f850cad5fd101e74a2cfe3f5
2021-08-24 08:27:27 -07:00
d3be02d100 fix batchnorm2d issue when input is non contiguous (#63392)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63392

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30476317

Pulled By: VitalyFedyunin

fbshipit-source-id: 03055a0aec21cf2c029b6f32315da2b09cb722d0
2021-08-24 08:24:01 -07:00
1385f9fb12 [JIT] Add variadic stack op (#63578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578

Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.

Most of the implementation/tests are the same as `prim::VarConcat`.

Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`

Reviewed By: navahgar

Differential Revision: D30426232

fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce
2021-08-24 08:20:54 -07:00
f4aff3a346 [BE] add distributed run_test options (#63147)
Summary:
Currently distributed tests are mixed within test_python.
We would like to split the distributed tests into their own batch, thus we need to split them out.

Adding an option to include/exclude distributed tests with CUSTOM_HANDLERS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63147

Test Plan:
- locally run with the addition run_test.py options.
- CI

Dependency: found a bug in mpiexec test and need https://github.com/pytorch/pytorch/issues/63580 to fix it first.

Reviewed By: bdhirsh

Differential Revision: D30496178

Pulled By: walterddr

fbshipit-source-id: 7903a57b619f2425028028f944211938823918a6
2021-08-24 08:03:01 -07:00
688f06cac3 Revert D30388099: Add a common autograd TLS state
Test Plan: revert-hammer

Differential Revision:
D30388099 (83d9bad44a)

Original commit changeset: 8e03f940150f

fbshipit-source-id: f6d60fec66e8292f5268335bb8a3e7e1a662f23b
2021-08-24 07:22:39 -07:00
9914fb6615 ENH Adds no_batch_dim tests/docs for LPPool1d and Identity (#62190)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62190

Reviewed By: ejguan

Differential Revision: D29942385

Pulled By: jbschlosser

fbshipit-source-id: 00df6f6f01ad039631bb8679f8de94863aac7650
2021-08-24 06:59:41 -07:00
83d9bad44a Add a common autograd TLS state (#63114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63114

This PR collapses the GradMode and InferenceMode thread local booleans into a single thread local uint8.
This helps reduce the number of thread-local variable accesses done when we propagate ThreadLocalStates.

Note that this is even more beneficial as we will add a forward mode AD TLS (similar to GradMode) higher in this stack and this new structure should reduce the perf impact of adding this new TLS.

Here is the full benchmark result between master and the top of this stack: https://gist.github.com/albanD/e421101e9ed344e94999bef3a54bf0f3
tl;dr: give a benefit in most cases. It is never detrimental.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30388099

Pulled By: albanD

fbshipit-source-id: 8e03f940150ff063c2edd792733663413ae2f486
2021-08-24 06:54:02 -07:00
c545b099aa Separating quantization test from distributed_test (#63058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63058

Dedicating separate tests to different quantization methods. Currently only the FP16 method is supported.
ghstack-source-id: 136499767

Test Plan: buck test mode/dev //caffe2/test/distributed/algorithms/quantization:quantization_gloo_fork -- name_of_the_test

Reviewed By: wanchaol

Differential Revision: D30142580

fbshipit-source-id: 3aacec1a231a662067d2b48c001f0c69fefcdd60
2021-08-24 01:44:55 -07:00
f0d274294d [TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that there are no classes using KernelArena for memory management, we
can remove it.

Differential Revision: D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
2021-08-24 00:32:16 -07:00
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in transition from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.

Differential Revision: D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
4e15a6f495 [TensorExpr] Switch Exprs and Stmt from kernel-arena to shared_ptr. (#63216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63216

Currently there are three classes managed by KernelArena: Expr, Stmt,
and Tensor (and derived classes). KernelArena has been a long-standing
pain point for NNC devs, and we're moving away from that memory-management
model to a ref-count-based memory model (using shared_ptr). This commit
switches Expr and Stmt to shared_ptr and is the biggest change in this
transition. Later commits will detach Tensor from KernelArena and kill
the arena + scope altogether.

Differential Revision: D30353195

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 9575225ada3d0fb65087ae40435f3dfea4792cae
2021-08-24 00:32:11 -07:00
dd96c26066 [TensorExpr] More NFC changes like Expr* -> ExprPtr. (#63778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778

This is a preparation for a switch from raw pointers to shared pointers
as a memory model for TE expressions and statements.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30487425

Pulled By: ZolotukhinM

fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c
2021-08-24 00:30:49 -07:00
5b7cdc5a3d add channels last for GroupNorm (#49821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49821

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26007053

Pulled By: VitalyFedyunin

fbshipit-source-id: 34a48d5d3b66a159febf3c3d96748fbaba1b9e31
2021-08-23 22:54:59 -07:00
f5d585391d Add ROCm as a platform for which tests can be disabled (#63813)
Summary:
Realized we were missing ROCm as a platform on which one could disable a flaky test. (like how this issue specifies windows https://github.com/pytorch/pytorch/issues/61655)

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63813

Reviewed By: seemethere

Differential Revision: D30498478

Pulled By: janeyx99

fbshipit-source-id: f1abe8677e1ddd01de3291e1618272ad8e287dc4
2021-08-23 18:50:04 -07:00
d96ef8c1b1 [Static Runtime] SR clones graph input (#63704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63704

Previously SR did not clone the graph. This led to subtle bugs in `testStaticRuntime`; static runtime would modify its graph, and the graph used by the JIT interpreter would change as well. The JIT interpreter would then crash if SR-only ops were added!

Cloning the graph is more consistent with the behavior of the `Module` ctor.

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D30463294

fbshipit-source-id: b771551a1f55f95fde79373b23babcf3e5ddf726
2021-08-23 18:45:41 -07:00
195c60d844 [fx2trt] Add acc op and converter for torch.pow (#63795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63795

att

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_binary_ops

Reviewed By: jackm321, wushirong

Differential Revision: D30492488

fbshipit-source-id: 6d615770567b13720316f06fd2f866ea2fdc2995
2021-08-23 18:18:31 -07:00
e1bdebf685 Adding DataLoader2 class as future replacement of DataLoader (#63742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63742

Supports sharding and batching on the loader level.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30494506

Pulled By: VitalyFedyunin

fbshipit-source-id: 6648e09d955055ac38e3a4e3973f701acefca762
2021-08-23 18:09:07 -07:00
fc07489ec5 [BE] Enable PostLocalSGD tests on windows (#63463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63463

Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, local sgd optimizer can be used on windows.
ghstack-source-id: 136437632

Test Plan: Ci

Reviewed By: SciPioneer

Differential Revision: D30358922

fbshipit-source-id: 9b56aebf1075f026637296d338805ad8851c9d40
2021-08-23 17:49:03 -07:00
16a4434422 [BE] Enable functional optim tests for windows (#63462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63462

Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, these tests can be run on windows.
ghstack-source-id: 136437635

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358923

fbshipit-source-id: 36739bdfe7214789f17de652d30c62c2bc124c73
2021-08-23 17:49:01 -07:00
630ec2e190 [fx_acc] Add mapper for torch.log1p (#63792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63792

Map `torch.log1p` to `acc_ops.add` + `acc_ops.log`.
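
In plain PyTorch terms, the mapping decomposes log1p into the two ops already supported (note that log1p exists because log(x + 1) is less accurate for tiny x, so the decomposition trades a little precision):

```python
import torch

def log1p_decomposed(x: torch.Tensor) -> torch.Tensor:
    # what the acc_ops mapping lowers to: an add followed by a log
    return torch.log(torch.add(x, 1.0))

x = torch.rand(8)
assert torch.allclose(torch.log1p(x), log1p_decomposed(x))
```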

Test Plan: buck test mode/opt glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_log1p

Reviewed By: wushirong

Differential Revision: D30491706

fbshipit-source-id: bcbeddf06131113185d2019cfd7cf5e9193a8a78
2021-08-23 17:48:59 -07:00
e4f44bec27 Fix pocketfft include path in mobile build (#63714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63714

PocketFFT was disabled for CMake < 3.9, but CMake 3.11 is the first version to support `INCLUDE_DIRECTORIES` as a target property. So updating to CMake 3.10 causes the mobile builds to fail. Instead of limiting the CMake support, this just adds the include directory to the entire target.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30498369

Pulled By: malfet

fbshipit-source-id: 83372e29c477c97e7015763b7c29d6d7e456bcef
2021-08-23 17:48:57 -07:00
fc47497905 Simplify ccache instructions in CONTRIBUTING.md (#62549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62549

When building CUDA files with native CMake support, it will respect the
`CMAKE_CUDA_COMPILER_LAUNCHER` setting. So, there's no need for symlinks.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30498488

Pulled By: malfet

fbshipit-source-id: 71c2ae9d4570cfac2a64d777bc95cda3764332a0
2021-08-23 17:47:38 -07:00
d9231dc3df Skip archiving useless build artifacts (#63785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63785

We currently zip up everything in `build/`, which includes a lot of cruft (`.o` files, random things copied in from dependencies, etc.). This makes the artifact bigger (slower upload/download times) and takes about 1.5 minutes to archive. This change makes archiving take ~15 seconds instead and removes the 50-second upload-to-GitHub step, which isn't as useful now that we have the HUD PR page that lists out all artifacts.

Test Plan: Imported from OSS

Reviewed By: seemethere, janeyx99

Differential Revision: D30494444

Pulled By: driazati

fbshipit-source-id: 93202dba7387daeb4859a938110b02ff2dc2ccc4
2021-08-23 17:40:01 -07:00
172e5c76ab Fix some memory bugs in onnx passes (#63754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63754

Running onnx tests with ASAN uncovers several memory errors.  These two are caused by: (1) iterating the uses list of a node after mutation, and (2) accessing the `blocks` attribute of a possibly deleted node.
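
Bug class (1) is the classic mutate-while-iterating error; in ordinary Python terms the fix looks like this (pure illustration, not the actual pass code):

```python
# `uses` stands in for a node's use list in the graph
uses = ["use0", "use1", "use2"]

# BUG: removing elements from the list being iterated skips entries
# (and in C++ it invalidates the iterators outright, which is what ASAN caught)
# for u in uses:
#     uses.remove(u)

# FIX: iterate over a snapshot, mutate the original
for u in list(uses):
    uses.remove(u)
assert uses == []
```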

To reproduce (this is on a CentOS 7 box):
```
DEBUG=1 CFLAGS="-fsanitize=address" CXXFLAGS="-fsanitize=address" USE_LLVM=$(realpath ../llvm-project/install) CMAKE_PREFIX_PATH=$CONDA_PREFIX python setup.py install
LD_PRELOAD=$(realpath /lib64/libasan.so.5) numactl -C3 pytest -v --cov --cov-report xml:test/coverage.xml --cov-append onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset11 -s
```

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30493939

Pulled By: bertmaher

fbshipit-source-id: e16e19dc9b4c9896e102ca8bf04c8bedfdde87af
2021-08-23 17:31:45 -07:00
fc6dd0bc00 [JIT] Move UseVariadicCat internals (#63577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63577

Since other variadic ops will have an almost identical implementation, we can generalize the `UseVariadicCat` implementation and put it in a common folder.

Also moved some test utilities that other variadic op tests will likely need.

Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOptTest`

Reviewed By: navahgar

Differential Revision: D30409937

fbshipit-source-id: 925c11c27b58ce98cb8368d2a205e26ba66d3db9
2021-08-23 17:30:36 -07:00
130549d61b Fix typo in NNAPI tests (#63797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63797

The NNAPI memory format test has a typo.

Test Plan:
pytest test/test_nnapi.py::TestNNAPI

Imported from OSS

Reviewed By: Amyh11325

Differential Revision: D30495473

fbshipit-source-id: 8edad7c01a080847a64a2797e077ec4d6077552a
2021-08-23 16:34:24 -07:00
84890aae35 [Static Runtime] Add an out variant op for aten::abs (#63675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63675

This change adds an out variant implementation for `aten::abs`.
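
For context, the "out variant" pattern at the Python level writes into a preallocated output instead of allocating on every call, which is what lets Static Runtime reuse memory across iterations:

```python
import torch

x = torch.randn(3)
out = torch.empty(3)       # allocated once, reused across iterations
torch.abs(x, out=out)      # writes into `out`, no fresh allocation
```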

Test Plan:
- Observed `V0820 14:14:08.880342 101788 impl.cpp:1394] Switch to out variant for node: %3 : Tensor = aten::abs(%a.1)`

- Perf impact: TBD

Reviewed By: hlu1

Differential Revision: D30461317

fbshipit-source-id: 0c0230bd40afe463ae1ccb222c2a1207ebcf4191
2021-08-23 16:25:10 -07:00
55f8f95ad4 fix git diff issue (#63408)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60111, ideally we should merge this before https://github.com/pytorch/pytorch/issues/63360 but we can also test this with https://github.com/pytorch/pytorch/issues/63360 easily.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63408

Test Plan:
- This is confirmed working with a local test.sh run by setting PR_NUMBER
- should be validated by GHA CI as well

Concern:
- currently GHA CI is consistently running into a proxy 403 rate-limit-exceeded issue. However, the worst case is not generating any git diff files, which is exactly the same as the current behavior.
- depends on https://github.com/pytorch/pytorch/issues/63770.

Reviewed By: driazati, janeyx99

Differential Revision: D30489355

Pulled By: walterddr

fbshipit-source-id: a638b7ae5820f29a7aca6cc40ff390ab253cb174
2021-08-23 15:38:18 -07:00
49be16d50a .github: Add ec2 information as a step (#63784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63784

Also creates the common.yml.j2 file as a place to store common code
amongst the templates

Should look like:
![image](https://user-images.githubusercontent.com/1700823/130495226-f18b8c0f-1ea7-4097-8bbb-e998fabb71f2.png)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, driazati

Differential Revision: D30490682

Pulled By: seemethere

fbshipit-source-id: 18028b4acff938ef54cd6e4877561b2d830a11cf
2021-08-23 15:04:04 -07:00
7946f8a9f6 Rename DataPipe to Op-er (#63325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63325

Rename each DataPipe to an operation name ending in -er. The functional API should remain a verb, such as `read_from_tar`, `shuffle`, ... (Discussed [here](https://github.com/facebookexternal/torchdata/pull/97#discussion_r688553905))
- Batch -> Batcher
- Collate -> Collator
- Concat -> Concater
- GroupByKey -> ByKeyGrouper (?)
- ListDirFiles -> FileLister
- LoadFilesFromDisk -> FileLoader
- Map -> Mapper
- ReadFilesFromTar -> TarArchiveReader
- ReadFilesFromZip -> ZipArchiveReader
- ReadLinesFromFile -> LineReader
- Shuffle -> Shuffler
- ToBytes -> StreamReader
- Transforms -> Transformer
- Zip -> Zipper

Let me know if you have a better name for any of these DataPipes.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30466950

Pulled By: ejguan

fbshipit-source-id: 72909dca7b3964ab83b965891f96cc1ecf62d049
2021-08-23 14:36:10 -07:00
a781340bf7 Add equality constraints for some acc opeartions for symbolic inference (#63689)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63689

Test Plan:
buck run mode/opt-clang caffe2/torch/fb/model_transform/experimental:fx_ir_lower_inline_cvr -- \
    --action=lower_and_run \
    --filename=inline_cvr_7x_dec_2020.model \
    --print_glow_glog=True

Reviewed By: jamesr66a

Differential Revision: D30462113

fbshipit-source-id: 0b2a1ce9770561248527d47c07b80112491dc949
2021-08-23 14:11:08 -07:00
0bc7fef406 [Static Runtime] Remove unused fusion patterns (#63636)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63636

Reviewed By: d1jang

Differential Revision: D30446573

fbshipit-source-id: 3abb7f697380f3b4e865b98c594de359b5e26b96
2021-08-23 12:55:09 -07:00
a709ab34a8 [nnc] Re-enable CPU fusion (#63665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63665

This reverts commit 125e2d02e575612eb427104e7c67f1c28f090db8.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30471646

Pulled By: bertmaher

fbshipit-source-id: 4189869566f03b5f9ada78d78830f6a34946eed6
2021-08-23 12:42:42 -07:00
560cd88195 Kill THCUNN (#63429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63429

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441308

Pulled By: ngimel

fbshipit-source-id: 3ae342a2f8d5c7f8827b637c4055c5d1b0a1be26
2021-08-23 12:07:16 -07:00
db1b27fa8d fix mpi ssh runtime error (#63580)
Summary:
Should fix https://github.com/pytorch/pytorch/issues/60756.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63580

Test Plan:
- this CI.
- validated by running on the bionic_cuda container: https://app.circleci.com/pipelines/github/pytorch/pytorch/366632/workflows/478602fb-698f-4210-ac09-d9c61af5c62b/jobs/15472104

Reviewed By: malfet

Differential Revision: D30486472

Pulled By: walterddr

fbshipit-source-id: d83ab88d163d4a468f03961a13d891b658668a7f
2021-08-23 09:45:33 -07:00
98449f5bba hotfix clone issue (#63770)
Summary:
This was discovered during https://github.com/pytorch/pytorch/issues/63408. For some reason, this checkout action alone does not set fetch-depth correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63770

Reviewed By: malfet, janeyx99

Differential Revision: D30486110

Pulled By: walterddr

fbshipit-source-id: a67395cca2487407ed0d49c8c89587935ca5f212
2021-08-23 09:30:48 -07:00
f1d865346f [ONNX] add test images to repo (#63717)
Summary:
This is better than the status quo:
* Test doesn't download files from the internet -> faster and more
  reliable.
* Test doesn't leave the git working directory dirty.

Rather than using the original images, I've copied some images from
the pytorch/vision repo. This will keep the tests in the two repos
in sync, while avoiding adding new assets to the vision repo.

See https://github.com/pytorch/vision/pull/4176.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63717

Reviewed By: janeyx99

Differential Revision: D30466016

Pulled By: malfet

fbshipit-source-id: 2c56d4c11b5c74db1764576bf1c95ce4ae714574
2021-08-23 07:43:21 -07:00
bafd875f74 Allow implementing either backward or vjp for Function (#63434)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63434

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30431968

Pulled By: albanD

fbshipit-source-id: 0bb88664283486a9fd3364e6c3d79442a44625c2
2021-08-23 07:07:11 -07:00
726fd26b3e Update ROCm PyTorch persons of interest (#55206)
Summary:
cc jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55206

Reviewed By: VitalyFedyunin

Differential Revision: D30296584

Pulled By: dzhulgakov

fbshipit-source-id: 6e5c610cc6b7c7fd58b80fa3f9de31f269341a88
2021-08-22 22:31:09 -07:00
d6133b2fe6 Remove _fork_processes from common_distributed.py (#63711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63711

This removes `_fork_process` from common_distributed.py and fixes all
other callpoints to use `spawn_process` instead.
ghstack-source-id: 136395719

Test Plan: waitforbuildbot

Reviewed By: xush6528

Differential Revision: D30463834

fbshipit-source-id: 0c09e8a996d0e5b912c8cdd45488a39951bac4db
2021-08-22 18:57:12 -07:00
2289a12f21 Made FuncTorchBatched decompose CompositeImplicitAutograd (#63616)
Summary:
See https://github.com/facebookresearch/functorch/issues/56

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63616

Reviewed By: zou3519

Differential Revision: D30438316

Pulled By: Chillee

fbshipit-source-id: e84446d9f68b87daa0cfff75b3b8a972f36ec85a
2021-08-21 17:14:39 -07:00
e926f75b0b BatchNorm autodiff re-enabled (#57321)
Summary:
Turns on BN in autodiff:

1. outputs an empty tensor for running stats to bypass the autodiff issue with None;
2. fixes BN inference backward in cuDNN & MIOpen, where backward falls back to the native batchnorm kernel instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57321

Reviewed By: albanD, ngimel

Differential Revision: D30250419

Pulled By: jansel

fbshipit-source-id: a62553789c20fb50a820003a056f40d9d642dfaa
2021-08-21 09:07:31 -07:00
37d60c08e5 Revert D30360382: [nnc] Support thread level parallelism in fused kernels
Test Plan: revert-hammer

Differential Revision:
D30360382 (d6d86efb1c)

Original commit changeset: 29acf4e932c6

fbshipit-source-id: e0531113135d30eabb172dc1537d5dd6d65dc438
2021-08-21 03:46:43 -07:00
76da46ccdc Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer

Differential Revision:
D30417127 (6600bc9651)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
2021-08-21 03:38:07 -07:00
8871ff29b7 [sharded_tensor] add readonly tensor properties (#63679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63679

This PR add read only tensor properties to sharded tensor, to match the torch.Tensor behaviors.

Test Plan: test_sharded_tensor_metadata

Reviewed By: pritamdamania87

Differential Revision: D30459343

fbshipit-source-id: 9aec8ecfe76479eed25f3b843495e5719ed2956d
2021-08-20 22:17:11 -07:00
b2a601ffe5 [Static Runtime] Implement out variant for fb::quantized_linear (#63635)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63635

Reviewed By: ajyu

Differential Revision: D30446234

fbshipit-source-id: 1ef014186ff725930a97d0159626f9233ee74030
2021-08-20 21:42:22 -07:00
2d58f3f56d NNAPI: Support const values in binary ops
Summary:
The NNAPI converter previously failed with one const value and one tensor.
Code suggestions from dreiss.

Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_pointwise_binary

Imported from OSS

Reviewed By: anshuljain1

Differential Revision: D28893881

fbshipit-source-id: 59240373fb03c6fdafa4cb2fa4d8408dd20092f6
2021-08-20 21:10:26 -07:00
b4f5809db8 Migrate thnn_conv2d from THC to ATen (#63428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63428

Closes gh-24644, closes gh-24645

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441307

Pulled By: ngimel

fbshipit-source-id: 9c3dec469c0525831ae398df261cf41b7df7e373
2021-08-20 18:29:02 -07:00
3ee1f81dce Extend _sharded_tensor constructor to support other ops like torch.ones (#63378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63378

a) Introduce InitCommonParams to wrap tensor creation params
b) Factor local tensor initialization into common_params so that the tensor value is not hard-coded in the ShardedTensor constructor
c) Add _sharded_tensor.ones(...) to exemplify (rough usage sketch below) - note the memory_format arg is not provided, to stay consistent with torch.ones
d) Follow-up: more ops like torch.full, torch.zeros, torch.rand
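
A rough usage sketch of (c); the exact signature is assumed from the description above, and the spec construction is illustrative:

```python
# hedged sketch: mirrors torch.ones, but takes a sharding spec first
from torch.distributed._sharding_spec import ChunkShardingSpec
from torch.distributed import _sharded_tensor

spec = ChunkShardingSpec(
    dim=0,  # shard rows across two ranks
    placements=["rank:0/cuda:0", "rank:1/cuda:1"],
)
st = _sharded_tensor.ones(spec, 10, 4)  # sharded analogue of torch.ones(10, 4)
```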

Test:
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestCreateTensorFromParams --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorChunked.test_create_sharded_tensor_with_ones --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorEnumerable.test_create_sharded_tensor_with_ones --v

Test Plan: Imported from OSS

Reviewed By: pritamdamania87, wanchaol

Differential Revision: D30359245

Pulled By: bowangbj

fbshipit-source-id: 85768fcb36e9d9d40213036884b1266930a91701
2021-08-20 17:11:34 -07:00
7c0f5b9aa4 [clang-tidy] Enable more folders (#63380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63380

Crosses off some more of #62011, see the test in the stacked PR #63381

Test Plan: Imported from OSS

Reviewed By: malfet, seemethere

Differential Revision: D30455843

Pulled By: driazati

fbshipit-source-id: d473545d05ffa0b2476968f0b1c55f3a16a2c755
2021-08-20 16:40:42 -07:00
e0fe5699c4 enable increment build for build_libtorch (#63074)
Summary:
Since issue https://github.com/pytorch/pytorch/issues/59859 is resolved, rerun_cmake in build_libtorch should no longer be hardcoded.

build_libtorch is necessary to generate a debug-version libtorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63074

Reviewed By: VitalyFedyunin, seemethere

Differential Revision: D30306705

Pulled By: malfet

fbshipit-source-id: f2077d334191f4973da0681560937bc8bab730c1
2021-08-20 16:30:34 -07:00
efe01c59e3 [Doc] Deprecation notice for only_inputs argument (#63631)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63544.

Changed docstring accordingly. I'm new here, not sure if the style is okay. Please check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63631

Reviewed By: ejguan

Differential Revision: D30459439

Pulled By: soulitzer

fbshipit-source-id: 8df3c509d1dd39764815b099ab47229550126cbe
2021-08-20 15:49:49 -07:00
bcf8e2f57e Remove breakpad from docker image (#63598)
Summary:
As of https://github.com/pytorch/pytorch/issues/63186 we're doing this properly via a third_party cmake build, so we don't need it here anymore.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63598

Reviewed By: walterddr, malfet

Differential Revision: D30432250

Pulled By: driazati

fbshipit-source-id: d0d5db14355cf574e42c0d0ed786bb26230180bd
2021-08-20 15:48:39 -07:00
da0820e553 add BFloat16 operators on CPU: range, sinh, cosh, frexp, nan_to_num (#61826)
Summary:
Added BFloat16 support for range, sinh, cosh, frexp, and nan_to_num on CPU, and collected the benchmark data of these OPs(range, sinh, cosh, frexp, and nan_to_num) for BFloat16 and Float32 data type by using the operator_benchmark tool of PyTorch on the platform of Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz

Number of cores: 1 core, 28 cores(1 socket)
[cosh_sinh_benchmark.txt](https://github.com/pytorch/pytorch/files/6974313/cosh_sinh_benchmark.txt)
[frexp_benchmark.txt](https://github.com/pytorch/pytorch/files/6974315/frexp_benchmark.txt)
[nan_to_num_benchmark.txt](https://github.com/pytorch/pytorch/files/6974317/nan_to_num_benchmark.txt)
[range_benchmark.txt](https://github.com/pytorch/pytorch/files/6974318/range_benchmark.txt)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61826

Reviewed By: saketh-are

Differential Revision: D30257259

Pulled By: VitalyFedyunin

fbshipit-source-id: 394cd713e6394050a8c90b2160633beb675d71dd
2021-08-20 14:56:52 -07:00
a8de0d83fe empty caching allocator before test_avg_pool2d large subtest (#63528)
Summary:
Otherwise, an unrecoverable OOM occurs on MI25. Fixes broken ROCm CI test1.
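
The fix amounts to releasing the caching allocator's unused blocks before the memory-hungry subtest, along the lines of:

```python
import torch

torch.cuda.empty_cache()  # hand cached, unused blocks back to the device driver
```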

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63528

Reviewed By: malfet, zhouzhuojie

Differential Revision: D30459151

Pulled By: walterddr

fbshipit-source-id: 63e205c4f486fcbdd514cfb0ed8e38584f894585
2021-08-20 14:01:45 -07:00
b008bb4443 Include iostream in ProcessGroupMPI.cpp (#63656)
Summary:
As it uses `std::cerr`, which in turn results in compilation regression introduced by https://github.com/pytorch/pytorch/pull/61500
Fixes https://github.com/pytorch/pytorch/issues/63653

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63656

Reviewed By: ejguan

Differential Revision: D30455824

Pulled By: malfet

fbshipit-source-id: 29f316e7f7fd8e7dcbee2666e7a985f25bf56515
2021-08-20 13:15:40 -07:00
07e41cf2d7 [easy]Unbreak caffe2benchmarking build (#63655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63655

ghstack-source-id: 136324310

Test Plan: buck build //fbobjc/Apps/Internal/Caffe2Benchmarking:Caffe2Benchmarking fbobjc/mode/iphonesimulator

Reviewed By: hl475, JacobSzwejbka

Differential Revision: D30455659

fbshipit-source-id: b6da6be4f89b6e84753ef0849ffedea04785034a
2021-08-20 12:57:27 -07:00
1dd648f1c4 [ONNX] Support torch.dot and torch.nn.utils.spectral_norm (#62596) (#62765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62765

Fixes #27723

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375181

Pulled By: msaroufim

fbshipit-source-id: 715f4745899757ec405877980cd20c826028eb2c

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-08-20 12:46:56 -07:00
db0771b05d [ONNX] Update repeat_interleave for dynamic repeats (#59979) (#62764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62764

Fixes #58733

- Support dynamic interleave for cases with dynamic repeat values (example below)
- Moved the repeat_interleave symbolic from opset 11 to opset 13, since sequence output types for loop outputs are needed for this change
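
For reference, the dynamic case is the one where `repeats` is itself a tensor whose values are only known at runtime:

```python
import torch

x = torch.tensor([1, 2, 3])
repeats = torch.tensor([2, 0, 1])           # runtime-valued per-element counts
print(torch.repeat_interleave(x, repeats))  # tensor([1, 1, 3])
```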

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375179

Pulled By: msaroufim

fbshipit-source-id: 787f96bf91d124fd0483761088c5f4ae930d96a9

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-08-20 12:46:54 -07:00
8760254911 [ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280) (#62763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62763

This PR fixes an issue where the graph inputs might be updated when we export the model in inference mode.

When a model is exported in inference mode, some optimizations are applied. One side effect of these optimizations is that the graph inputs might be adjusted. Such optimizations include:

	1. Conv and BatchNorm op fusion.
	2. Do constant folding.

If the user sets export_params=False, or sets keep_initializers_as_inputs=True, it's highly likely that the user wants to provide the corresponding parameters or initializers as the inputs of the graph.
In such a situation, no matter whether the model is exported in inference mode or training mode, the exporter needs to prevent the above optimizations from adjusting the graph inputs. This way, the graph inputs match the inputs the user provided.

The changes in this PR add a common check to decide whether the above optimizations should be applied. From the values of the export_params and keep_initializers_as_inputs arguments, we infer whether the graph inputs are allowed to be adjusted.
If not, these optimizations are skipped, even if the other requirements are met.
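
A sketch of the affected export call (the model and filename are placeholders): with these argument values the exporter now leaves the graph inputs untouched rather than folding or fusing them away.

```python
import torch

model = torch.nn.Linear(4, 2)
dummy = torch.randn(1, 4)

# The user intends to feed weights/initializers as graph inputs, so
# input-adjusting optimizations (Conv+BN fusion, constant folding) are skipped.
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    export_params=False,
    keep_initializers_as_inputs=True,
)
```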

Besides these code changes, the documentation of the parameters below has been updated so that users have clearer guidance when considering how to leverage them for different purposes:

	1. export_params
	2. training
	3. do_constant_folding
	4. keep_initializers_as_inputs

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375183

Pulled By: msaroufim

fbshipit-source-id: 4db8b9695649eb32a3a0fefa950ee2e5651bdba0

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-08-20 12:46:52 -07:00
a65d1ae7cc [ONNX] Fix controlflow shape inference with contrib op (#60707) (#62762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62762

`ONNXShapeTypeInference` for node `n` is skipped if `n` is non ONNX namespace, or if `n` contains any non ONNX namespace nodes. This prevents controlflow nodes containing contrib ops from running `SpecialPostProcess`, which sets up correct node output shape/type information in rare cases.

This PR depends on opset 14 export https://github.com/pytorch/pytorch/pull/59486

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375180

Pulled By: msaroufim

fbshipit-source-id: 5deacec39f091deb4d75ddd9e660e12fca7f16c5

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-08-20 12:45:53 -07:00
125e2d02e5 Revert D30417370: [nnc] Enable CPU fusion
Test Plan: revert-hammer

Differential Revision:
D30417370 (b9fc656cf2)

Original commit changeset: 84ce7a578a36

fbshipit-source-id: cd23774cdc3273fd72f8a05f1900eaf36f373e6b
2021-08-20 12:30:21 -07:00
2d671ca41b [8/N] Remove c10d/ddp fork tests. (#63454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63454

Continuation of https://github.com/pytorch/pytorch/pull/63443, this
PR removes all fork tests from torch.distributed.
ghstack-source-id: 136285511

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30387872

fbshipit-source-id: f6d6313db126ae7b95b86f78a1e0726887c5c513
2021-08-20 12:23:18 -07:00
71da114412 Revert D30426527: Adding DataLoader2 class as future replacement of DataLoader
Test Plan: revert-hammer

Differential Revision:
D30426527 (5a7133b87f)

Original commit changeset: e5905d3364c4

fbshipit-source-id: 794d8a4e9256ccff8cf894aee10eff6adc30d502
2021-08-20 12:06:52 -07:00
70a3210eca Add BinaryUfuncOpInfo and broadcasting tests (#61964)
Summary:
As proof of concept, this PR uses the new `BinaryUfuncOpInfo` in broadcasting tests for `add`, `sub`, `mul`, `div`, `floor_div`, and `true_div`.
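
The behavior under test, in miniature: binary ufuncs broadcast their operands' shapes.

```python
import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)
assert (a + b).shape == (3, 4)            # add broadcasts to the common shape
assert torch.mul(a, b).shape == (3, 4)    # and so do the other binary ufuncs
```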

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61964

Reviewed By: ngimel

Differential Revision: D30407734

Pulled By: mruberry

fbshipit-source-id: ada28994f43b0635f279f45a02ecba18bc8ee033
2021-08-20 11:44:15 -07:00
b9fc656cf2 [nnc] Enable CPU fusion (#63545)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63545

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417370

Pulled By: bertmaher

fbshipit-source-id: 84ce7a578a3678d5562bab99d1dc00330c4f72d1
2021-08-20 11:18:21 -07:00
6600bc9651 Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e
2021-08-20 11:18:19 -07:00
d6d86efb1c [nnc] Support thread level parallelism in fused kernels (#63386)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30360382

Pulled By: bertmaher

fbshipit-source-id: 29acf4e932c669ce0f35823faea9099bcd8119b6
2021-08-20 11:18:17 -07:00
c78ab28441 Add support for the ONNX Runtime Eager Mode backend (#58248)
Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.

We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).

The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248

Reviewed By: astaff

Differential Revision: D30344992

Pulled By: albanD

fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2
2021-08-20 11:17:13 -07:00
b95ce1591d Add docs describing saved tensor hooks (#62362)
Summary:
Add section to the Autograd mechanics docs to describe the recently
exposed saved tensors (https://github.com/pytorch/pytorch/issues/52451), how to register packing / unpacking
hooks (https://github.com/pytorch/pytorch/issues/60975) and how to use default hooks (https://github.com/pytorch/pytorch/issues/61834)

Sister PR: https://github.com/pytorch/pytorch/issues/62361 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first)
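
A minimal sketch of the mechanism the new section documents, assuming the `torch.autograd.graph.saved_tensors_hooks` context manager from the linked PRs:

```python
import torch

def pack(t):                     # called when autograd saves a tensor for backward
    return t.detach().clone()    # e.g. one could offload or compress here

def unpack(packed):              # called when backward needs the tensor again
    return packed

x = torch.randn(3, requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    y = (x * x).sum()            # x is saved through pack()
y.backward()                     # ...and retrieved through unpack()
```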

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62362

Reviewed By: soulitzer

Differential Revision: D30453177

Pulled By: Varal7

fbshipit-source-id: f5759977b069ff0ef36a47b08856d297691a6caa
2021-08-20 11:10:51 -07:00
03cc46a0ac [fx2trt] Add layernorm plugin for dynamic shape (#63620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63620

Added layernorm dynamic plugin, so that it works when explicit batch dim is required. Needed for ig model.

Changed the way we create a plugin layer: from instantiating the plugin directly to using the plugin creator with a `PluginFieldCollection`.

Follow ups:
Another way to convert layernorm is by breaking it down to supported trt layers. T97398182

Test Plan: layernorm unittest

Reviewed By: yinghai

Differential Revision: D30138205

fbshipit-source-id: aebe021d8de818e20376634f30e84579b9807f9b
2021-08-20 10:52:42 -07:00
5f997a7d2f [PyTorch][Edge] Improve InflatableArgs for Bundled Inputs (#62368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62368

# Context
The bundled-inputs API accepts an expression in the form of the string InflatableArg.fmt, which is applied to the stored inputs to inflate them. InflatableArg.fmt provides the flexibility to apply a custom inflation transformation. When the input arguments to a function are not of Tensor type, TorchScript casts the inputs from type T to Optional[T] and expects the function to handle the Nullable (None) case as well. This becomes tricky to handle in one-line code or lambda functions.

We propose an alternative that allows InflatableArg to include the text of a TorchScript function, which is defined on the module as a helper and then used in the inflation expression. This can be provided via InflatableArg.fmt_fn. Please refer to pytorch/test/test_bundled_inputs.py for an example of how to use it.

Also refer JacobSzwejbka comment on the same [here](https://github.com/pytorch/pytorch/pull/62368#issuecomment-892012812)

# Mitigation
Allow InflatableArg to include the text of a TorchScript function that is defined on the module as a helper, then use that in its inflation expression.
ghstack-source-id: 135158680

Test Plan:
To run `test_dict_args`

```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource/fbcode] buck test //caffe2/test:test_bundled_inputs -- test_dict_args
Action graph will be rebuilt because files have been added or removed.
Building: finished in 5.4 sec (100%) 12180/12180 jobs, 0/12180 updated
  Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/fafcf277-1095-4cba-978d-6022f0d391ad
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 5ef9de71-c1b1-406b-a6c0-3321c2368b8d
Trace available for this run at /tmp/tpx-20210727-163946.454212/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7036874465805934
    ✓ ListingSuccess: caffe2/test:test_bundled_inputs - main (11.365)
    ✓ Pass: caffe2/test:test_bundled_inputs - test_dict_args (test_bundled_inputs.TestBundledInputs) (12.307)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7036874465805934
```

To check the py code of TS module:
P433043973

Reviewed By: dreiss

Differential Revision: D29950421

fbshipit-source-id: c819ec5c94429b7fbf6c4beb0259457f169b08ec
2021-08-20 09:36:08 -07:00
5a7133b87f Adding DataLoader2 class as future replacement of DataLoader (#63523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63523

Supports sharding and batching on the loader level.
* #63522 Adding IterableAsDataPipe IterDataPipe (useful for tests and simple cases)

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30426527

Pulled By: VitalyFedyunin

fbshipit-source-id: e5905d3364c4880e720dd62fb066f08881c71a6e
2021-08-20 09:01:55 -07:00
99e28baeba Small custom function refactor which doesn't change anything (#63433)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63433

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30431970

Pulled By: albanD

fbshipit-source-id: 905fa4d2ddeca18005b1bcb13dd6f8a080327e7c
2021-08-20 08:44:23 -07:00
0f2c60f0e3 Adding IterableAsDataPipe IterDataPipe (#63522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63522

Supports sharding and batching on the loader level.
* **#63522 Adding IterableAsDataPipe IterDataPipe (useful for tests and simple cases)**

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30426528

Pulled By: VitalyFedyunin

fbshipit-source-id: 535b5cc1505bb58731fcca8170541ac5ee7bd417
2021-08-20 08:38:23 -07:00
ae901e372e [Static Runtime] Enable RemoveListMutation (#63536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63536

Enable a pass that transforms sequences like this:
```
li = []
li.append(1)
li.append(2)
```
into this:
```
li = [1, 2]
```
Initially I implemented this pass myself (D30387213), but I discovered that there is an existing pass that does the same thing.

Reviewed By: hlu1

Differential Revision: D30412970

fbshipit-source-id: 0810ef03480878d5039bd800a40f5fd31c2652ec
2021-08-20 06:15:41 -07:00
913c1f83f4 [Static Runtime] Add native op for aten::detach (#63625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63625

This change adds a static runtime's native op implementation for `aten::detach` op.

See the standard `aten::detach` implementation (https://codebrowser.bddppq.com/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp.html#_ZN2at6native6detachERKNS_6TensorE) for comparison.

Test Plan:
- Added `StaticRuntime.IndividualOps_Detach`.

- Observed

```
V0819 18:55:33.181188 3092034 impl.cpp:1398] Switch to native impl for node: %a.1 : Tensor = aten::detach(%input.1)
```

Reviewed By: hlu1

Differential Revision: D30443187

fbshipit-source-id: d6e0eadb1b817e0a126c4fc97526abc276ee8a17
2021-08-20 00:46:27 -07:00
bec75daa77 Update protobuf to 3.13.1 (#62571)
Summary:
Update bazel to 4.10.0

Update ASAN_SYMBOLIZER_PATH to llvm-7
Suppress `vptr` ubsan violations in `test_jit`
Fix ProtoBuf patching for ONNX which caused Windows builds to crash while attempting to free `std::string` allocated on stack

Fixes https://github.com/pytorch/pytorch/issues/62569

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62571

Reviewed By: walterddr

Differential Revision: D30048685

Pulled By: malfet

fbshipit-source-id: 6462c1bef9c42318551d2cf906bbab41e1d4e1cd
2021-08-19 23:43:55 -07:00
d82667f7e2 [nnc] Updated sliceTail to do inplace mutation (#63532)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63532

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30412184

Pulled By: navahgar

fbshipit-source-id: e7669d3b9d24e14501f3feb6505c88d1d42030c6
2021-08-19 22:55:30 -07:00
5e31a3b904 [nnc] Updated sliceHead to do inplace mutation (#63531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63531

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30412183

Pulled By: navahgar

fbshipit-source-id: 47ee9482a36e606788d28d22eee4edaca45ffa50
2021-08-19 22:54:05 -07:00
0a66d5b325 [PyTorch] Remove unnecessary iostream includes in headers (#61500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61500

libstdc++ defines a static variable called `std::__ioinit` in iostream that adds global constructor size overhead to each translation that includes iostream. To reduce the size overhead from that, we can often include ostream instead.
ghstack-source-id: 136163529

Test Plan: buildsizebot some mobile apps

Reviewed By: dhruvbird

Differential Revision: D29648016

fbshipit-source-id: 9c3139712c71248513cc5032d21e77f3ecbae8fe
2021-08-19 18:54:51 -07:00
b99a299c60 [PyTorch] Remove unused dump() methods in vec headers (#63533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63533

These methods don't seem to be used, and they use std::cout, which incurs a small code size overhead on platforms using libstdc++ due to std::__ioinit (see #61500). Seems like we can just delete them?
ghstack-source-id: 136163409

Test Plan:
CI

Reviwers: #sentinel, dhruvbird

Reviewed By: dskhudia

Differential Revision: D30412269

fbshipit-source-id: 380b9aa2f9aabc4107188b6b209d2afc1769c0ee
2021-08-19 18:53:49 -07:00
0b6cc8daf2 [PyTorch][Edge] Support backtrace symbolication for Android builds (#63339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63339

# Context
https://fb.workplace.com/groups/pytorch.dev/permalink/900474523864362/?comment_id=901125403799274&reply_comment_id=905023386742809

##### WHAT IS A STACK TRACE?
A stack trace (also called stack backtrace or stack traceback) is a report of the active stack frames at a certain point in time during the execution of a program.

Typically when an exception is thrown, one would expect to see the code (file:line) that threw the exception, and every intermediate frame up to and including the main function.

We are enabling Android stack traces to help with debugging on Android devices.

Test Plan:
## Steps to test
```
buck build fbsource//xplat/caffe2/mode/aibench_pytorch_android -c pt.enable_qpl=0 -c pt.has_backtraces=1 fbsource//xplat/caffe2/fb/lite_predictor:lite_predictorAndroid#android-x86_64

one_world android emulator android-28

adb push ~/fbsource/buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictorAndroid#android-x86_64 /data/local/tmp

cd /data/local/tmp
./lite_predictorAndroid#android-x86_64

./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
```

## See how the stack traces look when the model file is not found:

### before
```
./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true

Run with 2 threads
Run with 2 threads
Loading model...
terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
(no backtrace available)
Aborted
```

### after
```
134|generic_x86_64:/data/local/tmp $ ./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 2 threads
Run with 2 threads
Loading model...
terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
 frame #0       c10::get_backtrace(unsigned long, unsigned long, bool)[0x59494274f10e]
 frame #1       [0x5949427b1eee]
 frame #2       [0x5949427b1eb2]
 frame #3       [0x5949427b1cdc]
 frame #4       std::__ndk1::function<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > ()>::operator()() const[0x5949427afc34]
 frame #5       c10::Error::Error(c10::SourceLocation, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >)[0x5949427b05b1]
 frame #6       c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949427aca5f]
 frame #7       caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b37b2]
 frame #8       caffe2::serialize::FileAdapter::FileAdapter(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b3903]
 frame #9       torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>, std::__ndk1::unordered_map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::hash<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::equal_to<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > > >&)[0x5949422737bd]
 frame #10      torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>)[0x594942273769]
 frame #11      benchmark(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x59494189b21d]
 frame #12      main[0x594941882aff]
 frame #13      __libc_init[0x7b699d08578d]
```

### what we get for os:linux
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 24 threads
Run with 24 threads
Loading model...
terminate called after throwing an instance of 'c10::Error'
  what():  open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
frame #0: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb7fe]
frame #1: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb6c6]
frame #2: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x54 (0x20ca4e4 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #3: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x57 (0x20ca9a7 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #4: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x7a (0x20c823a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #5: caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x96 (0x206f3d6 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #6: caffe2::serialize::FileAdapter::FileAdapter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x42 (0x206f502 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #7: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x30 (0x1be826c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #8: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) + 0x35 (0x1be8214 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #9: benchmark(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x16d (0x12093ad in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #10: main + 0x25c (0x11f933c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #11: __libc_start_main + 0x105 (0x7fc7b9f2ed95 in /usr/local/fbcode/platform009/lib/libc.so.6)
frame #12: _start + 0x2a (0x11f902a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)

Aborted (core dumped)
````

Reviewed By: dhruvbird

Differential Revision: D30135947

fbshipit-source-id: f50c634ef4545843305cad4b4a14a8776b1aec76
2021-08-19 18:41:29 -07:00
f2bf0f229f Revert D30359218: [pytorch][PR] [doc] pre-commit fix instructions
Test Plan: revert-hammer

Differential Revision:
D30359218 (4e1d84ae8f)

Original commit changeset: 61771babeac4

fbshipit-source-id: c2ac0a4a7463fafa03ad0b20bfb0701a8c1476c4
2021-08-19 16:48:04 -07:00
d0d27f6971 Add concurrency group for more workflows (#63606)
Summary:
Fixes unnecessary duplicated workflows runs

![image](https://user-images.githubusercontent.com/658840/130146332-ecf54e49-3538-49c1-88de-b099f1c1e41f.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63606

Reviewed By: malfet, mruberry

Differential Revision: D30436889

Pulled By: zhouzhuojie

fbshipit-source-id: aafbad1edc45e3ab9bceb00e8f3b4204f18e43d0
2021-08-19 15:39:28 -07:00
71ab48ed3b acc type inference (#63119)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63119

Test Plan:
buck run mode/opt-clang caffe2/torch/fb/model_transform/experimental:fx_ir_lower_inline_cvr -- \
    --action=lower_and_run \
    --filename=inline_cvr_7x_dec_2020.model \
    --print_glow_glog=True

Reviewed By: jamesr66a, jfix71, ansley

Differential Revision: D30235895

fbshipit-source-id: dab7f96e1799b99eeae0ee519cf0ddd636fddf2e
2021-08-19 15:23:56 -07:00
ccca66597a Replace hardcoded values in IndexKernel.cu (#63372)
Summary:
This is a small change that helps maintain the Cruise PyTorch fork, since we use a different hardcoded value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63372

Reviewed By: mruberry

Differential Revision: D30396171

Pulled By: ejguan

fbshipit-source-id: cc0023f58b5922d3d98c7283495e6dc8d35049b6
2021-08-19 15:02:28 -07:00
e5ab0d1013 DataLoader: allow non-integer Samplers (#63500)
Summary:
Not entirely sure how to use TypeVar but if someone could give me a hint it would be appreciated. Also let me know if you want me to add tests so we can make sure non-integer samplers actually work. It seems like `test/test_dataloader.py` is the correct location but that's a big file.

Fixes https://github.com/pytorch/pytorch/issues/63483
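
A sketch of what this enables (names are illustrative): a map-style dataset keyed by strings, driven by a Sampler that yields those keys.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Sampler

class DictDataset(Dataset):
    def __init__(self, table):
        self.table = table
    def __getitem__(self, key):          # indexed by string, not int
        return self.table[key]
    def __len__(self):
        return len(self.table)

class KeySampler(Sampler):
    def __init__(self, keys):
        self.keys = list(keys)
    def __iter__(self):
        return iter(self.keys)
    def __len__(self):
        return len(self.keys)

table = {"a": torch.zeros(2), "b": torch.ones(2)}
loader = DataLoader(DictDataset(table), sampler=KeySampler(table))
for batch in loader:
    print(batch.shape)                   # torch.Size([1, 2])
```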

ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63500

Reviewed By: mruberry

Differential Revision: D30403689

Pulled By: ejguan

fbshipit-source-id: 464e09e5aad3215b94a29cc5e21cb4b10ec136e3
2021-08-19 14:55:46 -07:00
11a40ad915 [Pytorch] Fix callstack pointer serialization bug (#63576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63576

We serialize the function name associated with an InlinedCallStackPtr. This is derived
by querying the Function* stored in the InlinedCallStack. However, this is a raw
pointer that is not guaranteed to be valid when serialization happens. On
the other hand, we also store the function name separately when constructing
the InlinedCallStack anyway, so this change uniformly relies on function_name
instead of Function*.

Test Plan: Internal build's asan failure + CI

Reviewed By: larryliu0820

Differential Revision: D30427029

fbshipit-source-id: de9617482404785920ed2e67b72f38461590fba3
2021-08-19 13:35:52 -07:00
6c3ebccc00 Updating the names of these functions (#63513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63513

Updating these names per Jerry's nits in the previous PR.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30406710

fbshipit-source-id: a9f1577a2b8c4a93f5005e0f6278b7d7348d8b66
2021-08-19 13:34:34 -07:00
ce6fe50158 Revert embedding thrust->cub migration (#63451)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63451

Reviewed By: mruberry

Differential Revision: D30398482

Pulled By: ngimel

fbshipit-source-id: e153786d204215555a6571688eabae712facad7e
2021-08-19 13:03:33 -07:00
99203580a9 Updates internal assert_allclose callsites in favor of assert_close (#61841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61841

Redo of #60863.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30408145

Pulled By: mruberry

fbshipit-source-id: 0b34ebc7f23ba38ecd89640b61d8aca59b7eab58
2021-08-19 12:50:41 -07:00
efd70b7ce6 Modernizes add and mul documentation (#63309)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39329.

The documentation for torch.add and torch.mul was sorely out of date and even included deprecated references. This PR modernizes their descriptions consistent with torch.sub.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63309

Reviewed By: ngimel

Differential Revision: D30338004

Pulled By: mruberry

fbshipit-source-id: ee1c2a8106af8341253cafb0003b06e8f652624d
2021-08-19 12:49:30 -07:00
d986d4bf63 [special] use __all__ to hide internal imports (#63135)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
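
The pattern in miniature: underscore-prefixed imports plus an explicit `__all__` keep module internals out of `from module import *` and out of tab completion.

```python
# sketch of the module-level pattern being applied
import math as _math        # internal import, hidden from star-imports

__all__ = ["gammaln"]       # only the intended public surface is exported

def gammaln(x: float) -> float:
    return _math.lgamma(x)
```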

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63135

Reviewed By: ngimel

Differential Revision: D30364287

Pulled By: mruberry

fbshipit-source-id: 20078668943fafa45ce09610634b1d2c424b1922
2021-08-19 12:45:43 -07:00
0c3904d180 [BF16] Add a missing thread local specifier to autocast_gpu_dtype (#63416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63416

Fixes a missing thread_local specifier introduced by a recent PR:

https://github.com/pytorch/pytorch/pull/61002

Test Plan: Unit Tests

Reviewed By: ngimel

Differential Revision: D30376154

fbshipit-source-id: c70d37ec85c3eba88eb87f766f1c4e7aeff8eaf9
2021-08-19 12:39:27 -07:00
535d44141b [7/N] Remove fork tests for RPC. (#63443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63443

After https://github.com/pytorch/pytorch/pull/63442, all distributed
tests can run with opt-asan. As a result, we can now remove all of our fork
based tests.

This is the first PR in a stack, which first removes fork based tests from RPC.
ghstack-source-id: 136177744

Test Plan: waitforbuildbot

Reviewed By: lw

Differential Revision: D30384905

fbshipit-source-id: 86d438aebaa6cb02ae2a966fea244849849a1889
2021-08-19 11:22:40 -07:00
bd8608cd5c Use CMake for breakpad (#63186)
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.

```python
import torch

# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()

# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186

Reviewed By: malfet, seemethere

Differential Revision: D30318404

Pulled By: driazati

fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
2021-08-19 10:42:01 -07:00
e030b81356 [easy] Fix missing move in TupleType::createNamed (#61572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61572

ghstack-source-id: 136161829

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D29672872

fbshipit-source-id: d8ba2d54f7914dbeb3fc52aa21dd77025951c4b5
2021-08-19 10:38:52 -07:00
3aa4521fe8 [hpc] use fx2trt for exploration track (#63535)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63535

Reviewed By: yinghai, jianyuh

Differential Revision: D30272810

fbshipit-source-id: 61f3edf2a2282cd8c268a92acf92feb05a6ae3e1
2021-08-19 10:18:56 -07:00
885e312ce0 Add permute021 fx2trt converter (#63238)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63238

Reviewed By: yinghai

Differential Revision: D30295373

fbshipit-source-id: 2a189fe485edaa978fd03e4b8d8582edb34ec648
2021-08-19 10:17:48 -07:00
e7831fe5de [PyTorch] Test IValue move/copy/assign/swap more (#54717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54717

Hit more tags in these tests
ghstack-source-id: 136140508

Test Plan: buck test //caffe2/aten:ivalue_test

Reviewed By: anjali411

Differential Revision: D27339736

fbshipit-source-id: 610c8e92846bb70ba725ab117440326ab50af5ce
2021-08-19 09:50:40 -07:00
79693bb86a Use linecache.lazycache to cache generated code. (#63453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63453

Instead of patching linecache.getlines, use linecache.lazycache and
parts of the loader protocol described in PEP-302

Test Plan:
python3 test/test_fx.py

Imported from OSS

Reviewed By: suo

Differential Revision: D30388176

fbshipit-source-id: 92933711ecf3a21a07e1d6b0d1185ab0efd8341c
2021-08-19 09:17:01 -07:00
e1334512a3 Add fastpath for dot and vdot when the inputs have conj bit set to True (#62915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62915

As much as 45% and 20% perf improvement on CUDA and CPU, respectively.
Consistent improvement in perf for all cases -- see perf numbers in the comments below.
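
A sketch of the call pattern this fast path targets (conjugate views via `.conj()` carry a conj bit instead of materializing):

```python
import torch

a = torch.randn(1000, dtype=torch.complex64)
b = torch.randn(1000, dtype=torch.complex64)

# a.conj() is a lazy view with the conj bit set; dot/vdot can now handle
# it directly instead of materializing the conjugated tensor first.
print(torch.dot(a.conj(), b))
print(torch.vdot(a, b.conj()))
```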

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30404006

Pulled By: anjali411

fbshipit-source-id: 565940da28c7761d993cf43346932c24292e8a4d
2021-08-19 08:42:24 -07:00
f596aa8b77 Poisson zero rate (#61511)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/53485 by allowing zero rates for the Poisson distribution. This implementation is consistent with `scipy.stats.poisson` which admits zero rates. In addition to addressing the aforementioned issue, this PR makes two supporting changes:

1. add a `nonnegative` constraint to enforce non-negative rates for the Poisson distribution.
2. adjust the evaluation of the gradient of `xlogy` such that it is well defined for `x == 0 and y == 0`.
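
A minimal illustration of the new behavior (a sketch; the values noted in the comments follow from the formula, since `xlogy(0, 0) == 0`):

```python
import torch
from torch.distributions import Poisson

d = Poisson(torch.tensor([0.0, 1.5]))  # a rate of 0 is now accepted
print(d.sample())                      # the first entry is always 0
print(d.log_prob(torch.zeros(2)))      # first entry is 0.0 (= log 1), finite
```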

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61511

Reviewed By: ejguan

Differential Revision: D30352917

Pulled By: albanD

fbshipit-source-id: f3d33da58360e80d75eb83519f199b93232a2a2d
2021-08-19 08:30:28 -07:00
be9be9bfdd add distributed/_sharded_tensor/test_sharded_tensor to ROCM_BLOCKLIST (#63508)
Summary:
Fixes current ROCm CI test2 brokenness until tensorpipe is fully supported by ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63508

Reviewed By: ejguan

Differential Revision: D30406450

Pulled By: walterddr

fbshipit-source-id: c07509271d5d33901f3eaf7ffb916dc3626e1f9a
2021-08-19 07:50:55 -07:00
e7c4988b52 To fix the chainability at epoch zero for some schedulers (#63457)
Summary:
It has been discussed in the https://github.com/pytorch/pytorch/pull/60836#issuecomment-899084092 that we have observed an obstacle to chain some type of learning rate schedulers. In particular we observed

* some of the learning rate schedulers return their initial learning rates at epoch 0 as

```
       return self.base_lrs
```

* This can be a problem when two schedulers are chained, as in

```
     scheduler1.step()
     scheduler2.step()
```

in particular, the effect of scheduler1 at epoch 0 is completely ignored. This would not be an issue if scheduler1 were ineffective at epoch 0, as it is for many schedulers; however, for schedulers such as warm-up schedulers, whose multiplicative value at epoch 0 is smaller than 1, this can lead to undesired behavior.

The following code snippet illustrates the problem better

## Reproducing the bug

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 1.0)
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = ExponentialLR(optimizer, gamma=0.9)

for epoch in range(10):
     print(epoch, scheduler2.get_last_lr()[0])
     optimizer.step()
     scheduler1.step()
     scheduler2.step()
```

### Current Result

```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 5.904900000000001
6 5.314410000000001
7 4.782969000000001
8 4.304672100000001
9 3.874204890000001
```

### Expected Result

```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 0.5904900000000001
6 0.5314410000000001
7 0.4782969000000001
8 0.4304672100000001
9 0.3874204890000001
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63457

Reviewed By: datumbox

Differential Revision: D30424160

Pulled By: iramazanli

fbshipit-source-id: 3e15af8d278c872cd6f53406b55f4d3ce5002867
2021-08-19 07:17:03 -07:00
2d5b19f62b Update full backward hook doc with not-same-object note (#63245)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61446

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63245

Reviewed By: ejguan

Differential Revision: D30352656

Pulled By: albanD

fbshipit-source-id: 7000ecb54a80f2da968ec7600b98574b608578ae
2021-08-19 06:50:56 -07:00
47a9e8ff32 [Static Runtime] Support __getitem__ for lists (#63398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63398

This change provides a native `__getitem__` implementation for lists to avoid overhead associated with falling back to the JIT interpreter.

Test Plan: Unit tests: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D30368464

fbshipit-source-id: e0e0971508cd5d9bcf6025606993dc24ecbf6764
2021-08-19 06:38:51 -07:00
ce61100923 Revert D29399533: Hoisting common expressions out of If blocks
Test Plan: revert-hammer

Differential Revision:
D29399533 (9477211e7d)

Original commit changeset: 9336b9dc48c0

fbshipit-source-id: f081c7280203f40328bcbb0c03a7c6a007acedb7
2021-08-19 06:20:40 -07:00
6bb68ba507 Fix interpreter debug logging message (#63499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63499

https://github.com/pytorch/pytorch/pull/62418 combined the instruction and debug handle. This change fixes the debugging message.
ghstack-source-id: 136184053

Test Plan: Uncomment and it works

Reviewed By: kimishpatel, raziel

Differential Revision: D30390699

fbshipit-source-id: e32b7b297ad3b7d8bffebd025d15519083a244c4
2021-08-19 02:14:13 -07:00
5254e3adb8 layernorm inplace (#63437)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63437

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388824

Pulled By: Krovatkin

fbshipit-source-id: 852d19bf238544c5de177ed5854dcd01c7ae5572
2021-08-18 23:07:25 -07:00
531262fe2e layernorm (#63436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63436

use MKLDNN layernorm

use mkldnn version 2

address Elias feedback

fix build CI errors

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388825

Pulled By: Krovatkin

fbshipit-source-id: fb909bfbf53cb8567a43aac40f51c491daeec908
2021-08-18 23:05:39 -07:00
6e00b31b15 [TensorExpr] Make CacheReplacer and IndexFlattener mutate stmts/exprs inplace. (#63527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63527

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30411411

Pulled By: ZolotukhinM

fbshipit-source-id: efb14ee57b36537fa4fefa89bdd6bafe7151c012
2021-08-18 22:59:31 -07:00
1d62fb8a63 [TensorExpr] Speedup ExternalCall.ComputeInterop test by reducing tensor sizes. (#63526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63526

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30411410

Pulled By: ZolotukhinM

fbshipit-source-id: d9a99afac14d2238b5100c98ae9ed4467f9f05ea
2021-08-18 22:58:25 -07:00
773c8b6440 support optional comparisons with different but comparable types (#62890)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62890

Reviewed By: ejguan

Differential Revision: D30396008

Pulled By: dagitses

fbshipit-source-id: fca02207509f882973d54484f89c4d116505fc66
2021-08-18 21:40:38 -07:00
2544664e54 Beef up comment in AccumulateType (#63503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63503

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30403160

Pulled By: ezyang

fbshipit-source-id: 6cb24418152d9fb146f86b6f973ec50f1a397a58
2021-08-18 20:59:37 -07:00
0d437fe6d0 BF16 allreduce hook (#63260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63260

Add BF16 all-reduce communication hook. Skip if CUDA version < 11 or NCCL version < 2.9.7.

Reviewed By: SciPioneer

Differential Revision: D30238317

fbshipit-source-id: bad35bf7d43f10f1c40997a282b831b61ef592bb
2021-08-18 20:53:49 -07:00
9477211e7d Hoisting common expressions out of If blocks (#59492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59492

Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.

Also eliminated some dead code in the codebase.
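
An illustrative TorchScript input pattern the pass targets (a sketch, not taken from the PR's tests):

```python
import torch

@torch.jit.script
def f(x: torch.Tensor, cond: bool) -> torch.Tensor:
    # `x * 2` is computed in both branches; the pass hoists it above the if,
    # which can then let dead code elimination simplify the block.
    if cond:
        y = x * 2 + 1
    else:
        y = x * 2 - 1
    return y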

Test Plan:
python test_jit.py TestIfHoisting

Imported from OSS

Reviewed By: ngimel

Differential Revision: D29399533

fbshipit-source-id: 9336b9dc48c02c38862f98f98cd72fc1767a1802
2021-08-18 16:29:30 -07:00
d9547b9bb2 Nnapi Delegation: Quick improvements (#63489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63489

A few quick improvements to the Android NNAPI Delegate, some of which were discussed here https://github.com/pytorch/pytorch/pull/62272:
1) `throw std::exception` replaced with `TORCH_CHECK` to reduce runtime
size (nnapi_backend_lib.cpp)
2) weights processing moved from compile to preprocess step, since it can
be done AOT (nnapi_backend_lib.cpp & nnapi_backend_preprocess.cpp)
3) `ser_model_` and `shape_compute_module_` member variables removed, since they are never used after
`init()`, so they are not needed (nnapi_backend_lib.cpp)

Test Plan:
Unit tests: `python test/test_jit.py TestNnapiBackend`
Run SparkAR segmentation with delegated NNAPI as done here D30259033 (can use `jf download GAekdAwsyGKXhggFALN4LnSBTzcubsIXAAAz --file "v303-nnd-mod.ptl"` to get a preprocessed model from these changes)

Imported from OSS

Reviewed By: raziel, iseeyuan

Differential Revision: D30398880

fbshipit-source-id: b6872e1e9ccd583622b80659da00c83fdd82580e
2021-08-18 16:25:01 -07:00
4dcc2197ce [fix] tensor_split : non-contiguous indices tensor (#63390)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63390

Reviewed By: ejguan

Differential Revision: D30362649

Pulled By: mruberry

fbshipit-source-id: 3ea3ad02199e4345beb0b580d056babd56112309
2021-08-18 16:10:17 -07:00
1f4e019d8e [Vulkan] Fix incorrect input range for Hardshrink tests (#63515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63515

Fixed inappropriate input range for Hardshrink tests:
The range -10 to +10 for input tensors is more appropriate when we use the test set of lambda values {-4.2, -1.0, -0.42, 0.0, 0.42, 1.0, 4.2, 42.42}.
ghstack-source-id: 136141416

Test Plan:
```build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Note that the test can fail sporadically due to precision loss between FP16 (Vulkan) and FP32 (CPU). This issue will be handled separately after some design discussions.

Reviewed By: SS-JIA

Differential Revision: D30389646

fbshipit-source-id: 7224bd8ba4e4972f5fc147df8a0cb84808f8c62e
2021-08-18 15:52:12 -07:00
15eec8e1d1 using PR number instead of IN_PULL_REQUEST (#63360)
Summary:
PR numbers should be available on GHA after this.

This fixes an issue where the target determinator was not working, discovered when manually running https://github.com/pytorch/pytorch/issues/63412.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63360

Reviewed By: malfet, zhouzhuojie, seemethere

Differential Revision: D30374615

Pulled By: walterddr

fbshipit-source-id: eee8d8bb7aa4308a6a50cfdcd4423a96d846777f
2021-08-18 15:05:10 -07:00
779a3d47b0 [Static Runtime] Benchmark reports native nodes (#63346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63346

We have seen that we can get significant perf wins essentially for free by implementing native ops for ops that we cannot write out variants for (e.g. TupleUnpack D30306955 (078b8004a6), append D30326461 (9d9e7a8d72)). Therefore, whether or not SR is using a native implementation is valuable information. By capturing this in the benchmarking suite, we can hopefully avoid wasting time profiling/manually inspecting `native_ops.cpp`

Reviewed By: hlu1

Differential Revision: D30346752

fbshipit-source-id: 205b090513b6a5a6ce4cb92f75ab0395b15d08f9
2021-08-18 15:05:08 -07:00
139413078f [FX] make ASTRewriter patch wrapped functions properly (#62987)
Summary:
Reference the same global namespace (instead of copying it) in ASTRewriter to patch wrapped functions properly.

Fixes #62071

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62987

Test Plan:
To test it you may write this snippet and ensure the results are as shown in the comments:

```
import torch
import torch.fx

@torch.fx.wrap
def to_be_wrapped(x):
    return torch.relu(x)

class Foo(torch.nn.Module):
    def forward(self, x):
        return to_be_wrapped(x)

traced = torch.fx.symbolic_trace(Foo())
print(traced.graph)
"""
graph():
    %x : [#users=1] = placeholder[target=x]
    %to_be_wrapped : [#users=1] = call_function[target=__main__.to_be_wrapped](args = (%x,), kwargs = {})
    return to_be_wrapped
"""

from torch.fx.experimental.rewriter import RewritingTracer

rt = RewritingTracer()
graph = rt.trace(Foo())
print(graph)
"""
### AFTER FIX (CORRECT):
graph():
    %x : [#users=1] = placeholder[target=x]
    %to_be_wrapped : [#users=1] = call_function[target=__main__.to_be_wrapped](args = (%x,), kwargs = {})
    return to_be_wrapped

### BEFORE FIX (WRONG):
graph():
    %x : [#users=1] = placeholder[target=x]
    %relu : [#users=1] = call_function[target=torch.relu](args = (%x,), kwargs = {})
    return relu
"""
```

Reviewed By: ansley

Differential Revision: D30396176

Pulled By: mostafaelhoushi

fbshipit-source-id: f61eddf32e9ef42b5f5c3ce21d559945214ee833
2021-08-18 15:03:57 -07:00
9bbf80969e [PyTorch] Avoid using std::regex for device string parsing in Device.cpp (#63464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63464

This was previously committed as D30281388 (4d6f98ecad), but was reverted due to t98478641. jnkwok1 confirmed that this change was not the root cause, so trying to land it again.

Currently, `std::regex` is used for parsing device strings. This is undesirable for a few reasons.

1. Increases binary size
2. Slows down model loading
3. Potentially uses more memory at runtime
4. Takes marginally longer time to build code that uses std::regex v/s not using std::regex

This change avoids the use of `std::regex` for parsing the device string since we don't need to.
ghstack-source-id: 136006963
ghstack-source-id: 136081898

Test Plan:
### AI Bench Runs

**Before this change:**
1. Model Load time: [252ms](https://www.internalfb.com/intern/aibench/details/332471502816548)
2. Model unload time: 3.5ms

**After this change:**
1. Model Load time: [240ms](https://www.internalfb.com/intern/aibench/details/652195589031318), which is an approx 5% reduction for the current model. I suspect percentage wise, it will be larger for smaller models since this is a fixed cost reduction.
2. Model unload time: 3.3ms (probably too small to be meaningfully impactful to an end user).

### BSB Results

```
D30281388 (4d6f98ecad)-V1 (https://www.internalfb.com/intern/diff/D30281388 (4d6f98ecad)/?dest_number=135713848)

messenger-pika-optimized-device: Succeeded
Change in Download Size for arm64 + 3x assets variation: -7.1 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -17.6 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:551399955987465@base/bsb:551399955987465@diff/
```

Reviewed By: raziel, pavithranrao

Differential Revision: D30388269

fbshipit-source-id: 10942e7aa56f9ea47aa479a8f50187f2ce2899bf
2021-08-18 14:55:12 -07:00
7fdba4564a [TensorExpr] IRSimplifier: sort terms in polynomials, terms, minterms, maxterms. (#63197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63197

This solves non-determinism from using hash values in sort methods.
Changes in tests are mostly mechanical.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292776

Pulled By: ZolotukhinM

fbshipit-source-id: 74f57b53c3afc9d4be45715fd74781271373e055
2021-08-18 14:49:27 -07:00
8bdd542417 [TensorExpr] Add debug logging to LoopNest::computeInline. (#63196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63196

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292778

Pulled By: ZolotukhinM

fbshipit-source-id: d8a111b75466a9354f6d048119cc6f814c9d5abb
2021-08-18 14:48:05 -07:00
feba6806c9 clarify that torch.finfo.tiny is the smallest normal number (#63241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63241

This is a common source of confusion, but it matches the NumPy
behavior.
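
A quick check of the distinction being documented:

```python
import torch

finfo = torch.finfo(torch.float32)
print(finfo.tiny)  # 1.1754943508222875e-38: the smallest *normal* number
# Subnormals below `tiny` are still representable:
print(torch.nextafter(torch.tensor(finfo.tiny), torch.tensor(0.0)))  # > 0
```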

Fixes #44010
Fixes #59526

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30307646

Pulled By: dagitses

fbshipit-source-id: d848140ba267560387d83f3e7acba8c3cdc53d82
2021-08-18 13:44:52 -07:00
9253dc1e58 Fix segmentation fault due to access to destroyed CudaIPCGlobalEntities instance (#56141)
Summary:
There is an instance of the static destruction order fiasco where cuda_ipc_global_entities may be accessed after it is destroyed. See https://github.com/pytorch/pytorch/issues/51961

This change uses a flag and avoids accesses to the destroyed class when it is set to false.

Fixes https://github.com/pytorch/pytorch/issues/51961

This removes the function to clear shared_blocks introduced by https://github.com/pytorch/pytorch/issues/53080, which had multiple issues: unprotected access to a shared structure, and modification of the vector being cleared by the destructors of the contained objects.
I.e., what happened was:

- `CudaIPCSentDataLimbo_.clear_shared_blocks();` is called from the destructor of CudaIPCGlobalEntities as of your PR
- This deletes instances of `CudaIPCSentData` which hold `at::DataPtr` created by `GetNewRefCountedSentData`
- This means `CudaIPCSentDataDelete` is called with still active pointers
- Hence `CudaIPCSentDataLimbo_.add` is called adding a new value to `shared_blocks_`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56141

Reviewed By: ejguan

Differential Revision: D30397279

Pulled By: VitalyFedyunin

fbshipit-source-id: ce4b8b90fa1c90d275e5eca93ba84321cbc6140a
2021-08-18 13:38:55 -07:00
877e6f2be3 Bugfix for fuse qconfig comparison (#63384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63384

In some cases the changes to qconfig on module would cause the
fusions to fail. This bugfix solves that problem by adding a
qconfig_function_comparison that compares the functions within the
qconfig rather than the modules the qconfigs are on. The comparison
looks at the partial object within QConfig.activation/weight.p and
compares args, keywords, and func. This is necessary to do manually
because partial doesn't implement __eq__, so == falls back to is.
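
The underlying Python behavior, for reference (plain `functools`, not PyTorch-specific):

```python
from functools import partial

f = partial(int, base=2)
g = partial(int, base=2)

print(f == g)  # False: partial has no __eq__, so == falls back to identity
# A manual comparison along the lines of the fix:
print(f.func == g.func and f.args == g.args and f.keywords == g.keywords)  # True
```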

Test Plan:
python test/test_quantization.py
TestFuseFx.test_problematic_fuse_example

Imported from OSS

Reviewed By: supriyar, ejguan

Differential Revision: D30386264

fbshipit-source-id: 51e358c021c39d6f48dc12ad2a82b2838677b9de
2021-08-18 13:31:56 -07:00
2aa19f33c6 [ONNX] Fix for batchnorm training op mode (#52758) (#62760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62760

* Rebase

# Conflicts:
#	torch/csrc/jit/passes/onnx/eval_peephole.cpp

# Conflicts:
#	test/onnx/test_utility_funs.py
#	torch/onnx/symbolic_opset9.py

* Update symbolic_opset12.py

* Update test.sh
# Conflicts:
#	.jenkins/caffe2/test.sh

* Merge

* Fix utility tests

# Conflicts:
#	test/onnx/test_pytorch_onnx_onnxruntime.py
#	test/onnx/test_utility_funs.py

* Fix for comment

* Enable BN tests

* Fix for test

* Update test_pytorch_onnx_onnxruntime.py

* Update test_pytorch_onnx_onnxruntime.py

* Update test_utility_funs.py

* Update test_pytorch_onnx_onnxruntime.py

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349060

Pulled By: msaroufim

fbshipit-source-id: 93312c17607974731c17099ae181acb6e4c1c409
2021-08-18 13:29:07 -07:00
e182401062 [ONNX] Remove aten parameter (#61652) (#62759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62759

* remove aten argument in export()

* add export_to_pretty_string default value OperatorExportTypes.ONNX

* add DPYTORCH_ONNX_CAFFE2_BUNDLE description

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349062

Pulled By: msaroufim

fbshipit-source-id: d9738f3aa8b80eac54548d0b9494f9f1e544f20f

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2021-08-18 13:29:04 -07:00
3a7bbf5fb7 [ONNX] Add support for opset14 in PT-ONNX exporter (#59486) (#62758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62758

* Add initial changes for opset14

* Fixed flake

* Add onnx submodule changes and removed utility func tests

* Add updated batchNorm symbolic

* Add triu/tril symbolics

* Fix lint

* Fixed test failures

* Add reshape with allowzero

* Added tests/refactored opset versioning

* Bump onnxruntime version

* Fix clang/lint failures

* Add reshape shape inference for opset 14

* Changes for allowzero

* Fix lint/clang and test failures

* Updated PR

* Flake fixes

* Fix flake

* Remove new_jit_api tests

* Add opset14 models

* Update allowzero

* Fix test failures

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349063

Pulled By: msaroufim

fbshipit-source-id: 54724246149b01a2f627c43d7396253a7e9c9eb9

Co-authored-by: Shubham Bhokare <sbhokare@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-08-18 13:29:01 -07:00
99b154b8be [ONNX] Support lstm_cell symbolic (#61476) (#62757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62757

Support lstm_cell symbolic

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349061

Pulled By: msaroufim

fbshipit-source-id: f236177e3e5c62a30b7e4d91a623bcaef21b5eb1

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-08-18 13:27:46 -07:00
d661e646ad [FX] Fix GraphModule deepcopy to use deepcopied graph (#63090)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63090

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D30252471

Pulled By: jamesr66a

fbshipit-source-id: cafd7d7917935a5ea6ffa2a7fe9e9b2a9578b3e3
2021-08-18 13:17:14 -07:00
11fbd3958c MaybeOwned page for dev wiki (#63450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63450

Brief guide to understanding `MaybeOwned<Tensor>`, aimed at C++ PT devs who are obliged to interact with existing uses of it, rather than encouraging new usage.

For reviewers: I haven't yet added a link to this page from anywhere. I'm thinking the right place is the [dev wiki main page C++ section](https://github.com/pytorch/pytorch/wiki#c) but happy to put it wherever makes sense, suggestions welcome.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30402313

Pulled By: bhosmer

fbshipit-source-id: 69b15909ecafcd8d88e44f664f88c3ad4eb26d84
2021-08-18 12:08:58 -07:00
9bb1371cc2 Disable RDYNAMIC check with MSVC (#62949)
Summary:
When testing with clang-cl, the flag is added even though it is unsupported, which generates a few warnings. I tried a few alternatives like https://cmake.org/cmake/help/latest/module/CheckLinkerFlag.html, but they just don't work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62949

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30359206

Pulled By: malfet

fbshipit-source-id: 1bd27ad5772fe6757fa8c3a4bddf904f88d70b7b
2021-08-18 11:51:23 -07:00
d4593d9d08 document why wrappers exist in torch.functional (#62847)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62844.

These wrappers are not super obvious, but ultimately stem from the lack of support for functions with variadic args in native_functions.yaml. https://github.com/pytorch/pytorch/issues/62845 tracks that issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62847

Reviewed By: VitalyFedyunin

Differential Revision: D30305016

Pulled By: dagitses

fbshipit-source-id: 716fcecb0417b770bc92cfd8c54f7ead89070896
2021-08-18 11:51:21 -07:00
f0f5cffde9 [DDP] Add a debug check in cpp fp16 compress (#63379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63379

This codepath has been prone to bugs, as seen in the diff below; this
will help guard against changes/refactors that touch it, as a basic sanity
check. Enabled only in debug builds so as not to affect perf.
ghstack-source-id: 136056093

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358440

fbshipit-source-id: e1b3893a223722c2593ceed8696a09c7d07d47c1
2021-08-18 11:51:19 -07:00
ac1ece054b [DDP][Grad compression] Fix fp16 cpp hook (#63375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63375

I think tensor.copy_(tensor.to(torch::kFloat16)); will keep it as
float32, since copy_ casts back into the destination tensor's existing dtype.
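
A Python analogue of the bug (illustrative; the actual fix is in the C++ hook):

```python
import torch

t = torch.randn(4)            # float32
t.copy_(t.to(torch.float16))  # copy_ casts back into the float32 destination
print(t.dtype)                # torch.float32 -- the "compression" was a no-op
```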

Tested by adding the following line:

```
LOG(INFO) << "Type is: " << compressed_tensor.scalar_type();
```

before:

```
I0816 17:03:09.823688 364141 default_comm_hooks.cpp:21] Type is: Float
```
after:

```
I0816 17:01:16.779052 353924 default_comm_hooks.cpp:21] Type is: Half
```
ghstack-source-id: 136056092

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D30356256

fbshipit-source-id: 8208a705acd7628541cd43c8bf61d007dfdd2435
2021-08-18 11:49:35 -07:00
4e1d84ae8f [doc] pre-commit fix instructions (#61717)
Summary:
fix invalid instruction

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61717

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30359218

Pulled By: malfet

fbshipit-source-id: 61771babeac4d34425a61ce49f38a7099b521eec
2021-08-18 11:42:25 -07:00
50a3b6a6a8 Make SkipInfo with expected_failure an XFAIL (#63481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63481

This PR changes the SkipInfo decorators to use unittest.expectedFailure so that the test reports as XFAIL as opposed to PASSED.

Note that changing the expectedFailure here 30e1c74dc1/torch/testing/_internal/common_device_type.py (L879) to an XFAIL is not possible because the decision of whether to decorate is delayed until the wrapper function is called.
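
For reference, the plain-unittest behavior being adopted:

```python
import unittest

class T(unittest.TestCase):
    @unittest.expectedFailure
    def test_known_bug(self):
        self.assertEqual(1, 2)  # reported as an expected failure ("x"), not a pass

if __name__ == "__main__":
    unittest.main()
```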

fixes https://github.com/pytorch/pytorch/issues/63363

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30397154

Pulled By: heitorschueroff

fbshipit-source-id: c5e4911969ad8667763eec4203dbbc6a51178592
2021-08-18 11:36:18 -07:00
2f615f6313 Improve custom function docs (#60312)
Summary:
- Adds some code examples for `ctx` methods and makes the requirements of arguments clearer (see the sketch after this list)
- Type annotations for `save_for_backward`, `mark_dirty`, `mark_non_differentiable`, and `set_materialize_grads` (BC-breaking?)
- Refactor `torch.autograd.Function` doc
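
A minimal example of the `ctx.save_for_backward` pattern the docs now cover (a sketch, not taken from the PR):

```python
import torch

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # tensors needed in backward go here
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_out

x = torch.randn(3, requires_grad=True)
Square.apply(x).sum().backward()
print(x.grad)  # equals 2 * x
```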

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60312

Reviewed By: VitalyFedyunin

Differential Revision: D30314961

Pulled By: soulitzer

fbshipit-source-id: a284314b65662e26390417bd2b6b12cd85e68dc8
2021-08-18 11:31:31 -07:00
d565a7bd68 [6/N] Enable opt-asan for elastic and launcher tests. (#63442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63442

Continuing https://github.com/pytorch/pytorch/pull/62051, I've
enabled elastic and launcher tests to run in opt-asan mode, which is supported
with spawn multiprocessing.

This allows us to completely get rid of fork based tests from torch.distributed
and have all tests run in spawn mode.
ghstack-source-id: 136057123

Test Plan: waitforbuildbot

Reviewed By: cbalioglu

Differential Revision: D30384267

fbshipit-source-id: ad3447cfb9d6e31e7ec8332d64c8ff1054858dcb
2021-08-18 10:48:49 -07:00
af3cbfed95 Add validation check in fx2trt interpreter (#63424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63424

Add a validation check in fx2trt for missing converter operators. If any op is missing, interpreter init will report the missing operators.

Test Plan:
for call_function and call_method:
manual test with feeds benchmark and verify init failed with expected message.
{F642390780}

for call_module:
specify a module as a leaf node so that acc_tracer traces it as a single node; then, in fx2trt.py, make the CONVERTER initialization stage skip recording all modules; finally, initialize the interpreter and call the validator function, verifying that the output includes the missing module name (return value printed in the screenshot below).

{F643458718}

Reviewed By: 842974287

Differential Revision: D30294832

fbshipit-source-id: 243dca3fdfc6a174ded65248938e2a234aec19c6
2021-08-18 10:41:10 -07:00
7df2324120 [pytorch] Make qconv forward() thread safe (#63432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63432

There's a race condition in quantized models when multiple threads call forward(), due to qnnpack packing the weights the first time the operator is called. This change locks the entire apply_impl function.
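
A minimal Python sketch of the locking pattern (the real change is in C++; `pack`/`run_packed` below are illustrative stand-ins for qnnpack's packing and execution):

```python
import threading

def pack(weight):            # stand-in for qnnpack's one-time weight packing
    return list(weight)

def run_packed(packed, x):   # stand-in for running the packed operator
    return [w * x for w in packed]

class LazyPackedOp:
    def __init__(self, weight):
        self._weight = weight
        self._packed = None
        self._lock = threading.Lock()

    def forward(self, x):
        # Serialize the lazy first-call packing that previously raced
        # when multiple threads entered forward() concurrently.
        with self._lock:
            if self._packed is None:
                self._packed = pack(self._weight)
        return run_packed(self._packed, x)
```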

Test Plan:
https://github.com/pytorch/pytorch/issues/58055

Ran the script before and after, original crashes went away

Reviewed By: kimishpatel

Differential Revision: D30229520

fbshipit-source-id: d06cabe24199a80325cd57f24a7fd60624be2cf7
2021-08-18 10:37:13 -07:00
565578cdab Use fastAtomicAdd in EmbeddingBag (mode "max") backward (#63298)
Summary:
Rel: https://github.com/pytorch/pytorch/issues/62695

### This PR
|   n_tokens |   num_embeddings |   embedding_dim | mode   |    bwd_fp32 |    bwd_fp16 |
|-----------:|-----------------:|----------------:|:-------|------------:|------------:|
|       4096 |             4096 |            4096 | max    | 0.000326228 | 0.000181448 |
|       4096 |             4096 |           16384 | max    | 0.00102805  | 0.000618136 |
|       4096 |            16384 |            4096 | max    | 0.000907326 | 0.000530422 |
|       4096 |            16384 |           16384 | max    | 0.00334988  | 0.00264645  |
|      16384 |             4096 |            4096 | max    | 0.000366449 | 0.000320232 |
|      16384 |             4096 |           16384 | max    | 0.00126421  | 0.00104183  |
|      16384 |            16384 |            4096 | max    | 0.00087738  | 0.00065068  |
|      16384 |            16384 |           16384 | max    | 0.00379229  | 0.00298201  |

### Original
|   n_tokens |   num_embeddings |   embedding_dim | mode   |    bwd_fp32 |    bwd_fp16 |
|-----------:|-----------------:|----------------:|:-------|------------:|------------:|
|       4096 |             4096 |            4096 | max    | 0.00032407  | 0.000188231 |
|       4096 |             4096 |           16384 | max    | 0.00104356  | 0.000624001 |
|       4096 |            16384 |            4096 | max    | 0.000902069 | 0.000527382 |
|       4096 |            16384 |           16384 | max    | 0.00302202  | 0.00255153  |
|      16384 |             4096 |            4096 | max    | 0.000384343 | 0.000403249 |
|      16384 |             4096 |           16384 | max    | 0.00126445  | 0.00135069  |
|      16384 |            16384 |            4096 | max    | 0.000880814 | 0.000825679 |
|      16384 |            16384 |           16384 | max    | 0.00337611  | 0.00319515  |

cc xwang233 ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63298

Reviewed By: mruberry

Differential Revision: D30383583

Pulled By: ngimel

fbshipit-source-id: 14dd9d67002c53a153721812709033c198f68c1e
2021-08-18 10:14:40 -07:00
e2ddaec5cf Reverting launch bounds change in topK that induced a regression in perf (#63431)
Summary:
[topkwsyncs.zip](https://github.com/pytorch/pytorch/files/7003077/topkwsyncs.zip)

Running this script on nvidia containers 21.08 vs 21.07 we see the following perf drops:
topk(input=(dtype=torch.float16,shape=[60, 201600]), k=2000, dim=1, sorted=True) - 0.63

topk(input=(dtype=torch.float32,shape=[120000]), k=12000, dim=0, sorted=False) - 0.55

topk(input=(dtype=torch.float16,shape=[5, 201600]), k=2000, dim=1, sorted=True) - 0.55

topk(input=(dtype=torch.float32,shape=[1, 10000]), k=1000, dim=1, sorted=False) - 0.33

The relative perf drop is reported as (21.08_time - 21.07_time) / 21.07_time

I narrowed down the source of the regression to this commit: https://github.com/pytorch/pytorch/pull/60314
which reduced launch bounds from 1024 to 512.

The regression did not show up in the original benchmarks used to justify changing 1024 to 512, because their input shapes were much smaller than those of the tensors in which I am observing the regression. I suggest reverting to 1024: with 512 there was no considerable perf improvement for small inputs, and there is a major perf regression for large tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63431

Reviewed By: mruberry

Differential Revision: D30384087

Pulled By: ngimel

fbshipit-source-id: 11eecbba82a069b1d4579d674c3f644ab8060ad2
2021-08-18 09:44:07 -07:00
383a33a0eb Make DataChunk support list in-place ops (#63422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63422

Fixes #63095

Make `DataChunk` delegate to list methods so that it supports the following in-place operations (see the sketch after this list):
- `sort`
- `reverse`
- `append`
- `extend`
- `random.shuffle`
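
A hedged sketch of the resulting behavior (the public import path for `DataChunk` is an assumption):

```python
import random
from torch.utils.data import DataChunk  # assumed import path

chunk = DataChunk([3, 1, 2])
chunk.sort()             # delegates to list.sort, mutating in place
chunk.append(4)
chunk.extend([6, 5])
chunk.reverse()
random.shuffle(chunk)    # works via the delegated __setitem__/__len__
print(list(chunk))
```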

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30379027

Pulled By: ejguan

fbshipit-source-id: d176bd0cc8b89b915c7bb184ff243ab1f605616d
2021-08-18 08:48:47 -07:00
cyy
93582e3bba A tiny fix in MT19937RNGEngine (#63219)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63219

Reviewed By: VitalyFedyunin

Differential Revision: D30341484

Pulled By: ezyang

fbshipit-source-id: 0ff4499d0f4a3dfeb991c0f10fe3248c6ca1c992
2021-08-18 08:05:23 -07:00
c508433617 Implement subclass priority for __torch_dispatch__ (#63411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63411

In order to get this behavior, you have to use append_overloaded,
which I forgot to use in the previous implementation.  I exposed
an internal helper function which is more appropriate for dispatch
to Python where we know that an argument is definitely a Tensor (and
this test no longer needs to be done).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30374489

Pulled By: ezyang

fbshipit-source-id: 43b08c00d1958c9b26d82a025d19f0b67bb85590
2021-08-18 07:49:03 -07:00
061b36e2f5 [fx2trt] Add dequantize support (#63448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63448

Only available after TensorRT 8.0

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_dequantize

Reviewed By: 842974287

Differential Revision: D30296863

fbshipit-source-id: 44b9630ef0d210e7f20e650dc81c519f7e41f5f3
2021-08-18 07:44:17 -07:00
a00d587849 add OpInfo for torch.linalg.tensorinv (#62326)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53739.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62326

Reviewed By: H-Huang

Differential Revision: D30136376

Pulled By: zou3519

fbshipit-source-id: 04ec9450e8866667649af401c7559b96ddc91491
2021-08-18 07:37:34 -07:00
30e1c74dc1 Update cuda amp to also check xla device (#63413)
Summary:
Fixes https://github.com/pytorch/xla/issues/3086. PyTorch/XLA:GPU also uses CUDA amp. I verified the PT/XLA `test_autocast` with this fix and all tests passed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63413

Reviewed By: ngimel

Differential Revision: D30380785

Pulled By: bdhirsh

fbshipit-source-id: fd1a1de7d224c616fc3fa90b80a688a21f6b1ecc
2021-08-18 06:44:10 -07:00
4a390a56c4 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30391472

fbshipit-source-id: d4eb1e7debea8905e7fee5f026c082bee65e78f3
2021-08-18 04:20:05 -07:00
2b303f3f31 enhance comparison tests for c10::optional (#62887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62887

Reviewed By: VitalyFedyunin

Differential Revision: D30305044

Pulled By: dagitses

fbshipit-source-id: d0a3a9e4ea186915ef087543aaf81a606f943380
2021-08-18 04:08:05 -07:00
0f2f6a79cb clarify the documentation of torch.meshgrid (#62977)
Summary:
Also warn about the behavior differences from `numpy.meshgrid`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62977

Reviewed By: mruberry, ngimel

Differential Revision: D30220930

Pulled By: dagitses

fbshipit-source-id: ae6587b41792721cae2135376c58121b4634e296
2021-08-18 04:01:22 -07:00
f8a84a80cd [5/N] Run opt-asan with detect_leaks=0 (#63361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63361

Python multiprocessing doesn't support LSAN and produces false positives
instead. As a result, LSAN is disabled for these tests so that we can still run
them with opt-asan.
ghstack-source-id: 135962489

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D30352269

fbshipit-source-id: f6ab5abce7bdef00cd5e1f5977424d2b151174af
2021-08-18 01:59:56 -07:00
d431c77d76 [sharded_tensor] fix typing issue for placement (#63426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63426

placement should be either a string or a _remote_device; this fixes the type annotation to match the behavior
ghstack-source-id: 136041125

Reviewed By: pritamdamania87

Differential Revision: D30379702

fbshipit-source-id: 34e226494240923b433e3a39cc08c84d42cdad6b
2021-08-17 23:11:48 -07:00
2fd14735d6 [easy][PyTorchEdge] print error message when failing to load model file (#63404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63404

# Context
Loading a model file using `fopen` might error out for multiple reasons. Reproducing the error on devices takes time and effort. Logging the errno will help in debugging and fixing the error quickly.

# Mitigation
Print out the error message from `fopen` to help users debug the issue.

Test Plan:
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] buck run xplat/caffe2/fb/lite_predictor:lite_predictor -- --model=/home/pavithran/models/prod/GAaNhAoTIV6cIvgJAHn30m8NR1QgbmQwAAAA.ptl --use_bundled_input=0
Building: finished in 0.5 sec (100%) 354/354 jobs, 0/354 updated
  Total time: 0.6 sec
Run with 24 threads
Run with 24 threads
Loading model...
terminate called after throwing an instance of 'c10::Error'
  what():  open file failed because of errno 2 on fopen: No such file or directory, file path: /home/pavithran/models/prod/GAaNhAoTIV6cIvgJAHn30m8NR1QgbmQwAAAA.ptl
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:15 (most recent call first):
(no backtrace available)
```

Reviewed By: dhruvbird

Differential Revision: D30372308

fbshipit-source-id: 5346e828f53f6bc5d871b403586566a3332a389a
2021-08-17 22:27:49 -07:00
15144ade25 [fx2trt] Add quantize_per_tensor support (#63447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63447

Only available in TRT 8.0 and above

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_quantize_per_tensor

Reviewed By: 842974287

Differential Revision: D30322844

fbshipit-source-id: dfd925e3432de128f2925b1aa55d6125e63359af
2021-08-17 21:37:26 -07:00
3fd8e09102 Fix RPC Python User Function Error Handling (#63406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63406

The `RemoteException` will be thrown on the caller side when converting
the response message to an IValue. Since it is a Python error, the error
message needs to be extracted explicitly and the `PyErr` cleared.

Test Plan: Imported from OSS

Reviewed By: rohan-varma, ngimel

Differential Revision: D30372741

Pulled By: mrshenli

fbshipit-source-id: 1f72a7ee0c39cc2ef070f99884c142f7b3e0543d
2021-08-17 20:14:03 -07:00
f12f667e12 [torch] Set default log level for torch elastic (#63214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63214

The default log level in fb and oss is different: in oss we use WARNING and in fb we use INFO.

Test Plan: unittests, f291441502

Reviewed By: cbalioglu

Differential Revision: D30296298

fbshipit-source-id: 89067352be767255fbc66e790ec333582de64c6c
2021-08-17 19:58:13 -07:00
dcf90b797c [BE] remove _SUPPORTED_OPTIM_MAP from tests (#63383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63383

Per title
ghstack-source-id: 135966157

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358921

fbshipit-source-id: 965e054e525194b1ee55980340df275bab355c9b
2021-08-17 17:17:25 -07:00
5b8862abf1 [DDP] Support step_param for AdamW (#63382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63382

Per title
ghstack-source-id: 135966156

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30255446

fbshipit-source-id: e6ffbf339db0bc5b4702d02b74a462309df07c75
2021-08-17 17:16:11 -07:00
cd5e9dcc1d [quant][graphmode][fx][fix] Fix quantization for tuple arguments (#63376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63376

Previously, when a tuple was an argument to a quantizable op, it would mistakenly be transformed into a list;
this PR fixes that.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_preserve_tuple

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30357642

fbshipit-source-id: 82d10805d9c00c003cc99983dca68b6455ff7b2e
2021-08-17 17:01:24 -07:00
975542c314 Add more ciflow labels for more workflows (#63410)
Summary:
- Add more ciflow labels and enable it for more workflows.
- Only the 'ciflow/default' workflows run by default at pull_request time
- Other labels can be triggered manually (by adding the labels and unassigning pytorchbot), or by waiting for pytorchbot's comment opt-in rollout
- Labels combine with logical `OR`: adding 'ciflow/cuda' + 'ciflow/win' triggers the union of their workflows. (design feedback is needed here)

Typical default workflows for normal PRs.

<details>
<summary>Generated label rules</summary>

![image](https://user-images.githubusercontent.com/658840/129779905-eb5e56dd-a696-4040-9eb6-71ecb6487dc1.png)

```
{
  "label_rules": {
    "ciflow/all": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-cuda10.2-py3.6-gcc7",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-win-vs2019-cuda11.3-py3",
      "win-vs2019-cpu-py3",
      "win-vs2019-cuda10.1-py3",
      "win-vs2019-cuda11.1-py3"
    ],
    "ciflow/bazel": [
      "linux-xenial-py3.6-gcc7-bazel-test"
    ],
    "ciflow/coverage": [
      "linux-bionic-py3.8-gcc9-coverage"
    ],
    "ciflow/cpu": [
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "win-vs2019-cpu-py3"
    ],
    "ciflow/cuda": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-xenial-cuda10.2-py3.6-gcc7",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-win-vs2019-cuda11.3-py3",
      "win-vs2019-cuda10.1-py3",
      "win-vs2019-cuda11.1-py3"
    ],
    "ciflow/default": [
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "win-vs2019-cpu-py3",
      "win-vs2019-cuda10.1-py3"
    ],
    "ciflow/libtorch": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7"
    ],
    "ciflow/linux": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-cuda10.2-py3.6-gcc7",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7"
    ],
    "ciflow/scheduled": [
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-win-vs2019-cuda11.3-py3"
    ],
    "ciflow/slow": [
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-xenial-cuda10.2-py3.6-gcc7"
    ],
    "ciflow/win": [
      "periodic-win-vs2019-cuda11.3-py3",
      "win-vs2019-cpu-py3",
      "win-vs2019-cuda10.1-py3",
      "win-vs2019-cuda11.1-py3"
    ]
  },
  "version": "v1"
}
```
</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63410

Reviewed By: ngimel

Differential Revision: D30378553

Pulled By: zhouzhuojie

fbshipit-source-id: 4e0953740793e5e72b95018f8ab2ce4a6a364c38
2021-08-17 17:00:09 -07:00
da87d648b3 F.avg_pool3 CUDA backward: gpuAtomicAddNoReturn -> fastAtomicAdd (#63387)
Summary:
Rel: https://github.com/pytorch/pytorch/issues/62695

In the following two tables, I set `kernel_size` to 3 and `stride` to 2.
In benchmark, input tensors have the shape of (N, C, n_features, n_features, n_features).
Tested on RTX3080 w/ CUDA11.4 Update 1.

## This PR

|   N |   C |   n_features | dtype         |        time |
|----:|----:|-------------:|:--------------|------------:|
|  32 |   3 |            8 | torch.float16 | 7.46846e-05 |
|  32 |   3 |            8 | torch.float32 | 8.18968e-05 |
|  32 |   3 |           32 | torch.float16 | 0.000156748 |
|  32 |   3 |           32 | torch.float32 | 0.000165236 |
|  32 |   3 |          128 | torch.float16 | 0.00549854  |
|  32 |   3 |          128 | torch.float32 | 0.008926    |

## master (6acd87f)

|   N |   C |   n_features | dtype         |        time |
|----:|----:|-------------:|:--------------|------------:|
|  32 |   3 |            8 | torch.float16 | 7.60436e-05 |
|  32 |   3 |            8 | torch.float32 | 7.55072e-05 |
|  32 |   3 |           32 | torch.float16 | 0.000189292 |
|  32 |   3 |           32 | torch.float32 | 0.000168645 |
|  32 |   3 |          128 | torch.float16 | 0.00699538  |
|  32 |   3 |          128 | torch.float32 | 0.00890226  |

master's time divided by PR's time is as follows:

| N | C | n_features | master / PR |
|---:|---:|---------------:|----------------:|
| 32 | 3 | 8 | 1.018 |
| 32 | 3 | 32 | 1.208 |
| 32 | 3 | 128 | 1.272|

cc: xwang233 ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63387

Reviewed By: mruberry

Differential Revision: D30381434

Pulled By: ngimel

fbshipit-source-id: 3b97aee4b0d457a0277a0d31ac56d4151134c099
2021-08-17 16:53:13 -07:00
6e5d065b2b Add pocketfft as submodule (#62841)
Summary:
Using https://github.com/mreineck/pocketfft

Also delete explicit installation of pocketfft during the build as it will be available via submodule

Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5

Partially addresses https://github.com/pytorch/pytorch/issues/62821

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841

Reviewed By: seemethere

Differential Revision: D30140441

Pulled By: malfet

fbshipit-source-id: d1a1cf1b43375321f5ec5b3d0b538f58082f7825
2021-08-17 15:29:56 -07:00
078dcc4e97 [wip] Move smallest bucket to end after rebuild buckets (#62279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62279

Before rebuild buckets, `kDefaultFirstBucketBytes` is actually misleading: because we reverse the parameter indices when initializing the reducer, it is actually the size of the last bucket.

Currently, rebuild buckets sets this to be the first bucket size; we are seeing whether keeping it as the last can help perf.

This is currently experimental only and don't plan to land it unless experiments show a clear win.
ghstack-source-id: 135966897

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29927931

fbshipit-source-id: 55b949986fa2c3bade6fcb4bf5b513461bf0f490
2021-08-17 15:04:50 -07:00
e0e2796fa9 adding a note to the documentation of polar (#63259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63259

Fix #52919

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30342536

Pulled By: NivekT

fbshipit-source-id: 4c61a86f96a6370cc64652bf652c4ae25c9f4601
2021-08-17 14:48:32 -07:00
bcddc71f26 [quant][graphmode][fx][bc-breaking] Support for reference pattern for fixqparam ops in eval mode (#62608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62608

Insert extra fixeqparam fake quant in the output of fixed qparam ops in fbgemm e.g. sigmoid
so that we can produce reference patterns for these ops

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053978

fbshipit-source-id: c527944b6e791bb4d45ebe96265af52794203695
2021-08-17 14:42:40 -07:00
9cd24e12a1 Revert D30281388: [PyTorch] Avoid using std::regex for device string parsing in Device.cpp
Test Plan: revert-hammer

Differential Revision:
D30281388 (4d6f98ecad)

Original commit changeset: 4d998e9f313e

fbshipit-source-id: 11134b3400cc3e851155c9c1b6fb59308ff1567b
2021-08-17 14:40:27 -07:00
495e7e4815 Fix zero-dim handling in torch.matmul (#63359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63359

Fixes #63352. The problem was that in e.g. `torch.matmul(A, B)` with A,
B having shapes [3, 2, 0] and [0, 2], the code attempts to call
`A.view(-1, 0)` which fails due to "-1 being ambiguous". The solution is
to manually compute what we want the shape of the view to be.
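
A small repro of the case being fixed (shapes taken from the summary above):

```python
import torch

A = torch.randn(3, 2, 0)
B = torch.randn(0, 2)
# Previously this raised because the batched path called A.view(-1, 0),
# where -1 is ambiguous; now the view shape is computed explicitly.
print(torch.matmul(A, B).shape)  # torch.Size([3, 2, 2]) (all zeros)
```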

Test Plan: - new tests

Reviewed By: ngimel

Differential Revision: D30351583

Pulled By: zou3519

fbshipit-source-id: 7625691fe8b85d96a4073409596a932c303e3e8c
2021-08-17 13:44:47 -07:00
1dc2b52764 [TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195

This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.

The changes are mechanical and should not affect any functionality.

With this PR, we're changing the following:
 * `Add*` --> `AddPtr`
 * `new Add(...)` --> `alloc<Add>(...)`
 * `dynamic_cast<Add*>` --> `to<Add>`
 * `static_cast<Add*>` --> `static_to<Add>`

Due to some complications with args forwarding, some places became more
verbose, e.g.:
 * `new Block({})` --> `new Block(std::vector<ExprPtr>())`

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292779

Pulled By: ZolotukhinM

fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
2021-08-17 13:44:45 -07:00
a2db5d34a5 OpInfo fix: conv_transpose2d (#63389)
Summary:
Addresses comment: https://github.com/pytorch/pytorch/pull/62882#issuecomment-899679606.

cc: mruberry ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63389

Reviewed By: mruberry

Differential Revision: D30377481

Pulled By: ngimel

fbshipit-source-id: 0fa21acc3503c259c9b27463e8555247c43d9e2e
2021-08-17 13:42:36 -07:00
9d9e7a8d72 [Static Runtime] Implement aten::append (#63350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63350

Add a native implementation for `aten::append`, the list append op.

Test Plan: New unit test: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Append`

Reviewed By: hlu1

Differential Revision: D30326461

fbshipit-source-id: 0dbdf6cc82e78c7c36db39583256f6b87385e3d3
2021-08-17 13:40:18 -07:00
6621df9a6a [vulkan] Add log_softmax (#63193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63193

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D30291987

fbshipit-source-id: 89c6560274e5a841e5af249f6963b67ef6826f4c
2021-08-17 13:36:02 -07:00
b0396e39f4 [quant][fx] Ensure qconfig works for QAT with multiple modules (#63343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63343

The previous implementation had a bug where we were trying to modify an ordered dict value while iterating through it.
This fixes it by creating a copy before modifying it.
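
The underlying Python pitfall and the fix pattern, for reference:

```python
d = {"a": 1, "b": 2}

# Mutating a dict while iterating over it raises RuntimeError:
#     for k in d:
#         d[k + "_new"] = d[k]

# Iterating over a copy of the keys avoids the error:
for k in list(d):
    d[k + "_new"] = d[k]
print(d)  # {'a': 1, 'b': 2, 'a_new': 1, 'b_new': 2}
```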

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30346116

fbshipit-source-id: 0e33dad1163e8bff3fd363bfd04de8f7114d7a3a
2021-08-17 11:40:51 -07:00
e000dfcf97 Add return type hint and improve the docstring of consume_prefix_in_state_dict_if_present method (#63388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63388

Context: https://discuss.pytorch.org/t/how-to-use-the-helper-function-consume-prefix-in-state-dict-if-present/129505/3

Make it clear that this method strips the prefix in place rather than returning a new value.

Additional reformatting is also applied.
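
A short usage sketch (the import path is believed current for this version, but is an assumption):

```python
import torch
from torch.nn.modules.utils import consume_prefix_in_state_dict_if_present

model = torch.nn.Linear(2, 2)
# Simulate a DDP-style state dict with a "module." prefix.
state_dict = {"module." + k: v for k, v in model.state_dict().items()}

# Strips the prefix *in place* and returns None.
consume_prefix_in_state_dict_if_present(state_dict, "module.")
model.load_state_dict(state_dict)
```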
ghstack-source-id: 135973393

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D30360931

fbshipit-source-id: 1a0c7967a4c86f729e3c810686c21dec43d1dd7a
2021-08-17 11:30:27 -07:00
fcc840eae0 Add handling of ifs to shape propagation (#62914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62914

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30196945

Pulled By: eellison

fbshipit-source-id: 1c0c7f938c4547330fd1dba8ab7dd0b99a79b6a9
2021-08-17 11:26:42 -07:00
3975c08e5d Small shape analysis changes (#62911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62911

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30196946

Pulled By: eellison

fbshipit-source-id: 2562bab323088d9c1440ae0431e533f9bcc513d3
2021-08-17 11:26:40 -07:00
e2227e86e4 Add a few peepholes (#62910)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62910

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30196947

Pulled By: eellison

fbshipit-source-id: d88c92616d4de4f47ff4fcf5c1994e629ca20395
2021-08-17 11:26:38 -07:00
9a60759453 Propagate symbolic dimensions through idioms like x.view(y.size()) (#61975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61975

Propagate symbolic dimensions through size calls. We did this by associating SymbolicSizes with integer inputs, looking through their constructors for `x.size(1)` or `x.size()` nodes.
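
A tiny scripted example of the idiom this enables (illustrative only):

```
import torch

@torch.jit.script
def reshape_like(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # The integers produced by y.size() come from aten::size nodes, so
    # shape propagation can now carry y's symbolic dimensions into view.
    return x.view(y.size())
```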

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30196948

Pulled By: eellison

fbshipit-source-id: 377fc1d2f6d396c52dc0e87fa814b15720f1414e
2021-08-17 11:25:22 -07:00
60cadd0bd1 [fx2trt] Refactor linear op to use mm + add
Summary:
Previously, linear was translated to fully_connected, which only works when the weight is a constant.
This diff changes it to mm + add so that the weight can be an ITensor, letting us keep the weight - quantize - dequantize
pattern in the produced TensorRT network
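
A quick numerical check of the decomposition in plain PyTorch (this is the math, not the converter code itself):

```
import torch

x = torch.randn(4, 3)
w = torch.randn(5, 3)  # linear weight: (out_features, in_features)
b = torch.randn(5)

ref = torch.nn.functional.linear(x, w, b)
mm_add = torch.mm(x, w.t()) + b  # the mm + add form used by the converter
assert torch.allclose(ref, mm_add, atol=1e-6)
```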

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_linear

Reviewed By: 842974287

Differential Revision: D30294751

fbshipit-source-id: 596fbd4c81caef8df41a002a2e14fbf22d9d2a80
2021-08-17 10:52:28 -07:00
517aa8965a Updates set_default_dtype documentation (#63233)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60560.

The description of set_default_dtype is updated to clarify that it affects the interpretation of Python numbers as either float32 (complex64) or float64 (complex128) and that default (floating) dtypes other than float32 or float64 are unsupported.
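
For example, a short illustration of the clarified behavior:

```
import torch

torch.set_default_dtype(torch.float64)
print(torch.tensor(1.5).dtype)   # torch.float64
print(torch.tensor(1.5j).dtype)  # torch.complex128

torch.set_default_dtype(torch.float32)
print(torch.tensor(1.5).dtype)   # torch.float32
print(torch.tensor(1.5j).dtype)  # torch.complex64
```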

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63233

Reviewed By: VitalyFedyunin

Differential Revision: D30306396

Pulled By: mruberry

fbshipit-source-id: bbee62f323c773b23b2fa45cb99122bc28197432
2021-08-17 10:41:03 -07:00
63554cfb3d Remove backend_debug from torch_core srcs and replace with library dependency (#63111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63111

### Problem:
Buck contains at least two libraries which have `backend_debug_info.cpp` as a source, `torch_core` and `backend_interface_lib`. `backend_debug_info.cpp` registers BackendDebugInfo as a class. If targets contain both libraries (e.g. sparkAR debug build with NNAPI delegation), then BackendDebugInfo is registered twice, causing a runtime error.
### Solution:
These changes remove `backend_debug_info.cpp` and `backend_interface.cpp` as a source in `torch_core` and adds backend_interface_lib as a dependency instead.

**build_variables.bzl:**
- Added a list that excludes `backend_debug_info.cpp` and `backend_interface.cpp` ( both srcs already included by `backend_interface_lib`)

**buck:**
- torch_core: Removed `backend_debug_info.cpp` from srcs and added `backend_interface_lib` deps
- backend_interface_lib: Replaced `torch_mobile_core` dep with more specific deps
  - to avoid an indirect dep between `torch_core` and `torch_mobile_core`

ghstack-source-id: 135981061

Test Plan:
### Test Plan:
Build and run SparkAR internally with Android NNAPI Delegation (`buck build --show-output arstudioplayer_arm64_debug`)
and internal tests.

Reviewed By: iseeyuan

Differential Revision: D30259034

fbshipit-source-id: 0c14c827732f07fb9b9bd25a999828b51793cdcc
2021-08-17 10:33:35 -07:00
3aecec609f Move Android Nnapi srcs from aten_native_cpu to aten_cpu (#62919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62919

Move Android NNAPI srcs (nnapi_bind.cpp, nnapi_wrapper.cpp, nnapi_model_loader.cpp) from aten_native_cpu to aten_cpu, so that later the NNAPI delegate's execution library can depend on it.

aten_native_cpu is built selectively per app, but the srcs have no selective components and are required for the NNAPI delegate library in D30259033.

See Buck Dependencies: https://docs.google.com/document/d/17RuWkqWKCO6sc5fKzIDkGeNhhvMk7BvJOqeSnGsHZ8o/edit?usp=sharing
ghstack-source-id: 135981062

Test Plan: `buck build --show-output arstudioplayer_arm64_debug` and internal tests

Reviewed By: iseeyuan

Differential Revision: D30164867

fbshipit-source-id: 0beff481ff250e75664ce8393beabbeb9db66770
2021-08-17 10:32:30 -07:00
c982f13a80 [android][vulkan] Fix model loading for Vulkan backend (#63402)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63402

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D30370692

Pulled By: IvanKobzarev

fbshipit-source-id: 73311b9b767fe9ed3ae390db59d6aa2c4a98f06d
2021-08-17 10:20:32 -07:00
f70b9ee5de Advertise USE_PRECOMPILED_HEADERS in CONTRIBUTING.md (#62827)
Summary:
This option was added in https://github.com/pytorch/pytorch/issues/61940 and fits with this section's theme of improving build times.

I've also changed it to a `cmake_dependent_option` instead of `FATAL_ERROR`ing for older CMake versions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62827

Reviewed By: astaff

Differential Revision: D30342102

Pulled By: malfet

fbshipit-source-id: 3095b44b7085aee8a884ec95cba9f8998d4442e7
2021-08-17 10:14:40 -07:00
011fdc3b7e [fx] persist tracer_cls on fx.Graph when deep copying (#63353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63353

Custom deepcopy method copies all nodes but does not copy the tracer_cls attribute

Reviewed By: houseroad

Differential Revision: D30349424

fbshipit-source-id: 3e98bdac8a8a992eb0b4ec67fe80bb2e5cf3884d
2021-08-17 09:57:48 -07:00
4d6f98ecad [PyTorch] Avoid using std::regex for device string parsing in Device.cpp (#63204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63204

Currently, `std::regex` is used for parsing device strings. This is undesirable for a few reasons.

1. Increases binary size
2. Slows down model loading
3. Potentially uses more memory at runtime
4. Marginally increases build time for code that uses std::regex compared to code that does not

This change avoids the use of `std::regex` for parsing the device string since we don't need to.
ghstack-source-id: 136006963

Test Plan:
### AI Bench Runs

**Before this change:**
1. Model Load time: [252ms](https://www.internalfb.com/intern/aibench/details/332471502816548)
2. Model unload time: 3.5ms

**After this change:**
1. Model Load time: [240ms](https://www.internalfb.com/intern/aibench/details/652195589031318), which is an approx 5% reduction for the current model. I suspect the percentage will be larger for smaller models, since this is a fixed-cost reduction.
2. Model unload time: 3.3ms (probably too small to be meaningfully impactful to an end user).

### BSB Results

```
D30281388-V1 (https://www.internalfb.com/intern/diff/D30281388/?dest_number=135713848)

messenger-pika-optimized-device: Succeeded
Change in Download Size for arm64 + 3x assets variation: -7.1 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -17.6 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:551399955987465@base/bsb:551399955987465@diff/
```

Reviewed By: raziel

Differential Revision: D30281388

fbshipit-source-id: 4d998e9f313e6366d9d89a6a73cd090ddfb059fc
2021-08-17 09:23:48 -07:00
013a42bdb1 [PyTorch] Add Device_test.cpp (#63203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63203

Currently, `c10::Device` isn't being tested - i.e. there's no test to ensure that the device string parsing works as expected. This diff adds very basic tests to assert that the stuff we expect to work works, and the stuff that we don't expect to work doesn't work.

ghstack-source-id: 136006962

Test Plan:
New test. Ran as:

```
cd fbsource/fbcode/
buck test //caffe2/c10:c10_test_0 -- -r '.*DeviceTest.*'
```

Reviewed By: dreiss, raziel

Differential Revision: D30286910

fbshipit-source-id: b5699068dcbba89d5d224dbaf74b175f3f785a00
2021-08-17 09:22:35 -07:00
336aa9cd85 change with_callable_args to return a fresh _PartialWrapper (#63374)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63326

Currently `get_callable_args` has the side effect of mutating the input _PartialWrapper. When that input is one of the global defaults, there are all sorts of lifetime issues that crop up. (Details in the linked issue.) So far as I can tell, we only need to make a constructor which is module (and by extension device) aware, so making a fresh one should have the same effect without leaking the last call's module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63374

Test Plan: the repro in https://github.com/pytorch/pytorch/issues/63326 now reports no leaked Tensors, and all quantization tests pass locally.

Reviewed By: HDCharles

Differential Revision: D30359360

Pulled By: robieta

fbshipit-source-id: aef33261ac49952d8d90da868a57ab063dfc456e
2021-08-17 09:11:38 -07:00
7bad9ac78a Fix flaky test for dp saved tensor hooks (#63324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63324

Fix for https://www.internalfb.com/tasks/?t=98258963
`catch_warnings` seems to only trigger once in certain cases where it
should trigger twice.
This test is only meant to check whether hooks are triggered or not,
so changing it to self.assertGreater is ok.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30340833

Pulled By: Varal7

fbshipit-source-id: 1bfb9437befe9e8ab8f95efe5f513337fa9bdc5c
2021-08-17 08:56:58 -07:00
2992d92b5a Add mode to TarArchiveReader (#63332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63332

Add a corresponding PR from [torchdata](https://github.com/facebookexternal/torchdata/pull/101)

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30350151

Pulled By: ejguan

fbshipit-source-id: bced4a1ee1ce89d4e91e678327342e1c095dbb9e
2021-08-17 07:28:37 -07:00
cae5ddc427 add torch.meshgrid() OpInfo (#62720)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62719

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62720

Reviewed By: astaff

Differential Revision: D30344574

Pulled By: dagitses

fbshipit-source-id: ed42d9fe20741df98018efb08e640fca370583fb
2021-08-17 04:04:24 -07:00
22f78144c7 Extends warning on norm docs (#63310)
Summary:
torch.norm has a couple documentation issues, like https://github.com/pytorch/pytorch/issues/44552 and https://github.com/pytorch/pytorch/issues/38595, but since it's deprecated this PR simply clarifies that the documentation (and implementation) of torch.norm may be incorrect. This should be additional encouragement for users to migrate to torch.linalg.vector_norm and torch.linalg.matrix_norm.
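
A short sketch of the recommended migration (using the torch.linalg APIs the docs point to):

```
import torch

x = torch.randn(3, 4)

# Instead of the deprecated torch.norm:
v = torch.linalg.vector_norm(x)  # 2-norm over all elements
m = torch.linalg.matrix_norm(x)  # Frobenius norm by default
```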

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63310

Reviewed By: ngimel

Differential Revision: D30337997

Pulled By: mruberry

fbshipit-source-id: 0fdcc438f36e4ab29e21e0a64709e4f35a2467ba
2021-08-16 22:23:45 -07:00
ad94248b57 Cleanup dead code (#63328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63328

This code supported the old `at::_fft_with_size` operator which no longer exists.

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30343557

Pulled By: mruberry

fbshipit-source-id: 7a71585e013acb46c98f14fd40e15bdfbf026bac
2021-08-16 22:13:08 -07:00
877b649bc3 Workaround for cuFFT bug (#63327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63327

Fixes #63152

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30343558

Pulled By: mruberry

fbshipit-source-id: 68e17a07650f65f397e26efc417e97e2ab302f82
2021-08-16 22:11:52 -07:00
794b04c6c8 Add step to report code coverage from GHA (#63373)
Summary:
Similar to the logic provided in b2069e7d01/.circleci/verbatim-sources/job-specs/pytorch-job-specs.yml (L197-L201)

Fixes https://github.com/pytorch/pytorch/issues/63366

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63373

Reviewed By: walterddr

Differential Revision: D30357737

Pulled By: malfet

fbshipit-source-id: 20b115eb4d6412bd9895680308a9097742d2ae7b
2021-08-16 20:42:38 -07:00
548c717cbd [TensorExpr] Remove test_train from tensorexpr tests. (#63194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63194

This test implements functionality that is used nowhere, and its author no
longer works on it. This PR also adds test_approx to CMakeLists, where
it had been missing before.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30292777

Pulled By: ZolotukhinM

fbshipit-source-id: ab6d98e729320a16f1b02ea0c69734f5e7fb2554
2021-08-16 20:36:31 -07:00
e7724bb100 [JIT] Set future's error to current exception as is when --torch_jit_enable_rethrow_caught_exception=true (#63348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348

This change addresses singlaiiit's comment on D30241792 (61b49c8e41), which makes the JIT interpreter's behavior consistent between the cases where `future` is set and where it is not.

Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path.

Reviewed By: singlaiiit

Differential Revision: D30347782

fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8
2021-08-16 17:32:13 -07:00
075024b9a3 [Static Runtime] Fix a bug that assigns multiple outputs to single storage (#63012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63012

This change fixes a bug where the static runtime's memory optimizer assigns multiple outputs of a node to the same storage. Fixing this bug enables the static runtime to run `inline_cvr` with its memory optimizer enabled.

A problematic line from `inline_cvr` was as follows:
```
  %7767 : Tensor, %getitem_6419.1 : Tensor = fb::gather_ranges(%tensor74.1, %7764)
```
where enabling the memory optimizer assigned `%7767` and `%getitem_6419.1` to the same storage, which corrupted their data during the 2nd iteration.

This change fixed the aforementioned bug by marking all inputs & outputs of a node as `alive` during our liveness analysis. By doing that, no inputs / outputs will collide with each other. I believe this is a fair assumption that most ops' implementations already make, but it was missing from our analysis before this change.

Test Plan: - Added a unittest `StaticRuntime.ValuesShareSameStorageDoesNotContainOutputsFromSameNode` to cover the new code.

Reviewed By: hlu1

Differential Revision: D30202018

fbshipit-source-id: 10287a1bee9e86be16a5201e9a7cd7c7f046bab9
2021-08-16 16:52:02 -07:00
068d6fec5c [Model Averaging] Add a few member methods of PostLocalSGDOptimizer (#63340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63340

Some methods are needed, such as those for accessing optimizer states. These are necessary for integration with PyTorch Lightning.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 135912246

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: rohan-varma

Differential Revision: D30328794

fbshipit-source-id: e585b874313bd266fdc7c79936e2af98700c7bad
2021-08-16 16:39:01 -07:00
aa63c0d9df [PyPer] Skip printing out per node time when do_profile is on (#63256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63256

This suppresses printing the per-node time, which is very long when the net has too many ops. It can be easily turned on by setting `--pt_sr_print_per_node_time=1`.

Reviewed By: ajyu, mikeiovine

Differential Revision: D30298331

fbshipit-source-id: 32b3f93b3fe19d335654168311fda93331a1e706
2021-08-16 16:32:19 -07:00
b2069e7d01 Refactor NnapiCompilation registration into its own file (#63183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63183

Move registration of NnapiCompilation into its own file, so that `nnapi_bind.cpp` (which contains the implementation of NnapiCompilation) can be moved to `aten_cpu`, while maintaining the selectiveness for registration.

`nnapi_bind.cpp` is moved to `aten_cpu` in https://github.com/pytorch/pytorch/pull/62919. See the PR for more details on why it's needed.

ghstack-source-id: 135900318

Test Plan: Nnapi unit tests: `python test/test_nnapi.py`

Reviewed By: iseeyuan

Differential Revision: D30288708

fbshipit-source-id: 6ed5967fa6bd018075469d18e68f844d413cf265
2021-08-16 15:45:26 -07:00
da36bbcd35 Add section to CONTRIBUTING.md explaining developer docs (#63228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63228

It is a quick summary and links to a page on the Developer Wiki that has
more detail.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30347109

Pulled By: zou3519

fbshipit-source-id: a6242986d275e5279ca3f61ade2294a132d268c4
2021-08-16 15:44:10 -07:00
4982fc4ecf test: Add ability to set CONTINUE_THROUGH_ERROR (#63357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63357

Adds the ability to set CONTINUE_THROUGH_ERROR as an environment
variable so that we can easily set it without having to add the flag
directly

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30351108

Pulled By: seemethere

fbshipit-source-id: 767fa9bd24e1399f359eb24d16f6cc985a2d7173
2021-08-16 15:35:40 -07:00
6acd87fe6a Add driver function to run test_sharded_tensor.py and test_sharding_spec.py (#63189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63189

Add a main --> run_tests function to each test file, which is needed to launch the real test cases in the OSS flow.
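
The driver being added follows the usual OSS pattern, sketched here (assuming the common_utils helper):

```
from torch.testing._internal.common_utils import run_tests

# ... test classes such as TestShardingSpec defined above ...

if __name__ == "__main__":
    run_tests()
```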

Test Plan:
before:
$ python test/distributed/_sharding_spec/test_sharding_spec.py --v   ==> nothing happened
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v ==> nothing happened

after:

$ python test/distributed/_sharding_spec/test_sharding_spec.py --v   ==>

test_chunked_sharding_spec (__main__.TestShardingSpec) ... ok
test_device_placement (__main__.TestShardingSpec) ... ok
test_enumerable_sharding_spec (__main__.TestShardingSpec) ... ok

$ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v

test_complete_world_size (__main__.TestShardedTensorChunked) ... ok
test_insufficient_sharding_dims (__main__.TestShardedTensorChunked) ... ok
test_invalid_pg_rpc_ranks (__main__.TestShardedTensorChunked) ... [W tensorpipe_agent.cpp:699] RPC agent for worker2 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
ok
test_invalid_sharding (__main__.TestShardedTensorChunked) ... ok
test_load_state_dict_errors (__main__.TestShardedTensorChunked) ... ok
test_multiple_local_shards (__main__.TestShardedTensorChunked) ... ok
test_new_group (__main__.TestShardedTensorChunked) ... ok
test_partial_world_size (__main__.TestShardedTensorChunked) ... ok
test_sharded_tensor_metadata (__main__.TestShardedTensorChunked) ... ok
test_sharded_tensor_sizes (__main__.TestShardedTensorChunked) ... ok
test_sharding_columns (__main__.TestShardedTensorChunked) ... ok
test_state_dict (__main__.TestShardedTensorChunked) ... ok
test_state_dict_new_group (__main__.TestShardedTensorChunked) ... ok
test_state_dict_no_sharded_tensors (__main__.TestShardedTensorChunked) ... ok
test_grid_sharding (__main__.TestShardedTensorEnumerable) ... ok
test_multiple_local_shards (__main__.TestShardedTensorEnumerable) ... ok
test_new_group (__main__.TestShardedTensorEnumerable) ... ok
test_partial_world_size (__main__.TestShardedTensorEnumerable) ... ok
test_sharded_tensor_metadata (__main__.TestShardedTensorEnumerable) ... ok
test_uneven_shards (__main__.TestShardedTensorEnumerable) ... ok
test_with_rpc_names (__main__.TestShardedTensorEnumerable) ... ok
test_init_from_local_shards (__main__.TestShardedTensorFromLocalShards) ... ok
test_init_from_local_shards_invalid_shards (__main__.TestShardedTensorFromLocalShards) ... ok
test_init_from_local_shards_invalid_shards_gaps (__main__.TestShardedTensorFromLocalShards) ...

Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30294094

fbshipit-source-id: 08f0431a12ea854abe00dc920205b10ba43ae6b6
2021-08-16 15:25:32 -07:00
f4f2c1231a [fx2trt] add unsqueeze converter (#63355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63355

Added converter for acc_ops.unsqueeze. Needed for ig model.

Didn't add support for inputs that have more than one dynamic dim. This is not needed right now and I feel it would be a rare case.

Test Plan: unit test

Reviewed By: yinghai

Differential Revision: D30138293

fbshipit-source-id: 899fe8eb68387de83195a2f6e199618d96f09a9e
2021-08-16 15:18:43 -07:00
078b8004a6 [Static Runtime] Implement prim::TupleUnpack (#63243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63243

Add `prim::TupleUnpack` native op to static runtime.

Test Plan: Unit test: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D30306955

fbshipit-source-id: 21923d6cbd5545c144ac051b3d48b37ec6e610cf
2021-08-16 14:56:30 -07:00
a12b371f7c [fx2trt] Factor out add_matrix_multiply_layer
Summary: Factor out the function so that it can be reused in future diffs

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_matmul

Reviewed By: 842974287

Differential Revision: D30322823

fbshipit-source-id: 069b945de2c744cdbcca1618b62827692dfb4174
2021-08-16 14:13:37 -07:00
dc5ce22a1a A re-open PR: Avoid re-creating the random number generator in RandomSampler (#63026)
Summary:
More details can be found in the old pr: https://github.com/pytorch/pytorch/pull/53085

ejguan  Thanks for your guidance. I tried to reopen this PR following your instructions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63026

Reviewed By: anjali411

Differential Revision: D30224920

Pulled By: ejguan

fbshipit-source-id: 2fa83bd4a2661485e553447fe3e57ce723f2716d
2021-08-16 14:08:37 -07:00
3f06f29577 Improve pip package determination (#63321)
Summary:
Invoking `pip` or `pip3` lists the packages for whichever `pip` alias is first on the PATH, rather than for the interpreter currently being executed. Changed `get_pip_packages` to use `sys.executable + '-mpip'`

Also, add mypy to the list of packages of interest

Discovered while looking at https://github.com/pytorch/pytorch/issues/63279
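
A minimal sketch of the interpreter-pinned query (illustrative, not the exact collect_env code):

```
import subprocess
import sys

# List packages for the interpreter actually running this script,
# not whichever `pip` happens to be first on the PATH.
out = subprocess.run(
    [sys.executable, "-mpip", "list", "--format=freeze"],
    capture_output=True, text=True, check=True,
).stdout
print(out.splitlines()[:5])
```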

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63321

Reviewed By: walterddr

Differential Revision: D30342099

Pulled By: malfet

fbshipit-source-id: fc8d17cf2ddcf18236cfde5c1b9edb4e72804ee0
2021-08-16 13:54:39 -07:00
4a59f0b9d9 [Profiler] Change FLOP/s to Total FLOPs (#62779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62779

Change from floating point operations per second to total floating point operations. This requires removing the division by execution time from the Kineto-computed FLOPs and updating the necessary documentation.

Test Plan:
Running the following script:

```
import torch
from torch.profiler import profile
import torchvision.models as models

model = models.resnet18().eval()
inputs = torch.randn(5, 3, 224, 224)
with torch.no_grad():
    with profile(record_shapes=True, with_flops=True) as prof:
        model(inputs)
print(prof.key_averages().table(sort_by="cpu_time_total"))
```

Before diff results in:

{F636640118}

And after the diff it should be about `(27.78 * 10^9) FLOP/s * .652838 seconds = 18135839640 FLOP = 18.136 GFLOP`. Running the script again yields this answer:

{F636655686}

------------------------------------

Reviewed By: gdankel

Differential Revision: D29972997

fbshipit-source-id: 0f8d9f264b7d9f8f6bb3f10ab7c2c9794291e28b
2021-08-16 13:43:32 -07:00
d2e8359971 Fix triage workflow when the card already exists in project (#63347)
Summary:
Fixes issues like https://github.com/pytorch/pytorch/runs/3336787242

```
RequestError [HttpError]: Validation Failed: {"resource":"ProjectCard","code":"unprocessable","field":"data","message":"Project already has the associated issue"}
Error: Unhandled error: HttpError: Validation Failed: {"resource":"ProjectCard","code":"unprocessable","field":"data","message":"Project already has the associated issue"}
    at /home/runner/work/_actions/actions/github-script/v2/dist/index.js:7531:23
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async eval (eval at callAsyncFunction (/home/runner/work/_actions/actions/github-script/v2/dist/index.js:7985:56), <anonymous>:63:1)
    at async main (/home/runner/work/_actions/actions/github-script/v2/dist/index.js:8011:20) {
  name: 'HttpError',
  status: 422,

...
```

The card may already exist, so a `422` status code is expected and need not be treated as an error. Anything else will re-throw the error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63347

Reviewed By: malfet

Differential Revision: D30348529

Pulled By: zhouzhuojie

fbshipit-source-id: 36647837bfccad43ce01eb5dfe6642e685615037
2021-08-16 13:33:58 -07:00
3ce67efea2 [opinfo] nn.functional.pad (#62814)
Summary:
Reference: https://github.com/facebookresearch/functorch/issues/78

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62814

Reviewed By: VitalyFedyunin

Differential Revision: D30307492

Pulled By: zou3519

fbshipit-source-id: 4f6062eb4a3c91ed1795df1f82846afa0abafcdc
2021-08-16 13:29:34 -07:00
1e8de64c66 Add expecttest to requirements.txt (#63320)
Summary:
This PR closes the developer environment gap left by https://github.com/pytorch/pytorch/issues/60658 by adding [expecttest](https://github.com/ezyang/expecttest) to `requirements.txt`. Thus it provides a solution to one of the short-term problems that https://github.com/pytorch/pytorch/issues/60697 tries to solve, but does not provide a long-term solution to https://github.com/pytorch/pytorch/issues/61375.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63320

Reviewed By: malfet

Differential Revision: D30340654

Pulled By: samestep

fbshipit-source-id: 26c8f8c9889cce4a94fafb1bf2f0d6df4c70503f
2021-08-16 13:22:43 -07:00
e75ed4a4b5 add comma to prevent syntax errors (#62492)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62492

Reviewed By: VitalyFedyunin

Differential Revision: D30304684

Pulled By: ezyang

fbshipit-source-id: db08ca39bcecbfd79ea50df18536bf4e87f51e15
2021-08-16 12:27:31 -07:00
0074a099a8 Retry apt-get during setup_ci_workspace (#63319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63319

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30346067

Pulled By: bertmaher

fbshipit-source-id: 2aafa97e78f9297553d772b2524d6f1c0ebaa46e
2021-08-16 12:20:51 -07:00
dbcfd7739f Make torch.lu differentiable for wide/tall inputs + jit (#61564)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61564

Reviewed By: astaff

Differential Revision: D30338136

Pulled By: mruberry

fbshipit-source-id: f01436fc90980544cdfa270feee16bb3dda21b93
2021-08-16 11:40:57 -07:00
979180cd01 [Model Averaging] Allow subgroup to be None in PostLocalSGDState (#63277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63277

`PostLocalSGDState` requires a subgroup. To initialize this subgroup, a global process group must be initialized. However, this imposes a restriction that a hook state can only be provided after distributed environment initialization, which is not compatible with lightning DDP plugin setup where hook state should be provided before distributed environment initialization.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 135848575

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: cbalioglu

Differential Revision: D30325041

fbshipit-source-id: 7b870166d096d306c3f2f7c69816a705cec0bebd
2021-08-16 10:07:41 -07:00
d5d5f42ea9 Revert "[docs] Update docs for NegativeBinomial (#45693)" (#63192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63192

**Summary**
This reverts commit 402caaeba513929dcfe12df183c764b0ef43f688. As per the
dicussion in #62178, this commit was not needed.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30293202

Pulled By: SplitInfinity

fbshipit-source-id: 91ee7ad0523a9880605d83fe9712c39df67384a8
2021-08-16 09:14:44 -07:00
d1cbee7b2b Refactor BucketBatch (#63185)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63185

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30288893

Pulled By: ejguan

fbshipit-source-id: b88b792d12a83c99d8ea9e516e3b4c54a82100f6
2021-08-16 06:42:56 -07:00
56d609d93e Replace str by repr for DataChunk (#63184)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63184

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30288892

Pulled By: ejguan

fbshipit-source-id: 45c88fdd3987e234f2c22ebbbfd8d5044983c34c
2021-08-16 06:41:38 -07:00
e50e8b07d8 [nnc] Updated IRMutator and IRSimplifier to perform in-place mutations. (#63246)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63246

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30309636

Pulled By: navahgar

fbshipit-source-id: 409ea8d6982888cfee9127e6248044dd2ed9d8d4
2021-08-16 00:09:22 -07:00
a421cba325 [docs][ao] Add overload information for fake_quantize_per_tensor_affine (#63258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63258

This function supports scalar and tensor qparams

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30316432

fbshipit-source-id: 8b2f5582e7e095fdda22c17d178abcbc89a2d1fc
2021-08-15 22:47:05 -07:00
0831b59cf5 [docs][ao] Add missing docstrings for quantized_max_pool1d and quantized_max_pool2d (#63242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63242

These functions are part of the native functions namespace as well as the quantized namespace

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30316430

fbshipit-source-id: cd9c839e5c1a961e3c6944e514c16fbc256a2f0c
2021-08-15 22:47:03 -07:00
a090073fe4 [docs][ao] Add missing documentation for torch.quantized_batch_norm (#63240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63240

Op is exposed via torch.quantized_batch_norm to the end user without any existing documentation

Test Plan:
CI

Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30316431

fbshipit-source-id: bf2dc8b7b6f497cf73528eaa2bedef9f65029d84
2021-08-15 22:45:56 -07:00
50fc8e8250 [OpInfo] Add expected_failure kwarg to SkipInfo (#62963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62963

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30327199

Pulled By: heitorschueroff

fbshipit-source-id: 45231eca11d1697a4449d79849fb17264d128a6b
2021-08-15 18:09:20 -07:00
8987726cc6 Small refactor for OpInfo decorators (#62713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62713

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30327200

Pulled By: heitorschueroff

fbshipit-source-id: 1899293990c8c0a66da88646714b38f1aae9179d
2021-08-15 18:08:12 -07:00
3ca3349555 [Pytorch Edge] Fix broken test post changes in error reporting format. (#63287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63287

Recent changes in https://github.com/pytorch/pytorch/pull/62419 changed
the way module hierarchy is reported. Now it includes information about
function names as well.

Test Plan:
python test/mobile/test_lite_script_module.py
TestLiteScriptModule.test_save_mobile_module_with_debug_info_with_trace

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D30328512

fbshipit-source-id: ddd6b11b9ab01cc725f4568a35eff7a92f17204b
2021-08-15 16:14:11 -07:00
cec08e7032 To add warm-up scheduler to optim (#60836)
Summary:
Warm-up of learning rate scheduling was initially discussed by Priya et al. in the paper: https://arxiv.org/pdf/1706.02677.pdf .

In section 2.2 of the paper they discuss and propose the idea of warming up learning rate schedulers in order to prevent large variance / noise in the learning rate. The idea has been further discussed in the following papers:
  * Akilesh Gotmare et al. https://arxiv.org/abs/1810.13243
  * Bernstein et al  http://proceedings.mlr.press/v80/bernstein18a/bernstein18a.pdf
  * Liyuan Liu et al: https://arxiv.org/pdf/1908.03265.pdf

There are two types of popularly used learning rate warm-up ideas:
  * Constant warmup (start with a very small constant learning rate)
  * Linear warmup (start with a small learning rate and gradually increase it)

In this PR we are adding warm-up as a learning rate scheduler. Note that learning rate schedulers are chainable, which means that we can combine the warmup scheduler with any other learning rate scheduler to make a more sophisticated learning rate scheduler.

## Linear Warmup

Linear warmup multiplies the learning rate by a pre-defined constant - warmup_factor - in the first epoch (epoch 0), then increases this multiplication constant towards one over warmup_iters epochs. Hence at the i-th step the multiplication constant equals:

                    warmup_factor + (1-warmup_factor) * i /  warmup_iters

Moreover, the ratio of this quantity at step i to step i-1 gives

           1 + (1.0 - warmup_factor) / [warmup_iters*warmup_factor+(i-1)*(1-warmup_factor)]

which is used in the get_lr() method of our implementation. Below we provide an example of how to use the linear warmup scheduler and show how it works.

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=10, warmup_method="linear")

for epoch in range(15):

    print(epoch, scheduler.get_last_lr()[0])

    optimizer.step()
    scheduler.step()
```

```
0 0.010000000000000002
1 0.019000000000000003
2 0.028000000000000008
3 0.03700000000000001
4 0.04600000000000001
5 0.055000000000000014
6 0.06400000000000002
7 0.07300000000000002
8 0.08200000000000003
9 0.09100000000000004
10 0.10000000000000005
11 0.10000000000000005
12 0.10000000000000005
13 0.10000000000000005
14 0.10000000000000005
```

## Constant Warmup

Constant warmup has a straightforward idea: multiply the learning rate by warmup_factor until epoch warmup_iters is reached, then do nothing for the following epochs

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")

for epoch in range(10):

    print(epoch, scheduler.get_last_lr()[0])

    optimizer.step()
    scheduler.step()
```

```
0 0.010000000000000002
1 0.010000000000000002
2 0.010000000000000002
3 0.010000000000000002
4 0.010000000000000002
5 0.10000000000000002
6 0.10000000000000002
7 0.10000000000000002
8 0.10000000000000002
9 0.10000000000000002
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60836

Reviewed By: saketh-are

Differential Revision: D29537615

Pulled By: iramazanli

fbshipit-source-id: d910946027acc52663b301f9c56ade686e62cb69
2021-08-15 12:31:45 -07:00
8e0998ca70 Move fx2trt and oss_acc_tracer to oss (#63101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63101

Move internal fx2trt to torch/fx/experimental/fx2trt and merge the two TRT interpreters we have right now. cc: mortzur as this might affect the uru exporting script.

Move oss_acc_tracer to torch/fx/experimental/fx_acc.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D30257909

fbshipit-source-id: 4e374965fbf88d72e91844d9e9b6ff9b98f467d1
2021-08-15 11:53:36 -07:00
0ce4d30c44 Hide all symbols in llvm namespace (#63272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63272

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D30331695

Pulled By: bertmaher

fbshipit-source-id: d35130c96f7e2a31fa86d9d80de59002e96301df
2021-08-15 11:29:43 -07:00
045c4cb82f Add copy button to code snippets in docs (#63149)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63149

Test Plan: Imported from OSS

Reviewed By: navahgar, albanD

Differential Revision: D30308891

Pulled By: anjali411

fbshipit-source-id: ad51180ab2f27c4525682b2603bbf753bb8f1ce9
2021-08-15 06:25:32 -07:00
38c185189c [Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419

This diff adds support for the CPU-only Kineto profiler on mobile, thus
enabling Chrome trace generation on mobile. This brings the C++ API for
mobile profiling on par with TorchScript.
This is done via:
1. Utilizing debug handle annotations in KinetoEvent.
2. Adding post-processing capability, via callbacks, to
KinetoThreadLocalState.
3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be
used in the surrounding scope of model execution. This will write the Chrome
trace to the location specified in the profiler constructor.

Test Plan:
MobileProfiler.ModuleHierarchy

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993660

fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
2021-08-13 21:40:19 -07:00
77a6436cac [Pytorch Mobile] Combine instructions and debug handles in a single struct (#62418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62418

Debug handles have a one-to-one correspondence with instructions, so just
combine them into one struct.

Test Plan:
CI

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993661

fbshipit-source-id: 125c7163174cf66624dd95f110fdc8208fea8a07
2021-08-13 21:40:17 -07:00
1b04d99f55 [Pytorch Profiler] Introduce scopes to enableProfiler (#62417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62417

This diff adds an option to make enableProfiler enable callbacks only
for certain RecordScopes.
Why?
Profiling has some overhead when we repeatedly execute callbacks for
all scopes. On the mobile side, where we often have small quantized models,
this overhead can be large. We observed that by only profiling the top-level
op and skipping profiling of the other aten ops called within it, we can limit
this overhead. For example, instead of profiling at::conv2d -> at::convolution ->
at::convolution_ (and furthermore ops like transpose etc. if they are called),
we skip profiling of those inner ops. Of course this limits visibility, but
at least this way we get a choice.

Test Plan: Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29993659

fbshipit-source-id: 852d3ae7822f0d94dc6e507bd4019b60d488ef69
2021-08-13 21:40:15 -07:00
b00afe135d [Pytorch Profiler] Add debug_handles to KinetoEvent (#62228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62228

This diff adds debug handles to events and provides a way to use
RECORD_FUNCTIONs that will pass debug_handles down to profiler, which
will record it in the events.

Why add debug_handles?
For pytorch mobile, with the lite interpreter, we generate debug handles
that can be used to lazily symbolicate exception traces into a model-level
stack trace, similar to the model-level stack trace you get in
TorchScript models. The debug_handles also enable getting the module
hierarchy for a lite interpreter model, support for which was added to
KinetoProfiler in previous diffs.

Followup plan:
1. Enable scope callbacks such that the lite interpreter can use them to
profile only top-level ops.
2. Enable post processing callbacks that take KinetoEvents and populate
module hierarchy using debug handles.

This will let us use KinetoProfiler for lite interpreter use cases on
mobile. The aim is to use an RAII guard to similarly generate Chrome traces
for mobile use cases as well, although only for top-level ops.

Test Plan:
test_misc : RecordDebugHandles.Basic

Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29935899

fbshipit-source-id: 4f06dc411b6b5fe0ffaebdd26d3274c96f8f389b
2021-08-13 21:40:14 -07:00
44b12ba862 [Pytorch Profiler] Move start timestamp to end of start callback (#62191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62191

This moves start timestamping to the end of the callback. This way we don't
account for callstack/module-hierarchy-related overhead in op runtime.

Test Plan:
CI

Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29910519

fbshipit-source-id: f462031a81ae12b3db7993cf482e5ad93a35e096
2021-08-13 21:40:12 -07:00
54f2eb6e7e [Pytorch Profiler] Add support for adding module hierarchy to (#61792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792

KinetoEvent

This PR adds module hierarchy information to events.
What is module hierarchy information attached to events?
While profiling a TorchScript module, when events are added, we ask the JIT
what module hierarchy is associated with the node being
executed. At the time of execution of that node, there might be multiple
frames in the interpreter's stack. For each frame, we find the
corresponding node, and the corresponding module hierarchy is queried.
Module hierarchy corresponding to the node is associated with node's
InlinedCallStack. InlinedCallStack of node tracks the path via which the
node is inlined. Thus during the inlining process we annotate
module information corresponding to the CallMethod nodes being inlined.

With this PR, chrome trace will contain additional metadata:
"Module Hierarchy". This can look like this:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
It contains module instance, type name and the method name in the
callstack.

Test Plan:
test_profiler

Imported from OSS

Reviewed By: raziel, ilia-cher

Differential Revision: D29745442

fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528
2021-08-13 21:39:10 -07:00
385b082854 add substract of max and testcase (#63132)
Summary:
As discussed here https://github.com/pytorch/pytorch/pull/62897, in the path of BF16/non-last-dim Softmax, we are missing the subtraction of the max value, which causes overflow in the `exp()` calculation when the values of the input tensor are large, such as `1000.0`.
To avoid this issue, we add the subtraction of the max value and the corresponding test cases in this PR.
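
A small demonstration of why the max subtraction matters (plain PyTorch, float32 for clarity):

```
import torch

x = torch.tensor([1000.0, 999.0])

naive = torch.exp(x) / torch.exp(x).sum()  # exp overflows: inf / inf -> nan
stable = torch.exp(x - x.max()) / torch.exp(x - x.max()).sum()

print(naive)   # tensor([nan, nan])
print(stable)  # tensor([0.7311, 0.2689])
```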

Note that without the subtraction of the max value (e.g. due to accidental reverts or changes), we will get the following error message from the test case:
```
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.05 and atol=0.05, found 103984 element(s) (out of 126720) whose difference(s) exceeded the margin of error (including 103984 nan comparisons). The greatest difference was nan (0.0 vs. nan), which occurred at index (0, 0, 0, 1).
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63132

Reviewed By: VitalyFedyunin

Differential Revision: D30280792

Pulled By: cpuhrsch

fbshipit-source-id: 722821debf983bbb4fec878975fa8a4da0d1d866
2021-08-13 20:50:49 -07:00
baedb559e3 OpInfo: nn.functional.conv_transpose2d (#62882)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62882

Reviewed By: bdhirsh

Differential Revision: D30280804

Pulled By: zou3519

fbshipit-source-id: e40cdf43e98c1f11e45df6b8bc13110b4d29c45f
2021-08-13 17:11:23 -07:00
f8e217a17e refactor fx2trt example script so it can be imported as a library (#63262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63262

Just create a `__main__` guard.

Test Plan: run linter, sandcastle tests

Reviewed By: 842974287

Differential Revision: D30263617

fbshipit-source-id: 8044ce5d815b043c3778591384cb13d9a89d0048
2021-08-13 16:59:29 -07:00
3f43a8b9a3 [iOS] Add LibTorch-Lite-Nightly pod (#63239)
Summary:
D30090760 (e182b459d9) was reverted by D30303292 because of a lint issue in `LibTorch-Lite-Nightly.podspec.template`. Resubmit the diff after fixing the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63239

Test Plan: Imported from OSS

Reviewed By: xta0

Differential Revision: D30315690

Pulled By: hanton

fbshipit-source-id: f0fa719ffc3b8181ab28c123584ae5c1da8992c0
2021-08-13 16:21:41 -07:00
809e1e7457 Allow TransformerEncoder and TransformerDecoder to accept 0-dim batch sized tensors. (#62800)
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.

This PR allows TransformerEncoder and Decoder (along with the inner `Layer` classes) to accept inputs with 0-dimensional batch sizes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62800

Reviewed By: VitalyFedyunin

Differential Revision: D30303240

Pulled By: jbschlosser

fbshipit-source-id: 8f8082a6f2a9f9d7ce0b22a942d286d5db62bd12
2021-08-13 16:11:57 -07:00
ab7a472980 [ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786)
Summary:
- HIP_VERSION semantic versioning will change in ROCm4.3. The changes essentially remove the dependency on HIP_VERSION provided in the hip header to keep code compatible with older and newer versions of ROCm.
- TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786

Reviewed By: bdhirsh

Differential Revision: D30281682

Pulled By: seemethere

fbshipit-source-id: e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673
2021-08-13 15:00:43 -07:00
e711b5ce6c Respect user-set CMAKE_PREFIX_PATH (#61904)
Summary:
Fixes the case where the `CMAKE_PREFIX_PATH` variable gets silently overwritten by a user specified environment variable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61904

Reviewed By: walterddr, malfet

Differential Revision: D29792014

Pulled By: cbalioglu

fbshipit-source-id: babacc8d5a1490bff1e14247850cc00c6ba9e6be
2021-08-13 13:49:05 -07:00
90a96e0642 Remove left-over print in test_diff_graph_inline_threshold (#63231)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63231

Reviewed By: VitalyFedyunin

Differential Revision: D30305851

Pulled By: gmagogsfm

fbshipit-source-id: 43da3b5f49ad4a6a2d6d174acf792f3ccf41a463
2021-08-13 13:11:27 -07:00
cc6b023cba Add CostInferenceFunction for SplitOp (#63133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63133

SplitOp is costly but is missing a cost inference function, which hurts cost-based balancing. The changes are:
(1) Addition of CostInferenceFunction for SplitOp
(2) Small fix in CostInferenceFunction for ConcatOp

Test Plan:
Added unit tests:

buck test //caffe2/caffe2/python/operator_test:split_op_cost_test

buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test

Reviewed By: smacke

Differential Revision: D30247360

fbshipit-source-id: 989e962f3a981acc85b73aac3fb23e603b7d1591
2021-08-13 12:28:15 -07:00
acdad8bc63 [docs] Merge note block in torch.lu documentation (#63156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63156

**Summary**
This commit merges the four successive `Note` blocks that appear in the
documentation for `torch.lu`. Each one only has one line in it, so all
of them have been merged into one block with a bulleted list that
contains the original items.

**Test Plan**
Continuous integration.

*Before*
<img width="888" alt="Captura de Pantalla 2021-08-12 a la(s) 10 48 39 a  m" src="https://user-images.githubusercontent.com/4392003/129244443-b7d1594e-8833-4c20-a911-e1bf7ca88a8d.png">

*After*
<img width="932" alt="Captura de Pantalla 2021-08-12 a la(s) 10 48 46 a  m" src="https://user-images.githubusercontent.com/4392003/129244462-1f39dcdb-90e0-4fd9-a95f-343b0b6be1f1.png">

**Fixes**
This commit fixes #62339.

Test Plan: Imported from OSS

Reviewed By: navahgar, pbelevich

Differential Revision: D30292633

Pulled By: SplitInfinity

fbshipit-source-id: cb9071165629bfe7316b1d2fe952e4354c75d48f
2021-08-13 12:11:35 -07:00
e5c32cdde7 [docs] Remove input parameter from Tensor.flatten docs (#63180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63180

**Summary**
This commit removes the `input` parameter from the signature for
`Tensor.flatten` shown in its documentation. This parameter is accepted
by `torch.flatten` but not `Tensor.flatten` (since the input is the
`Tensor` on which `flatten` is invoked).

**Test Plan**
Continuous integration.

**Fixes**
This commit fixes #57478.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30293156

Pulled By: SplitInfinity

fbshipit-source-id: 4ad70d638af009fb6bdeb703433b306904d39a76
2021-08-13 12:10:16 -07:00
548fe682e2 [docs] Add cross references to torch.transpose and torch.t (#63177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63177

**Summary**
This commit adds a link in the documentation for `torch.transpose` that
directs to `torch.t` and vice versa. These two functions are related and
it is useful for users of one to know about the other.

**Test Plan**
Continuous integration.

**Fixes**
This commit fixes #56267.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30292654

Pulled By: SplitInfinity

fbshipit-source-id: 8e60cd7a598ff8b4756cb30141399dfe8e118338
2021-08-13 11:51:55 -07:00
7107c367b5 [docs] Mention vsplit, hsplit and tensor_split in Tensor views doc (#63191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63191

**Summary**
This commit adds `vsplit`, `hsplit` and `tensor_split` to the list of
view ops on the Tensor Views documentation page.

**Test Plan**
Continuous integration.

*Before*
<img width="195" alt="Captura de Pantalla 2021-08-12 a la(s) 2 55 07 p  m" src="https://user-images.githubusercontent.com/4392003/129275921-c1cfdf6c-9f1f-45f3-98b6-1de7a0f0cc84.png">

*After*
<img width="197" alt="Captura de Pantalla 2021-08-12 a la(s) 2 55 15 p  m" src="https://user-images.githubusercontent.com/4392003/129275936-de4afde7-0143-4e1d-b38f-c86256f4896c.png">

**Fixes**
This commit fixes #62727.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30293181

Pulled By: SplitInfinity

fbshipit-source-id: 283783a4ccc3ebc50cb0a427e55c7a6cb618ffd7
2021-08-13 11:44:38 -07:00
38a825c648 Allow Average Pooling modules to accept tensors with 0-dim batch sizes. (#62025)
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.

It introduces changes and tests that allow the Average Pooling layers to accept tensors with 0-sized batch dimensions and return meaningful results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62025

Reviewed By: VitalyFedyunin

Differential Revision: D30303256

Pulled By: jbschlosser

fbshipit-source-id: 5f727e62a7c58d2b8bb49fcc3bd7688474917ba5
2021-08-13 11:31:17 -07:00
de7ae9e9b6 [skip ci] fix workflow code generation (#63235)
Summary:
Fixes a failing clean-git check on generated workflow code, introduced by https://github.com/pytorch/pytorch/pull/63148

`generated-win-vs2019-cuda10-py3.yml` was renamed as `generated-win-vs2019-cuda10.1-py3.yml`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63235

Reviewed By: VitalyFedyunin

Differential Revision: D30306474

Pulled By: zhouzhuojie

fbshipit-source-id: cbae1ace064e360e8ca0c0e997116bdb20d54d46
2021-08-13 10:38:30 -07:00
000e3a0881 [Static Runtime] Add pass to eliminate __getitem__/DictConstruct calls (#62429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62429

Introduce a new pass to eliminate calls to `prim::DictConstruct/aten::__getitem__`. Given a graph like this:
```
%2 : Dict = prim::DictConstruct(%key, %value)
%3 : Tensor = aten::__getitem__(%2, %key)
%4 : Tensor = op(%3)
```
This pass produces a graph like this (after dead code elimination):
```
%4 : Tensor = op(%value)
```

This optimization is applied in the static runtime.

Test Plan:
`buck test //caffe2/test:jit -- TestPeephole`

**local.forward performance summary**
About 3% runtime benefit. All `DictConstruct` calls optimized out, `__getitem__` calls reduced significantly (~50% of them are cut out)
P438354810

**local_request_only.forward performance summary**
About 14% runtime benefit. Again, all `DictConstruct` calls optimized out, 50% `__getitem__` calls removed.
P438359742

There is some variance with runtime measurements, so take these numbers with a grain of salt. Also note that the benefit does not exist in the shrunk model since there are no `DictConstruct` calls

Reviewed By: hlu1

Differential Revision: D29995087

fbshipit-source-id: f376376a46ff808115afd2d60446e5db8f6f752f
2021-08-13 10:21:16 -07:00
fcc1f87b6a Fixing user inputs for low, high in make_tensor (#61108)
Summary:
**TODOs:**

* [x] Do not clamp inputs for low and high when given and valid.
* [x] Devise rules for modifying `low` and `high` when extremals/invalid values passed.
* [x] Testing with `test_references_numerics_hard` with the revised changes. _(I've tested locally, the changes will take place in a separate PR though after offline discussion with mruberry)_
* [x] Revise comments/documentation for `make_tensor`

See https://github.com/pytorch/pytorch/issues/61758 for tracker issue.

cc: mruberry pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61108

Reviewed By: VitalyFedyunin

Differential Revision: D30296167

Pulled By: mruberry

fbshipit-source-id: 67e8d15b173209a9c97ca013231494a5fa99f8c7
2021-08-13 10:13:12 -07:00
720a7a0d81 [hackathon] fix benchmarking script in CONTRIBUTING (#63199)
Summary:
[skip ci]
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63199

Reviewed By: mruberry

Differential Revision: D30305487

Pulled By: ngimel

fbshipit-source-id: 2704c4f08ab976a55c9f8c2fe54cd4f3f39412cf
2021-08-13 09:50:48 -07:00
bd9fad25c2 [codemod][lint][caffe2] Extend BLACK coverage
Test Plan: Sandcastle

Reviewed By: zsol

Differential Revision: D30302716

fbshipit-source-id: f9724d4f4d1b8950f581cc2c6c77eedf19b4b6fc
2021-08-13 09:28:10 -07:00
c5f3ab6982 ENH Adds no_batch_dim to FractionalMaxPool2d (#62490)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62490

Reviewed By: bdhirsh

Differential Revision: D30287143

Pulled By: jbschlosser

fbshipit-source-id: 1b9dd932157f571adf3aa2c98c3c6b56ece8fa6e
2021-08-13 08:48:40 -07:00
61b49c8e41 [JIT] Add a flag to rethrow caught exception in jit interpreter (#63073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073

It turned out to be less than ideal to print a verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-negligible failure rate, because the truncation of a long stacktrace results in losing the original exception message thrown from native code. It is actually desirable to retain only the message of the original exception directly thrown from native code in such a use case.

This change adds a new flag `torch_jit_disable_exception_stacktrace` to the pytorch jit interpreter to suppress stacktrace in the messages of exception thrown from the interpreter.

Reviewed By: Krovatkin

Differential Revision: D30241792

fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c
2021-08-13 08:44:24 -07:00
32b6104f37 Port norm kernel to structured kernels. (#62711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62711

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30109866

Pulled By: ezyang

fbshipit-source-id: 894c9496894d059c7690a174b75bbd4db7ed6016
2021-08-13 08:27:48 -07:00
07bb6e4fd0 Port prod kernel to structured kernels. (#62024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62024

Tracking issue: #55070

In this PR, I also broke down the meta functions of other reduction kernels (e.g. `all`,
`argmax`, `sum`) into the composition of common patterns.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29847122

Pulled By: ezyang

fbshipit-source-id: a6680a6cf6e59bb46b8ffe7bf2a3a611d6e0fd14
2021-08-13 08:27:46 -07:00
1280363bad Port mean kernel to structured kernels. (#61643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61643

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29783866

Pulled By: ezyang

fbshipit-source-id: dc95baf593096c03fb5f292ee6c36de3cc7f2b35
2021-08-13 08:26:01 -07:00
2d75703c6a Remove req to call step() in training loop (#63164)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63164

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284616

Pulled By: andwgu

fbshipit-source-id: afdb677fb08851b139178a9f6d782196f26773e1
2021-08-13 08:22:44 -07:00
28f9e108b1 Pass _allow_empty_param_list into func opt ctor (#63163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63163

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284615

Pulled By: andwgu

fbshipit-source-id: 4857f5b618ec5b007648737ab532ce605e5d70dc
2021-08-13 08:22:42 -07:00
bd81c9178a Simplify data structures, add uniform approximation, fix mem leak (#63162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63162

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284617

Pulled By: andwgu

fbshipit-source-id: 9bd9e5f89abcc0d3dac56b85d55cc88e843baa9f
2021-08-13 08:20:59 -07:00
75f198d48d [docs][ao] update quantize_per_tensor to mention overloads (#63165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63165

Add details about the overloads for
* list of tensors input
* supporting tensor scale/zero-point inputs

Test Plan:
CI

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30291045

fbshipit-source-id: 9fc6418792c5e3a35417eeb8d31de4a4bfcbb7a5
2021-08-13 08:00:10 -07:00
5abeac3ef7 Make saved tensors default hooks thread local (#62909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62909

This PR makes saved tensors default hooks thread local.
This allows using default hooks in a multithreaded context.
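
A minimal sketch of what thread-local defaults enable, using the `torch.autograd.graph.saved_tensors_hooks` context manager from the sister PRs referenced in nearby commits (which installs the default hooks under the hood):

```python
import threading
import torch

def worker():
    # The default hooks are now thread-local: installing them here no longer
    # affects autograd running concurrently in other threads.
    with torch.autograd.graph.saved_tensors_hooks(lambda t: t, lambda t: t):
        x = torch.randn(4, requires_grad=True)
        (x * x).sum().backward()

threads = [threading.Thread(target=worker) for _ in range(2)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```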

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30165416

Pulled By: Varal7

fbshipit-source-id: 10a7d580661d3d94bdaf398c4e076b7bea11c16b
2021-08-13 07:49:20 -07:00
cb23976f9f Allow 0-dim batch sizes for AdaptiveMaxPool and MaxPool. (#62088)
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.

This PR allows `MaxPool` and `AdaptiveMaxPool` to accept tensors whose batch size is 0. Some changes have been made to modernize the tests so that they show the name of the C++ function that throws an error.
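
A quick sketch of the newly accepted inputs (shapes chosen arbitrarily):

```python
import torch
import torch.nn as nn

x = torch.randn(0, 3, 8, 8)  # batch size 0 is now accepted
print(nn.MaxPool2d(kernel_size=2)(x).shape)    # torch.Size([0, 3, 4, 4])
print(nn.AdaptiveMaxPool2d((2, 2))(x).shape)   # torch.Size([0, 3, 2, 2])
```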

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62088

Reviewed By: bdhirsh

Differential Revision: D30281285

Pulled By: jbschlosser

fbshipit-source-id: 52bffc67bfe45a78e11e4706b62cce1469eba1b9
2021-08-13 07:33:17 -07:00
72bc6dc8c3 DOC Improve documentation for LayerNorm (#63144)
Summary:
In this [commit](7026995f3c) and [issue](https://github.com/pytorch/pytorch/pull/59178#issuecomment-897485295), [line 134](47e286d024/torch/nn/modules/normalization.py (L134)) overwrites the "embedding" variable, which causes an error when constructing `nn.LayerNorm` in the example.

I suggest renaming "embedding" on [line 133](47e286d024/torch/nn/modules/normalization.py (L133)) to "embedding_dim".

The final example is:
```
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)
```

Fixes #59178

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63144

Reviewed By: bdhirsh

Differential Revision: D30288778

Pulled By: jbschlosser

fbshipit-source-id: e74b11430e302dae5661bf6e830ee5ac6c1838c4
2021-08-13 07:04:40 -07:00
aa665e1ab8 Revert D30090760: [iOS] Add podspec for libTorch-lite nightly build
Test Plan: revert-hammer

Differential Revision:
D30090760 (e182b459d9)

Original commit changeset: 361aa2ed24a1

fbshipit-source-id: 9c0dfee80a80eb012b142d3928204d6eb8025b0a
2021-08-13 06:45:43 -07:00
dcb5eb8d9b OpInfo for torch.nn.functional.normalize (#62635)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62635

Reviewed By: H-Huang

Differential Revision: D30136503

Pulled By: zou3519

fbshipit-source-id: 258c069f30d9c2a51ed27dadf94f3703b9432a4a
2021-08-13 06:36:50 -07:00
741accb11e Implements backward for torch.lu_solve (#61681)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22620

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61681

Reviewed By: ngimel

Differential Revision: D30063116

Pulled By: mruberry

fbshipit-source-id: e095b0cadfb7c8b37a7ef91bae5b5dc170d8ef1c
2021-08-12 21:17:11 -07:00
126ff6222e Moving getattr_from_fqn to torch.quantization.utils (#63107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63107

moving this function because the functionality would be useful outside of ns
ghstack-source-id: 135727260

Test Plan: buck test //caffe2/test:quantization_fx mode/dev-nosan --keep-going --config client.id=nuclide --show-full-output -- suite

Reviewed By: supriyar

Differential Revision: D30260735

fbshipit-source-id: 58deabdd0f3b03b0ee7ee92be0548a0945084d65
2021-08-12 20:59:01 -07:00
07b00fc324 ENH Migrate nll_loss2d from THC to ATen (#62826)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24608
Fixes https://github.com/pytorch/pytorch/issues/24607

With the following benchmark, the backward pass runs a little slower. This is strange since the implementation should be exactly the same.

<details>
 <summary>Benchmark script</summary>

```python
from itertools import product

import torch
import torch.nn as nn
import torch.nn.functional as F
import time

torch.manual_seed(0)
MS_PER_SECOND = 1000

def _time():
    torch.cuda.synchronize()
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 3
n_runs = 30
reductions = ["none", "sum", "mean"]
Ns = [128, 256, 512]
Hs = [128, 256, 512]

for reduction, N, H in product(reductions, Ns, Hs):
    total_fwd_time = 0
    total_back_time = 0
    if reduction == "none":
        grad_out = torch.randn(N, H, H, device=device)
    else:
        grad_out = torch.randn(1)[0]

    for _ in range(n_runs):
        input = torch.randn(N, C, H, H, device=device, requires_grad=True)
        target = torch.rand(N, H, H, device=device).mul(3).floor().long()

        # forward
        start = _time()
        result = F.nll_loss(input, target, reduction=reduction)
        total_fwd_time += _time() - start

    result = F.nll_loss(input, target, reduction=reduction)
    for _ in range(n_runs):
        # backward
        start = _time()
        result.backward(grad_out, retain_graph=True)
        total_back_time += _time() - start

    fwd_avg = total_fwd_time / n_runs
    bwd_avg = total_back_time / n_runs
    print(
        f"input size({N}, {C}, {H}, {H}), reduction: {reduction}, fwd: {fwd_avg:.2f} (ms), back: {bwd_avg:.2f} (ms)"
    )

```

</details>

<details>
 <summary>master results</summary>

```
input size(128, 3, 128, 128), reduction: none, fwd: 0.34 (ms), back: 0.57 (ms)
input size(128, 3, 256, 256), reduction: none, fwd: 2.56 (ms), back: 3.85 (ms)
input size(128, 3, 512, 512), reduction: none, fwd: 14.54 (ms), back: 16.62 (ms)
input size(256, 3, 128, 128), reduction: none, fwd: 1.26 (ms), back: 1.78 (ms)
input size(256, 3, 256, 256), reduction: none, fwd: 7.07 (ms), back: 8.22 (ms)
input size(256, 3, 512, 512), reduction: none, fwd: 29.38 (ms), back: 33.29 (ms)
input size(512, 3, 128, 128), reduction: none, fwd: 3.41 (ms), back: 4.05 (ms)
input size(512, 3, 256, 256), reduction: none, fwd: 14.32 (ms), back: 16.46 (ms)
input size(512, 3, 512, 512), reduction: none, fwd: 59.20 (ms), back: 66.68 (ms)
input size(128, 3, 128, 128), reduction: sum, fwd: 0.08 (ms), back: 0.21 (ms)
input size(128, 3, 256, 256), reduction: sum, fwd: 0.21 (ms), back: 0.73 (ms)
input size(128, 3, 512, 512), reduction: sum, fwd: 0.82 (ms), back: 2.86 (ms)
input size(256, 3, 128, 128), reduction: sum, fwd: 0.12 (ms), back: 0.39 (ms)
input size(256, 3, 256, 256), reduction: sum, fwd: 0.42 (ms), back: 1.45 (ms)
input size(256, 3, 512, 512), reduction: sum, fwd: 1.53 (ms), back: 5.66 (ms)
input size(512, 3, 128, 128), reduction: sum, fwd: 0.21 (ms), back: 0.74 (ms)
input size(512, 3, 256, 256), reduction: sum, fwd: 0.78 (ms), back: 2.86 (ms)
input size(512, 3, 512, 512), reduction: sum, fwd: 2.98 (ms), back: 11.23 (ms)
input size(128, 3, 128, 128), reduction: mean, fwd: 0.07 (ms), back: 0.21 (ms)
input size(128, 3, 256, 256), reduction: mean, fwd: 0.21 (ms), back: 0.73 (ms)
input size(128, 3, 512, 512), reduction: mean, fwd: 0.82 (ms), back: 2.86 (ms)
input size(256, 3, 128, 128), reduction: mean, fwd: 0.13 (ms), back: 0.39 (ms)
input size(256, 3, 256, 256), reduction: mean, fwd: 0.42 (ms), back: 1.45 (ms)
input size(256, 3, 512, 512), reduction: mean, fwd: 1.54 (ms), back: 5.65 (ms)
input size(512, 3, 128, 128), reduction: mean, fwd: 0.22 (ms), back: 0.74 (ms)
input size(512, 3, 256, 256), reduction: mean, fwd: 0.78 (ms), back: 2.87 (ms)
input size(512, 3, 512, 512), reduction: mean, fwd: 2.98 (ms), back: 11.23 (ms)
```

</details>

<details>
 <summary>PR results</summary>

```
input size(128, 3, 128, 128), reduction: none, fwd: 0.33 (ms), back: 0.59 (ms)
input size(128, 3, 256, 256), reduction: none, fwd: 2.51 (ms), back: 3.92 (ms)
input size(128, 3, 512, 512), reduction: none, fwd: 14.52 (ms), back: 17.05 (ms)
input size(256, 3, 128, 128), reduction: none, fwd: 1.23 (ms), back: 1.85 (ms)
input size(256, 3, 256, 256), reduction: none, fwd: 7.07 (ms), back: 8.45 (ms)
input size(256, 3, 512, 512), reduction: none, fwd: 29.39 (ms), back: 34.21 (ms)
input size(512, 3, 128, 128), reduction: none, fwd: 3.40 (ms), back: 4.18 (ms)
input size(512, 3, 256, 256), reduction: none, fwd: 14.33 (ms), back: 16.90 (ms)
input size(512, 3, 512, 512), reduction: none, fwd: 59.04 (ms), back: 68.36 (ms)
input size(128, 3, 128, 128), reduction: sum, fwd: 0.07 (ms), back: 0.25 (ms)
input size(128, 3, 256, 256), reduction: sum, fwd: 0.21 (ms), back: 0.86 (ms)
input size(128, 3, 512, 512), reduction: sum, fwd: 0.82 (ms), back: 3.33 (ms)
input size(256, 3, 128, 128), reduction: sum, fwd: 0.12 (ms), back: 0.46 (ms)
input size(256, 3, 256, 256), reduction: sum, fwd: 0.42 (ms), back: 1.70 (ms)
input size(256, 3, 512, 512), reduction: sum, fwd: 1.53 (ms), back: 6.58 (ms)
input size(512, 3, 128, 128), reduction: sum, fwd: 0.21 (ms), back: 0.87 (ms)
input size(512, 3, 256, 256), reduction: sum, fwd: 0.78 (ms), back: 3.34 (ms)
input size(512, 3, 512, 512), reduction: sum, fwd: 2.98 (ms), back: 13.07 (ms)
input size(128, 3, 128, 128), reduction: mean, fwd: 0.07 (ms), back: 0.26 (ms)
input size(128, 3, 256, 256), reduction: mean, fwd: 0.21 (ms), back: 0.86 (ms)
input size(128, 3, 512, 512), reduction: mean, fwd: 0.82 (ms), back: 3.34 (ms)
input size(256, 3, 128, 128), reduction: mean, fwd: 0.12 (ms), back: 0.46 (ms)
input size(256, 3, 256, 256), reduction: mean, fwd: 0.42 (ms), back: 1.72 (ms)
input size(256, 3, 512, 512), reduction: mean, fwd: 1.53 (ms), back: 6.60 (ms)
input size(512, 3, 128, 128), reduction: mean, fwd: 0.21 (ms), back: 0.87 (ms)
input size(512, 3, 256, 256), reduction: mean, fwd: 0.78 (ms), back: 3.33 (ms)
input size(512, 3, 512, 512), reduction: mean, fwd: 2.98 (ms), back: 13.07 (ms)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62826

Reviewed By: bdhirsh

Differential Revision: D30282279

Pulled By: ngimel

fbshipit-source-id: 4aa0ff3f8af0632957417931d332ec486a12b52d
2021-08-12 18:07:15 -07:00
219ba6575b add autowrap_functions kwarg to fx.Tracer (#62106)
Summary:
Implements feature request https://github.com/pytorch/pytorch/issues/62021

Test it out with

```python
from torch import fx
from torch import nn

def fx_int(x):
    return int(x)

class MyModule(nn.Module):
    def forward(self, x):
        return fx_int(x.shape[0] / 2)

tracer = fx.Tracer(autowrap_functions=(fx_int,))  # or remove kwarg to demonstrate symbolic trace error
tracer.trace(MyModule())
```

First time contributor, so please advise if I could have done anything to make lives easier for next time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62106

Reviewed By: SplitInfinity, driazati

Differential Revision: D30080834

Pulled By: jamesr66a

fbshipit-source-id: 68fadf8c881ea7930e7afd62b642874010fe4903
2021-08-12 17:38:25 -07:00
7a1ab9f5d7 [fx] store Tracer class on Graph and GraphModule for package deserialization [v2, the re-do] (#63121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63121

Re-introducing this diff with a small change to ignore setting Tracer classes on GraphModules when the Tracer class is defined not at module-level (prevents pickling).

Previous, reverted Pull Request: https://github.com/pytorch/pytorch/pull/62497

Reviewed By: houseroad

Differential Revision: D30252776

fbshipit-source-id: 42d2bc846e4b32d00563419c38c02b63cd0986e6
2021-08-12 17:28:50 -07:00
988ef190e3 Show warning in eager mode for empty containers (#62978)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62978

Reviewed By: navahgar

Differential Revision: D30278343

Pulled By: ansley

fbshipit-source-id: ebb19f7b8a10720f2612b99a2668d1ebbc1f2d16
2021-08-12 16:11:27 -07:00
e182b459d9 [iOS] Add podspec for libTorch-lite nightly build (#62691)
Summary:
The nightly pod version will be aliased with the [PyTorch nightly build version](https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_populate_env.sh#L88) and follows the [CocoaPods version specification](https://guides.cocoapods.org/using/the-podfile.html#specifying-pod-versions); the version format of the podspec is `PyTorch version + nightly build date`, e.g. `1.10.0.20210812`.

Usage:
1. Add `pod 'LibTorch-Lite-Nightly'` to `Podfile`
2. Run `pod install` to install the nightly built lib
3. Run `pod update` to update the lib to the latest version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62691

Test Plan:
* Test on [TestApp](https://github.com/pytorch/pytorch/tree/master/ios/TestApp) and [HelloWorld](https://github.com/pytorch/ios-demo-app):
Podfile: `pod 'LibTorch-Lite-Nightly'`

* Test on Private Pod:
{F642106928}

Reviewed By: xta0

Differential Revision: D30090760

Pulled By: hanton

fbshipit-source-id: 361aa2ed24a11d6aced8374cb45f70f49bd5da52
2021-08-12 15:35:14 -07:00
0b89e69e7c [BE] delete GHA generated workflow files before regen (#63148)
Summary:
Unlike CircleCI, where all workflows go into one file, legacy generated GHA workflow files stay silently in one's PR (e.g. when we change a build_environment name), which is not ideal.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63148

Reviewed By: bdhirsh

Differential Revision: D30283382

Pulled By: walterddr

fbshipit-source-id: ffdd5bf9561dd38499052855a12ee5cf838a20b0
2021-08-12 14:43:00 -07:00
ba25527ffc [iOS][GPU] Fix the clamp shader function for x86_64 (#63062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63062

Previously, due to the need to support iOS 10.0, we used an fp16 version of the clamp kernel on Metal, which didn't work well on x86_64. Since we don't need to support 10.0 anymore, we can use the fp32 version, which works on both arm64 and x86_64.
ghstack-source-id: 135536785

Test Plan:
- `buck test pp-macos`
- Op tests in the playground app

{F641013793}

Reviewed By: husthyc

Differential Revision: D30239931

fbshipit-source-id: 6ad1bf71422b537e052fbd7b7465ba8deb7ca0cf
2021-08-12 13:20:27 -07:00
ed7ece389d Forbid inplace modification of a saved tensor's pack_hook input (#62717)
Summary:
When using saved tensors hooks (especially default hooks),
if the user defines a `pack_hook` that modifies its input,
it can cause some surprising behavior.

The goal of this PR is to prevent future user headache by catching
inplace modifications of the input of `pack_hook` and raising an error if
applicable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62717

Reviewed By: albanD

Differential Revision: D30255243

Pulled By: Varal7

fbshipit-source-id: 8d73f1e1b50b697a59a2849b5e21cf0aa7493b76
2021-08-12 12:40:10 -07:00
aa5141f204 Update CONTRIBUTING.md to remove ProcessGroupAgent (#63160)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63160

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284439

Pulled By: H-Huang

fbshipit-source-id: 53c31b6917ef5e2125e146fb0ed73ae3d76a8cf9
2021-08-12 12:26:12 -07:00
96fb1a56ea add use_strict_trace to tensorboard add_graph method (#63120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63120

FAIM returns dictionaries as the model output, which causes an error when trying to trace using add_graph. Pass `strict` through to the tracer to make this user-configurable.
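
A minimal sketch (the model and input are made up for illustration):

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

class DictOutputModel(nn.Module):
    def forward(self, x):
        return {"out": x * 2}  # dict outputs fail under strict tracing

writer = SummaryWriter()
# use_strict_trace=False is forwarded to the tracer's `strict` flag.
writer.add_graph(DictOutputModel(), torch.randn(1, 3), use_strict_trace=False)
writer.close()
```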

User post: https://fb.workplace.com/groups/pytorchLightning/permalink/1510194972650369/?comment_id=1510252919311241&reply_comment_id=1510281112641755

Test Plan: unit test

Reviewed By: Reubend

Differential Revision: D30265890

fbshipit-source-id: 58b25d9500b875a29a664aa9ef4c1e7f13631fa1
2021-08-12 12:12:12 -07:00
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
ed0b8a3e83 LayerNorm Support in autodiff: (#50467)
Summary:
1. extend autodiff by adding an entry for layer_norm in the symbolic script; we now use native_layer_norm_backward
2. added a backward function `layernorm_double_backward` for `native_layer_norm_backward`, preserving double-backward support for LayerNorm in autodiff/ScriptModule
3. added a Python test to verify autodiff on layer_norm with various configurations of optional tensors (verifies the fix in https://github.com/pytorch/pytorch/issues/49430)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50467

Reviewed By: eellison

Differential Revision: D30232864

Pulled By: jansel

fbshipit-source-id: b9c33075386aff96afff7415df9f94388bfb474a

Co-authored-by: Ryan Spring <rspring@nvidia.com>
Co-authored-by: Jie <jiej@nvidia.com>
2021-08-12 11:05:53 -07:00
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
aac3c7bd06 [reland] OpInfo: adaptive_avg_pool2d (#62935)
Summary:
This PR is an attempt to reland https://github.com/pytorch/pytorch/pull/62704.

**What has changed?**

The op has non-deterministic behavior, hence an appropriate `gradcheck` wrapper had to be added.

cc: mruberry zou3519 heitorschueroff kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62935

Reviewed By: anjali411

Differential Revision: D30225095

Pulled By: zou3519

fbshipit-source-id: 644873cc21d44b19c8b68f9edff691913778de0e
2021-08-12 09:46:38 -07:00
daba551922 [BE] shorten CI name part2 (#63030)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62357
there's no need to specify the cuDNN version since it is already implied by the CUDA version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63030

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30226354

Pulled By: walterddr

fbshipit-source-id: 7e2dc577810e0ce80ee27569c25a814566250ab1
2021-08-12 08:14:22 -07:00
eea52b7d47 Skip zero test on windows (#63087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63087

The test failed unexpectedly on Windows; see
https://github.com/pytorch/pytorch/issues/63086. Skip it for now while we
investigate.
ghstack-source-id: 135631811

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D30251300

fbshipit-source-id: 8acb1ea8863c654c171fe989ac24446c321c085d
2021-08-12 00:38:42 -07:00
4d7a12f68b BatchNorm: Use resize_output and empty, instead of empty_like (#63084)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62967

This lets each of the three implementations choose which memory format
to use for the output, meaning channels_last can be used in more cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63084

Reviewed By: saketh-are

Differential Revision: D30255740

Pulled By: ngimel

fbshipit-source-id: 48d42850952ec910b29521a1c4e530eb2b29df5e
2021-08-11 23:47:24 -07:00
d5a7579597 [quant] Make version 1 the default for get_default_qat_qconfig (#63043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63043

In version 1 we use the fused module/operator during QAT. Making this the default for all QAT runs going forward.

Older models saved after prepare_qat_fx can still load their state_dict into a model prepared using version 1.
The state_dict will still have the same attribute for the observer/fake_quant modules.

There may be some numerics difference between the old observer code in observer.py and the new fused module that was
re-written in C++/CUDA to perform observe + fake_quantize.

This PR also updates the test to check for the new module instead of the default FakeQuantize module.
Note: there are also some changes to make the operator work for multi-dim per-channel quantization + updated the test for that.
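
A short sketch; the explicit `version=0` fallback is an assumption based on the versioning described above:

```python
import torch.quantization as tq

qconfig = tq.get_default_qat_qconfig("fbgemm")            # now version 1: fused observer + fake-quant
legacy = tq.get_default_qat_qconfig("fbgemm", version=0)  # assumed: pre-fusion modules
```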

Test Plan:
python test/test_quantization.py TestSerialization.test_default_qat_qconfig

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30232222

fbshipit-source-id: f3553a1926ab7c663bbeed6d574e30a7e90dfb5b
2021-08-11 22:06:44 -07:00
91525d42d9 Fix sharded tensor tests. (#63054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63054

1) Ensure these tests are skipped in environments without any GPUs.
2) Add the test to run_test.py
ghstack-source-id: 135595698

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D30239159

fbshipit-source-id: 21b543ba72e8d10182bc77e7ae1fd34fd4096509
2021-08-11 21:46:45 -07:00
bf7d03ff1f Port log_softmax_backward_data to structured kernel (#62372)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62372

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30240242

Pulled By: SplitInfinity

fbshipit-source-id: 67d5e4b1543c2e43675e905ce18ca49c11e33748
2021-08-11 21:03:59 -07:00
ba603594fd Port log_softmax to structured kernel (#57374)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57374

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30240243

Pulled By: SplitInfinity

fbshipit-source-id: de6617c75d16e26d607a884c25b8752b7b561737
2021-08-11 21:02:48 -07:00
d2eda7f2f3 Add ciflow_ruleset.json generator along with gha ci (#63097)
Summary:
- Add `.github/generated-ciflow-ruleset.json` for ciflow-bot (so that we can generate better comments)
- The lint job also checks git dirty to make sure that the file is always in sync with ciflow configs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63097

Reviewed By: saketh-are

Differential Revision: D30263278

Pulled By: zhouzhuojie

fbshipit-source-id: bad68105a228e892ba071b29ecfdf433e1038054
2021-08-11 17:14:40 -07:00
04caef8e1d Improve IMethod::getArgumentNames to deal with empty argument names list (#62947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62947

This diff improved IMethod::getArgumentNames to deal with empty argument names list.

Test Plan:
buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesValidationMode
buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesRealMode

Reviewed By: wconstab

Differential Revision: D30179974

fbshipit-source-id: c7aec35c360a73318867c5b77ebfec3affee47e3
2021-08-11 16:44:00 -07:00
5cf32c1d09 Fix Nnapi backend execute's dangling pointer (#63092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63092

Bug discovered while testing NNAPI Delegate on SparkAR.
Using
```
c10::IntArrayRef order = {0, 2, 3, 1};
fixed_inputs.push_back(tensorInp.get(i).permute(order).contiguous());
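// Fix (per this commit): pass the list straight to permute() so no dangling
// IntArrayRef view of the temporary initializer list is created:
//   fixed_inputs.push_back(tensorInp.get(i).permute({0, 2, 3, 1}).contiguous());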
```
results in a garbage value for `order` in `permute()`.
Moving `order` inside the call to `permute()` fixes the issue. The problem is seemingly related to https://github.com/pytorch/pytorch/issues/44409, but luckily the solution in this case is simple.

Bug wasn't caught earlier, since regular unit tests weren't affected by the dangling pointer, and address sanitizer NNAPI tests are turned off due to there being a different failure (T95764916).
ghstack-source-id: 135526129

Test Plan:
Run Unit tests: `python test/test_jit.py`

Build and run SparkAR on an Android phone at the top of this diff stack (D30173959): `buck build --show-output arstudioplayer_arm64_debug -c pt.enable_nnapi=1`

Reviewed By: raziel, iseeyuan

Differential Revision: D30237504

fbshipit-source-id: c946d81feefc453b43d9295d8d6f509cafdcec03
2021-08-11 14:26:48 -07:00
709ac6853a Fix warnings (#62930)
Summary:
Add `-Wno-writable-strings` (clang's flavor of `-Wwrite-strings`) to the list of warnings ignored while compiling torch_python.
Avoid unnecessary copies in range loops.
Fix a number of signed-unsigned comparisons.

Found while building locally on M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930

Reviewed By: albanD

Differential Revision: D30171981

Pulled By: malfet

fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e
2021-08-11 14:07:10 -07:00
855e8f2b17 [iOS][GPU] Consolidate array and non-array kernel for upsampling_nearest2d (#63061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63061

Cleanup the redundant shader code for the upsampling nearest kernel.
ghstack-source-id: 135524349

Test Plan:
- `buck test pp-macos`
- Op tests in PyTorchPlayground app

Reviewed By: husthyc

Differential Revision: D30236905

fbshipit-source-id: e1e001b446452b077e6db719b0519c9070f3300b
2021-08-11 13:29:39 -07:00
456364729e irange-ify 13b (#62476)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62476

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D30001445

fbshipit-source-id: 6f4525338c80e9f929695f47f36ca9c72d96a75d
2021-08-11 13:13:44 -07:00
31c1983603 Add BFloat16 support for unique and unique_consecutive on CPU (#62559)
Summary:
Add BFloat16 support for unique and unique_consecutive on CPU.
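
A minimal sketch of the newly supported dtype:

```python
import torch

x = torch.tensor([3, 1, 1, 2], dtype=torch.bfloat16)
print(torch.unique(x))              # tensor([1., 2., 3.], dtype=torch.bfloat16)
print(torch.unique_consecutive(x))  # tensor([3., 1., 2.], dtype=torch.bfloat16)
```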

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62559

Reviewed By: saketh-are

Differential Revision: D30250675

Pulled By: ngimel

fbshipit-source-id: 26e48f971d87f3b86db237e8ad3a4b74eb3c1def
2021-08-11 12:54:46 -07:00
51a67d3168 Add Github action to upload full source releases (#63022)
Summary:
These release tarballs include the submodules.
The action runs on every tag and master-branch push but does not upload anything in those cases.
This makes sure nothing is broken when an actual release happens.

On created releases the action runs and uploads the tarball

Fixes https://github.com/pytorch/pytorch/issues/62708

As I don't have access rights here and testing is obviously hard (as a new release needs to be published), I set up a test at https://github.com/Flamefire/pytorch/releases/tag/testtag
See also the run(s) at https://github.com/Flamefire/pytorch/actions/workflows/create_release.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63022

Reviewed By: saketh-are

Differential Revision: D30256253

Pulled By: seemethere

fbshipit-source-id: ab5fe131452de14ae3768b91c221e68c536cb3aa
2021-08-11 12:47:17 -07:00
821c1edea9 Embedding thrust->cub: unique (#63042)
Summary:
Followup of https://github.com/pytorch/pytorch/pull/62495

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63042

Reviewed By: saketh-are

Differential Revision: D30231084

Pulled By: ngimel

fbshipit-source-id: 03b0a88107e8a2aee3570881d81bf2b676f525cd
2021-08-11 12:40:36 -07:00
fa22f6303f [PyTorch] Add flop count for addmm (#61895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61895

* Add FLOP count for addmm, should be `2*m*n*k`.

Share the same code path for `addmm` and `mm`.
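
A hedged sketch of how the count surfaces through the profiler (exact table columns vary by version):

```python
import torch
from torch.profiler import profile

a, b = torch.randn(8, 16), torch.randn(16, 32)
c = torch.randn(8, 32)
with profile(with_flops=True) as prof:
    torch.addmm(c, a, b)  # 2*m*n*k = 2*8*32*16 = 8192 FLOPs
print(prof.key_averages().table())
```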

Test Plan:
Imported from OSS

`python test/test_profiler.py`
Run a sample profile and check that FLOPS for `aten::addmm` is correct.

`[chowar@devbig053.frc2 ~/local/pytorch/build] ninja bin/test_jit`
`[chowar@devbig053.frc2 ~/local/pytorch/build] ./bin/test_jit --gtest_filter='ComputeFlopsTest*'`

Reviewed By: dskhudia

Differential Revision: D29785671

fbshipit-source-id: d1512036202d7234a981bda897af1f75808ccbfe
2021-08-11 12:33:43 -07:00
fb4ba9e664 XNNPack Input Pointer Caching Comment (#62818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62818

Added a comment to explain why we no longer need to manually cache pointers/parameters for convolution, as removed in D29777605 (f5c6c3947e)

Test Plan: Sandcastle tests (no code changed)

Reviewed By: kimishpatel

Differential Revision: D30113489

fbshipit-source-id: d697f05816acbd367d59a4aced1925303c683d40
2021-08-11 11:55:42 -07:00
82123758ba _convert_coo_to_csr CPP and CUDA functionality (#61838)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57381 and improves https://github.com/pytorch/pytorch/pull/61340 via dedicated `coo_to_csr` functionalities.
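
A hedged sketch; the public `to_sparse_csr` conversion shown here is an assumption about how the dedicated `coo_to_csr` kernels are surfaced:

```python
import torch

dense = torch.tensor([[0.0, 1.0], [2.0, 0.0]])
coo = dense.to_sparse()
csr = coo.to_sparse_csr()  # assumed public entry point over the new kernels
print(csr.crow_indices(), csr.col_indices(), csr.values())
```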

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61838

Reviewed By: ezyang

Differential Revision: D30132736

Pulled By: cpuhrsch

fbshipit-source-id: a1fd074c0d70366a524d219a620b94f8bed71d7c
2021-08-11 11:37:20 -07:00
b8e6144e0a Add a _RemoteDevice structure for ShardedTensor/ShardingSpec. (#62927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62927

As part of the ShardedTensor work, we realized we do need some sort of
_RemoteDevice structure that deals with our format of "workername/device" so
that users don't have to worry about parsing this string directly.

Right now this structure is just the bare minimum and is mostly a container for
describing a remote device. It is currently only used in ShardedTensor,
ShardingSpec and RemoteModule.

Once we actually have a consolidated remote device proposal, this class can be
extended appropriately if needed.
ghstack-source-id: 135534086

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30170689

fbshipit-source-id: 1ac2e81c7a597dc40bf3fbf2c1168c382c66649f
2021-08-11 11:27:32 -07:00
b746fed164 [Pytorch Edge] Move RuntimeCompatibilityInfo Factory Method (#63005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63005

Realized I forgot to move the Runtime half of these functions to be within the struct.

Test Plan: ci

Reviewed By: pavithranrao

Differential Revision: D30205521

fbshipit-source-id: ccd87d7d78450dd0dd23ba493bbb9d87be4640a5
2021-08-11 11:15:57 -07:00
3d3ad0a52f [easy] add an inplace argument to MutableNetProto.to_net() and core.Net() constructor (#63068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63068

The caffe2 core.Net constructor can accept a caffe2_pb2.NetDef proto, but it always creates a copy. This is wasteful when we can prove that the proto being passed to it will not be used anywhere else. So we add an "inplace" argument to the `core.Net` constructor that allows clients to give away ownership of the passed proto without copying. We default this argument to `False`, ensuring that behavior does not change unless explicitly requested.
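
A minimal sketch of the new flag as described above:

```python
from caffe2.proto import caffe2_pb2
from caffe2.python import core

proto = caffe2_pb2.NetDef()
proto.name = "example_net"

copied = core.Net(proto)               # default: the proto is deep-copied
owned = core.Net(proto, inplace=True)  # caller gives up ownership; no copy
```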

Test Plan: Let CI run.

Differential Revision: D29976510

fbshipit-source-id: 26e13ca76f3431b8ef0de51f08bbf263491d323e
2021-08-11 11:10:52 -07:00
c090ae291e Fix gha render-test-result mixed failure passthrough (#63056)
Summary:
To fix something like https://github.com/pytorch/pytorch/actions/runs/1114555082

![image](https://user-images.githubusercontent.com/658840/128956528-86997457-5e18-4ae1-83cc-aa7d0ca03c0e.png)

Not sure why `needs.test.result` doesn't capture the `failure` case before, so changed it to `needs.test.result != 'skipped' || failure()`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63056

Reviewed By: walterddr, tktrungna

Differential Revision: D30240112

Pulled By: zhouzhuojie

fbshipit-source-id: d159cc3f79ed5d604ae12583736b37ac28e8d87c
2021-08-11 09:45:31 -07:00
4ea6a3aa74 Fix issues with printing certain torch modules (#62447)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54420

When I tested on master, with the testing code, there were multiple objects on the garbage collector that cannot be printed.

Testing code:
```
import torch
import gc
import os
import sys

print(torch.__version__)

a = torch.rand(10)

print(a)

objects = gc.get_objects()

for i in range(len(objects)):
   print(objects[i])
```

### 1
```
print(torch.classes)
```

Like SplitInfinity has mentioned in the GitHub issue, the solution here is to set `__file__` for `torch.classes` to something. Similar to [_ops.py](https://github.com/pytorch/pytorch/blob/master/torch/_ops.py#L69), where `__file__` is set to `_ops.py`, we could set `__file__` for torch.classes to `_classes.py`.

### 2
```
print(torch._ops.ops.quantized)
print(torch._ops.ops.atan)
```

When we try to print these two modules, it will call `_OpNamespace::__getattr__`, but the `op_name` is `__file__`. This becomes a problem when `torch._C._jit_get_operation(qualified_op_name)` [(link)](https://github.com/pytorch/pytorch/blob/master/torch/_ops.py#L60) tries to look for an actual op on the native C++ side.

Only when we get the attribute for an actual op, e.g. `print(torch._ops.ops.quantized.elu)`, the `op_name` becomes proper (e.g. `elu`).

My current solution is to return a hardcoded string (i.e. “torch.ops”) if `op_name` is `"__file__"`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62447

Reviewed By: saketh-are

Differential Revision: D30234654

Pulled By: yidawang-oss

fbshipit-source-id: de43a8f599739c749fb3307eea015cc61f1da60e
2021-08-11 09:40:41 -07:00
5c00091f02 Shard python_functions.cpp (#62186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62186

This file takes 6 minutes on its own to compile and is the limiting factor for
building `libtorch_python` on a 32-core threadripper. This splits the file into
5 shards which take around 50 seconds each to compile.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962046

Pulled By: albanD

fbshipit-source-id: df13cfaebd54296f10609f67ae74a850c329bd37
2021-08-11 09:21:26 -07:00
c5de83adca Fix inconsisteny between Python and JIT power operation (#62842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62842

Test Plan:
Wrote unit test TestAtenPow to test behavior of aten::pow when:
1. base is int, exponent is int
2. base is int, exponent is float
3. base is float, exponent is int
4. base is float, exponent is float

Specifically, we test that when the base is zero and the exponent is negative, we raise an error. In all other cases, we expect behavior to be the same as the result returned by Python.

Because the C++ code relies on overloading, we need to make sure all combinations of types give us the expected result.
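
A small sketch of the intended parity (float/float case shown):

```python
import torch

@torch.jit.script
def power(base: float, exp: float) -> float:
    return base ** exp

print(power(2.0, -1.0))  # 0.5, same as Python
# power(0.0, -1.0) now raises, mirroring Python's ZeroDivisionError
# for a zero base and a negative exponent.
```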

Reviewed By: zhxchen17

Differential Revision: D30146115

Pulled By: szewaiyuen7

fbshipit-source-id: dc661897ad38da286ee454120fbe41314b7f2995
2021-08-11 08:41:46 -07:00
f446e835ee Fix CUDA_KERNEL_ASSERT ambiguous symbol in NDEBUG mode (#62527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62527

If NDEBUG is applied inconsistently in compilation we might get 'ambiguous declaration' error. Let's make sure that the forward declaration matches glibc including all specifiers.

Test Plan: sandcastle

Reviewed By: mdschatz

Differential Revision: D30030051

fbshipit-source-id: 9f4d5f1d4e74f0a4eaeeaaaad76b93ee485d8bcd
2021-08-11 01:10:09 -07:00
f7611b31aa [4/N] Enable opt-asan for distributed unit tests. (#62051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62051

The goal here is to enable opt-asan for "spawn" based unit tests since
this works for "spawn" unlike "dev-asan". As a result, we can run ASAN for
"spawn" unit tests as well.

This means we can completely remove fork unit tests from the code base since
the only purpose for these tests was to run ASAN.
ghstack-source-id: 135523770

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29854514

fbshipit-source-id: 02a5bfcfae2afc21badecff77082c7a6ad83636b
2021-08-10 22:38:31 -07:00
847a7cfa10 Back out "[fx] store Tracer class on Graph and GraphModule for package deserialization" (#63053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63053

Original commit changeset: eca09424ad30

The original diff - D30019214 (6286d33878) breaks the publish flow in model saving.

Test Plan: ci

Differential Revision: D30236517

fbshipit-source-id: 3e05db02fc1cbbc2ed262c83bf56d555277abb34
2021-08-10 21:58:08 -07:00
324673a537 rebase for autocast updates to include device_type and dtype flags (#61002)
Summary:
Fixes #{55374}
https://github.com/pytorch/pytorch/issues/55374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61002

Reviewed By: malfet, mruberry

Differential Revision: D30016812

Pulled By: ngimel

fbshipit-source-id: 6e09a29f539d28e9aea5cd9489b1e633cc588033
2021-08-10 20:03:12 -07:00
a55cae3d37 Fix missing element types and shapes when autograd.Function has multiple tensor outputs (#57966)
Summary:
When generating IR for an autograd.Function, if the function has multiple outputs, a TupleUnpack may be inserted after the original function node, and PyTorch only assigns proper information (tensor element type and shape) to the TupleUnpack, forgetting the original function node. In contrast, if the autograd.Function only produces one output, the original function node may have tensor element type and shape in its output schema.

Before this PR:
- (simplified) IR for autograd.Function with one output: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output (tensor, dtype=float32, shape=[4, 5])
- (simplified) IR for autograd.Function with multiple outputs: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output_0 **(tensor)**, output_1 **(tensor)** -> TupleUnpack output_2 (tensor, dtype=float32, shape=[4, 5]), output_3 (tensor, dtype=float32, shape=[6, 7])

After this PR:
- (simplified) IR for autograd.Function with one output: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output (tensor, dtype=float32, shape=[4, 5])
- (simplified) IR for autograd.Function with multiple outputs: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output_0 **(tensor, dtype=float32, shape=[4, 5])**, output_1 **(tensor, dtype=float32, shape=[6, 7])** -> TupleUnpack output_2 (tensor, dtype=float32, shape=[4, 5]), output_3 (tensor, dtype=float32, shape=[6, 7])
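
To make the scenario above concrete, a minimal sketch of the multi-output case:

```python
import torch

class TwoOutputs(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Two tensor outputs: export inserts a TupleUnpack after the PythonOp,
        # and the PythonOp's own outputs previously lost dtype/shape info.
        return x + 1, x * 2

    @staticmethod
    def backward(ctx, grad1, grad2):
        return grad1 + 2 * grad2

a, b = TwoOutputs.apply(torch.randn(2, 3, requires_grad=True))
```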

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57966

Reviewed By: zhxchen17

Differential Revision: D30208207

Pulled By: gmagogsfm

fbshipit-source-id: 42a3d1f9c0932133112a85df0c49cf4ea0afa175
2021-08-10 19:48:11 -07:00
390c0ac403 remove dead code (#63031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63031

Reviewed By: mruberry

Differential Revision: D30225094

Pulled By: ngimel

fbshipit-source-id: 3666a0fa120bea85225cd3ee04f89d64952d2862
2021-08-10 18:41:13 -07:00
94c5309369 Revert D30199482: [pytorch][PR] Add BFloat16 support for unique and unique_consecutive on CPU
Test Plan: revert-hammer

Differential Revision:
D30199482 (fc0b8e6033)

Original commit changeset: 6f2d9cc1a528

fbshipit-source-id: 39e9f202bcbd978525f792173d4f97b5b329b5b1
2021-08-10 18:27:18 -07:00
d1f9c03cef Use const auto with irange (#62990)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62990

Test Plan: Sandcastle

Reviewed By: zhouzhuojie

Differential Revision: D30199748

fbshipit-source-id: 284b208ffa3c6c4749e5ac9b1fccb28914590f2c
2021-08-10 17:59:01 -07:00
d893b44cd8 change nccl version reporting (#62916)
Summary:
https://github.com/pytorch/pytorch/issues/62295

Previously, the packing and unpacking of the NCCL version "integer" was done to have parity with the upstream NCCL version encoding. However, there doesn't seem to be any place where this integer is directly compared with a version integer sourced from upstream NCCL, and syncing the encoding is error-prone (e.g., a recent change added a special case for minor versions >= 10: 7e51592129/src/nccl.h.in (L22)).

This patch changes the reporting to return a tuple of version numbers instead (to preserve ease-of-use for comparisons) and tweaks the passing between C/Python to avoid the digit overflow problem.
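
A short sketch of the new reporting (requires a Linux build with NCCL):

```python
import torch

if torch.cuda.is_available():
    # Now a tuple such as (2, 10, 3) rather than a packed integer.
    print(torch.cuda.nccl.version())
```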

CC ngimel mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62916

Reviewed By: anjali411

Differential Revision: D30201069

Pulled By: mrshenli

fbshipit-source-id: 2e4e7c69f001c3f22bd04aa6df6a992e538bea45
2021-08-10 17:46:27 -07:00
f307120df4 Update test_torch_deploy (#62838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62838

Fixes #62380

* update test functions to use the wheel install folder {sitepackages}/torch instead of the build/ folder
* add symbolic links for shared libraries that are called by the tests (this is a bit hacky and should be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30193141

Pulled By: tktrungna

fbshipit-source-id: 72c2bd3a740fca0f72e4803df505240193692c44
2021-08-10 16:29:50 -07:00
af6ed084b4 update test_libtorch (#62797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62797

Fixes #62380

* update test functions to use the wheel install folder {sitepackages}/torch instead of the build/ folder
* add symbolic links for shared libraries that are called by the tests (this is a bit hacky and should be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30193140

Pulled By: tktrungna

fbshipit-source-id: d8e54c403f42abbbbe4556abf40c22a7955df737
2021-08-10 16:29:48 -07:00
2f5ac9c0ba update test distributed (#62796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62796

Fixes #62380

* update test functions to use the wheel install folder {sitepackages}/torch instead of the build/ folder
* add symbolic links for shared libraries that are called by the tests (this is a bit hacky and should be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30193142

Pulled By: tktrungna

fbshipit-source-id: 1247f9eda1c11c763c31c7383c77545b1ead1a60
2021-08-10 16:29:47 -07:00
dfe8445cd7 update test_vulkan (#62795)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62795

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30124421

Pulled By: tktrungna

fbshipit-source-id: 235ba166b02f7334e89cb2493024067851bf5b9b
2021-08-10 16:29:45 -07:00
25c3b9dc10 update test_rpc (#62781)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62781

Test Plan: Imported from OSS

Reviewed By: walterddr, zhouzhuojie

Differential Revision: D30124391

Pulled By: tktrungna

fbshipit-source-id: 99c275d6c9f23b4f274fd0ca19a16879ed27afd5
2021-08-10 16:28:35 -07:00
f807229fd4 [ONNX] add support for prim::Unitialized in lower_tuples pass (#56912)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56911

Code from issue generates this Torchscript:
```
graph(%self : __torch__.MyModule,
      %t.1 : Tensor):
  %12 : None = prim::Constant()
  %7 : str = prim::Constant[value="Negative input"]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:11:28
  %3 : int = prim::Constant[value=0]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:15
  %9 : int = prim::Constant[value=5]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:31
  %33 : (Tensor, Tensor) = prim::Uninitialized()
  %4 : Tensor = aten::lt(%t.1, %3) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:11
  %6 : bool = aten::Bool(%4) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:11
  %34 : (Tensor, Tensor) = prim::If(%6) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:8
    block0():
       = prim::RaiseException(%7) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:11:12
      -> (%33)
    block1():
      %11 : int[] = prim::ListConstruct(%9)
      %16 : Tensor = aten::zeros(%11, %12, %12, %12, %12) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:19
      %18 : int[] = prim::ListConstruct(%9)
      %23 : Tensor = aten::zeros(%18, %12, %12, %12, %12) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:35
      %24 : (Tensor, Tensor) = prim::TupleConstruct(%16, %23)
      -> (%24)
  return (%34)
```

The problem is that the ONNX exporter's lower_tuples pass doesn't support forwarding of tuples through prim::Uninitialized.
The solution is to:
1. add prim::Uninitialized to supported_op in the lower_tuples pass
2. since prim::Uninitialized now has multiple outputs, call giveFreshAlias for every output

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56912

Reviewed By: nikithamalgifb

Differential Revision: D29837200

Pulled By: SplitInfinity

fbshipit-source-id: 321fae6fe52b1523df5653dbb9ea73b998ef1cda
2021-08-10 16:21:16 -07:00
4d0497034c Remove process_group_agent and faulty_process_group_agent files (#62985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62985

Remove the process_group_agent and faulty_process_group_agent code now that PROCESS_GROUP backend has been deprecated for RPC (https://github.com/pytorch/pytorch/issues/55615). Discussed with xush6528 that it was okay to remove ProcessGroupAgentTest and ProcessGroupAgentBench which depended on process_group_agent.

Test Plan: CI tests

Reviewed By: pritamdamania87

Differential Revision: D30195576

fbshipit-source-id: 8b4381cffadb868b19d481198015d0a67b205811
2021-08-10 15:57:39 -07:00
790553811c fix sort and topk with discontiguous out (#63029)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62645 and https://github.com/pytorch/pytorch/issues/62940. The root cause of those bugs is a bad interaction between `collapseDims` and setting the size of the sorting/topK dimension to 1. If all other dimensions happen to be 1, `collapseDims` considers that size-1 dimension collapsible (even though it was specifically marked to be preserved) and loses its stride information. If the dimension really were of size 1, the stride information would be unimportant; but since in reality that dimension is not 1 and was only set to 1 for convenience, the loss of stride information results in incorrect outputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63029

Reviewed By: heitorschueroff

Differential Revision: D30224925

Pulled By: ngimel

fbshipit-source-id: 269dd375c5cd57c6007fe91f729f8c60a2e7a264
2021-08-10 15:45:28 -07:00
500b24e303 [iOS] enable Metal in the nightly build (#62855)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62855

Test Plan: Test on Private Pod with the [HelloWorld](https://fburl.com/3hiwkkhm) demo

Reviewed By: xta0

Differential Revision: D30174151

Pulled By: hanton

fbshipit-source-id: 22cd8663ac239811bf8ed1c3b6301460d798dbfa
2021-08-10 15:18:58 -07:00
3beb65d45d test_cudnn_convolution_relu skipCUDAIfRocm
Summary: skip rocm test for test_cudnn_convolution_relu

Test Plan: This skips a test

Reviewed By: ngimel

Differential Revision: D30233620

fbshipit-source-id: 31eab8b03c3f15674e0d262a8f55965c1aa6b809
2021-08-10 15:15:23 -07:00
557047eb4c Add docstring for saved tensors default hooks (#62361)
Summary:
Add documentation for the saved tensors default hooks introduced in https://github.com/pytorch/pytorch/issues/61834 / https://github.com/pytorch/pytorch/issues/62563

Sister PR: https://github.com/pytorch/pytorch/issues/62362 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62361

Reviewed By: zou3519

Differential Revision: D30081997

Pulled By: Varal7

fbshipit-source-id: cb923e943e1d96db9669c1d863d693af30910c62
2021-08-10 14:59:38 -07:00
dbb7be2e79 [iOS][CI] Store every version of nightlies in S3 (#63039)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63039

Test Plan: Imported from OSS

Reviewed By: hanton

Differential Revision: D30229385

Pulled By: xta0

fbshipit-source-id: 15b438a6326159258803ab97e67dc9ec5db50d59
2021-08-10 14:33:36 -07:00
990c2190d1 [quant][graphmode] Reference pattern support for elu (#62607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62607

Removing the quantize handler for elu since it can be covered by DefaultNodeQuantizeHandler

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053977

fbshipit-source-id: 426789443e928bb01a88907de616cbda5866f621
2021-08-10 14:00:39 -07:00
f836c4f8bd [fix] TestMultiThreadAutograd: propagate exception from child thread to main thread (#63018)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63018

Reviewed By: anjali411

Differential Revision: D30225856

Pulled By: Varal7

fbshipit-source-id: b5dd7999de5060e06f8958ea3ce49e0b74110971
2021-08-10 13:56:49 -07:00
bfa67264d1 [1/N] Nnapi backend execute and compile (#62272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62272

Added Android NNAPI delegate implementation of runtime initialization (compilation) and execution.
The delegate's preprocess step was [previously implemented](https://github.com/pytorch/pytorch/pull/62225). Now the rest of the delegate, which implements client-side execution, is added.

**nnapi_backend_lib.cpp**:
Implementation of delegate's compile and execute.
`execute()` is essentially a C++ implementation of [`NnapiModule`](https://github.com/pytorch/pytorch/blob/master/torch/backends/_nnapi/prepare.py), which wraps an NNAPI Compilation and handles preparation of weights, inputs, and outputs.
- Any steps that can be done before execution are moved to `compile()`.
    - `init()` cannot be moved to `compile()` because it requires real inputs for dynamic shaping.
    - `shape_compute_module` cannot currently be deserialized in `compile()`, since mobile::Module has no IValue conversion.
- Processed arguments that are modified by `init()` must be kept as member variables. Any other processed arguments are passed through a dictionary, `handles`.

**nnapi_bind.cpp & nnapi_bind.h**:
Created a header file for `nnapi_bind.cpp`, so that it's NnapiCompilation class can be used by `nnapi_backend_lib.cpp`.
**test_backend_nnapi.py**:
Enabled execution testing.
ghstack-source-id: 135432844

Test Plan:
Imported from OSS

Tested on devserver.
1. Load and unpack a special devserver build of NNAPI: `jf download GICWmAAzUR0eo20TAPasVts8ObhobsIXAAAz --file "nnapi-host-linux.tar.xz"`
2. `export LIBNEURALNETWORKS_PATH=/path/to/libneuralnetworks.so`
3. Run unittests: `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py`

TODO: test with lite interpreter runtime

Reviewed By: raziel, iseeyuan

Differential Revision: D29944873

fbshipit-source-id: 48967d873e79ef2cce9bcba2aeea3c52f7a18c07
2021-08-10 13:37:39 -07:00
fc0b8e6033 Add BFloat16 support for unique and unique_consecutive on CPU (#62559)
Summary:
Add BFloat16 support for unique and unique_consecutive on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62559

Reviewed By: anjali411

Differential Revision: D30199482

Pulled By: ngimel

fbshipit-source-id: 6f2d9cc1a528bea7c723139a4f1b14e4b2213601
2021-08-10 13:22:54 -07:00
cb7f35d47a [quant][refactor] Checking activation_dtype instead of activation_post_process (#62489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62489

Addressing comment from previous PR: https://github.com/pytorch/pytorch/pull/62374#discussion_r679354145

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053980

fbshipit-source-id: 79c216410282eccd6f0a8f24e38c55c4d18ec0d0
2021-08-10 12:17:36 -07:00
6d21e36f21 LU solve uses cuBLAS and cuSOLVER for matrices with dim > 1024 (#61815)
Summary:
This PR builds off of https://github.com/pytorch/pytorch/issues/59148 and modifies the `lu_solve` routine to avoid MAGMA for `b` or `lu_data` matrices with any dimension > 1024, since MAGMA has a bug when dealing with such matrices (https://bitbucket.org/icl/magma/issues/19/dgesv_batched-dgetrs_batched-fails-for).
Fixes https://github.com/pytorch/pytorch/issues/36921
Fixes https://github.com/pytorch/pytorch/issues/61929
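
A hedged sketch of the affected path (CUDA required; `torch.lu`/`torch.lu_solve` as they existed at this commit):

```python
import torch

if torch.cuda.is_available():
    # Any dimension > 1024 now routes to cuBLAS/cuSOLVER instead of MAGMA.
    A = torch.randn(1100, 1100, device="cuda")
    b = torch.randn(1100, 2, device="cuda")
    LU, pivots = torch.lu(A)
    x = torch.lu_solve(b, LU, pivots)
```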

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61815

Reviewed By: anjali411

Differential Revision: D30199618

Pulled By: ngimel

fbshipit-source-id: 06870793f697e9c35aaaa8254b8a8b1a38bd3aa9
2021-08-10 11:07:16 -07:00
0c39cea3d2 [sharded_tensor] add default fields to ShardedTensorMetadata (#62867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62867

This add default fields for ShardedTensorMetadata, to allow easy construction and modification afterwards.
ghstack-source-id: 135284133

Test Plan: ShardedTensorMetadata validity should be guarded with `init_from_local_shards` API and its tests.

Reviewed By: pritamdamania87

Differential Revision: D30148481

fbshipit-source-id: 0d99f41f23dbeb4201a36109556ba23b9a6c6fb1
2021-08-10 11:00:01 -07:00
5fb79f61a8 [DDP] Dont set thread local state in reducer autograd hook. (#62996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62996

No need to set this because autograd engine already propagates TLS
states.
ghstack-source-id: 135438220

Test Plan: CI

Reviewed By: albanD

Differential Revision: D30202078

fbshipit-source-id: e5e917269a03afd7a6b8e61f28b45cdb71ac3e64
2021-08-10 10:50:16 -07:00
6915bc0781 [typing] suppress errors in fbcode/caffe2 - batch 2
Test Plan: Sandcastle

Differential Revision: D30222378

fbshipit-source-id: 6a0a5d210266f19de63273240a080365c9143eb0
2021-08-10 10:26:52 -07:00
ea808df25d Test shape analysis with opinfos (#59814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59814

Using opinfos to test shape analysis. By default, we just check that we don't give incorrect answers; then, if `assert_jit_shape_analysis` is true, we test that we correctly propagate the full shape. And it found a couple of bugs 😃

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D30200058

Pulled By: eellison

fbshipit-source-id: 6226be87f5390277cfa5a1fffaa1b072d4bc8803
2021-08-10 09:47:33 -07:00
7312bd953c add support for a few more opinfos in jit (#59812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59812

This is sort of a half measure: we can successfully trace through opinfos which are registered as lambdas, we just can't script them. This checks whether the op is a lambda, in which case it bails. See the next PR to get `resize_` to work; maybe this should be consolidated with that.

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200061

Pulled By: eellison

fbshipit-source-id: 7e3c9b0be746b16f0f57ece49f6fbe20bf6535ec
2021-08-10 09:47:32 -07:00
9cbdc90d73 Don't substitute in symbolic shapes to shape compute graph (#59811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59811

We don't want to actually substitute in symbolic shapes, because it invalidates the partially evaluated graph for further use.

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200059

Pulled By: eellison

fbshipit-source-id: 267ed97d8421fe480dec494cdf0dec9cf9ed3ba2
2021-08-10 09:47:30 -07:00
7db0bcfb40 small cleanups (#59810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59810

Rephrasings and cleanup of dead code

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200062

Pulled By: eellison

fbshipit-source-id: b03e5adb928aa46bee6685667cad43333b6e6016
2021-08-10 09:47:28 -07:00
9cd990de0d Only optimize after change (redo) (#59809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59809

Somehow this didn't get landed previously due to a ghstack mixup.

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200060

Pulled By: eellison

fbshipit-source-id: 47f256421a1fe1a005cd11fcc4d7f023b5990834
2021-08-10 09:46:13 -07:00
4c630773e8 [jit] warn if _check_overload_body fails to find source
Summary:
Under certain conditions (particularly if a module is frozen, like with
PyInstaller or torch::deploy), we will not have source code available for
functions. `import torch` should still work in this case, but this check is
currently causing it to raise an exception.

Since this is an initial check (if an overload is actually exercised there will be a hard failure), raise a warning and move on.

Test Plan: unit tests

Reviewed By: eellison

Differential Revision: D30214271

fbshipit-source-id: eb021503e416268e8585e0708d6271c1e7b91e95
2021-08-10 09:28:50 -07:00
aa89d5f7f6 [quant] Update get_default_qat_qconfig to return the fused observer+fake_quant module (#62702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62702

Expose the qconfig to the user to speed up training by leveraging the fused module.
The module currently supports per-tensor/per-channel moving avg observer and fake-quantize.

For details on perf benefits, refer to https://github.com/pytorch/pytorch/pull/61691
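
A minimal sketch of how a user might pick up the fused module through the default QAT qconfig (the 'fbgemm' backend string is an illustrative choice, not mandated by this change):

```python
import torch

# After this change, the returned qconfig is backed by the fused
# observer + fake-quantize module described above.
qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
print(qconfig)
```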

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30093719

fbshipit-source-id: b78deb7810f5b597474b9b9a0395d361d04eb46a
2021-08-10 09:28:49 -07:00
08d1a12d69 [quant] add reduce_range option to FusedMovingAvgFakeQuantize module (#62863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62863

To make this consistent with other observers, add a reduce_range option that can be used to update quant_min/quant_max.

Test Plan:
python test/test_quantization.py test_fused_mod_reduce_range

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30146602

fbshipit-source-id: a2015f095766f9c884611e9ab6942528bc9bc972
2021-08-10 09:27:01 -07:00
978490d7c7 Codegen: Fix operator::name on windows (#62278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62278

In `Operators.h` we're using `str(BaseOperatorName)`, while in
`OperatorsEverything.cpp` we're using `str(OperatorName)`. e.g.
```
STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::abs")
```
vs
```
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(abs_out, name, "aten::abs.out")
```

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962047

Pulled By: albanD

fbshipit-source-id: 5a05b898fc734a4751c2b0187e4eeea4efb0502b
2021-08-10 07:58:09 -07:00
cdf702b60c Reject kwonly arguments passed positionally in torch.ops (#62981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62981

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D30211030

Pulled By: ezyang

fbshipit-source-id: aae426592e92bf3a50076f470e153a4ae7d6f101
2021-08-10 07:16:00 -07:00
9e7b6bb69f Allow LocalResponseNorm to accept 0 dim batch sizes (#62801)
Summary:
This issue fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in  https://github.com/pytorch/pytorch/issues/38115.

This PR allows `LocalResponseNorm` to accept tensors with 0 dimensional batch size.
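
A minimal sketch of the newly accepted input (sizes are arbitrary examples):

```python
import torch
import torch.nn as nn

lrn = nn.LocalResponseNorm(size=2)
x = torch.empty(0, 3, 8, 8)   # batch dimension of size zero
print(lrn(x).shape)           # torch.Size([0, 3, 8, 8])
```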

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62801

Reviewed By: zou3519

Differential Revision: D30165282

Pulled By: jbschlosser

fbshipit-source-id: cce0b2d12dbf47dc8ed6247c267bf2f2305f858a
2021-08-10 06:54:52 -07:00
061062ae2a Update TensorPipe submodule
Test Plan: CI ran as part of https://github.com/pytorch/pytorch/pull/60938.

Reviewed By: beauby

Differential Revision: D30219343

fbshipit-source-id: 531338f912fee488d312d23da8bda63ceb862aa9
2021-08-10 05:46:12 -07:00
3df4870343 [Reland][DDP] Support not all outputs used in loss calculation (#61753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61753

Reland of https://github.com/pytorch/pytorch/pull/57081.
Main difference is that the former diff moved `prepare_for_backward` check into `DDPSink` backward, but that resulted in issues due to potential autograd engine races. The original diff moved `prepare_for_backward` into `DDPSink` as part of a long-term plan to always call it within `DDPSink`.

In particular this doesn't work because `prepare_for_backward` sets `expect_autograd_hooks=true` which enables autograd hooks to fire, but there were several use cases internally where autograd hooks were called before DDPSink called `prepare_for_backward`, resulting in errors/regression.

We instead keep the call to `prepare_for_backward` in the forward pass, but still run outputs through `DDPSink` when find_unused_parameters=True. As a result, outputs that are not used when computing loss have `None` gradients and we don't touch them if they are globally `None`. Note that the hooks still fire with an undefined gradient, which is how we avoid the Reducer erroring out with the message that some hooks did not fire.

Added the unittests that were part of the reverted diff.
ghstack-source-id: 135388925

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29726179

fbshipit-source-id: 54c8819e0aa72c61554104723a5b9c936501e719
2021-08-09 22:29:11 -07:00
5ed6e4429e To fix variance computation for complex Adam (#62946)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59998

As discussed in the issue, the variance term of the Adam optimizer is currently not computed correctly for the complex domain. As stated in the "Generalization to complex numbers" section of https://en.wikipedia.org/wiki/Variance, the variance of a complex random variable X is computed as E[(X - mu)(X - mu)*], where mu = E[X] and * denotes the complex conjugate.

However, the current Adam implementation computes E[(X - mu)(X - mu)], which does not return the right variance value; in particular, it returns a complex number. Variance is defined to be a real number even when the underlying random variable is complex.

We fix this issue here and test that the resulting variance is indeed a real number.
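
A minimal sketch of the distinction (not the optimizer's actual code):

```python
import torch

g = torch.randn(4, dtype=torch.complex64)

wrong = g * g          # E[(X - mu)(X - mu)]: complex-valued, incorrect
right = g * g.conj()   # E[(X - mu)(X - mu)*]: imaginary part is exactly zero

print(wrong)                    # generally has nonzero imaginary parts
print(right.imag.abs().max())   # tensor(0.)
```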

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62946

Reviewed By: albanD

Differential Revision: D30196038

Pulled By: iramazanli

fbshipit-source-id: ab0a6f31658aeb56bdcb211ff86eaa29f3f0d718
2021-08-09 17:54:43 -07:00
3c1d1170a4 [quant][graphmode][fx] Attach a weight qparam dict to linear and conv in reference quantized model (#62488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62488

Instead of attaching a weight observer/fake_quant to the float linear and conv, we can compute the quantization parameters and attach them as a dictionary to these modules, so that we can reduce the model size and make the reference module clearer.

TODO: the numerics for linear and conv in the reference quantized model are still not correct, since we did not quantize the weight; we may explore things like parameterization to implement this support.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30053979

fbshipit-source-id: b5f8497cf6cf65eec924df2d8fb10a9e154b8cab
2021-08-09 16:55:14 -07:00
59ac451ba3 Simplify the logic of running ci workflow codegen (#62853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62853

Wanted to simplify the logic in `__post_init__` and delegate the settings back to individual workflows. This gives us more flexibility in changing individual workflows, as well as reducing the complexity of understanding the mutation conditions.

Test Plan: Imported from OSS

Reviewed By: walterddr, seemethere

Differential Revision: D30149190

Pulled By: zhouzhuojie

fbshipit-source-id: 44df5b1e14184f3a81cb8004151525d0e0fb20d9
2021-08-09 16:47:46 -07:00
8720369a48 irange-ify 12b (#62484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62484

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D30015528

fbshipit-source-id: c4e1a5425a73f100102a97dcec1579f1049c9c1d
2021-08-09 16:40:47 -07:00
93e0f3a330 Shard Operators.cpp (#62185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62185

This file can take 5 minutes on its own to compile, and is the single limiting
factor for compile time of `libtorch_cpu` on a 32-core threadripper. Instead,
sharding into 5 files that take around 1 minute each cuts a full minute off the
overall build time.

This also factors out the `.findSchemaOrThrow(...).typed` step so the code can
be shared between `call` and `redispatch`.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962049

Pulled By: albanD

fbshipit-source-id: be5df05fbea09ada0d825855f1618c25a11abbd8
2021-08-09 16:19:49 -07:00
4b9ca72c7c irange-ify 13d (#62477)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62477

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D30001499

fbshipit-source-id: 993eb2b39f332ff0ae6c663792bd04734cfc262b
2021-08-09 16:16:58 -07:00
d16587f84d Enable rebuilds for Ninja on Windows (#62948)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59859.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62948

Reviewed By: seemethere, tktrungna

Differential Revision: D30192246

Pulled By: janeyx99

fbshipit-source-id: af25cc4bf0db67a1304d9971cfa0ff6831bb3b48
2021-08-09 16:15:45 -07:00
a82b9ef1ff BFP16 quantization/dequantization (#62974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62974

Testing the functionality of the `tensor.to` approach.
Comparing the `tensor.to` and `torch.ops.fb.FloatToBfloat16Quantized` approaches and testing whether they match for 2D tensors.

Test Plan: buck test //torchrec/fb/distributed/tests:test_quantized_comms

Reviewed By: wanchaol

Differential Revision: D30079121

fbshipit-source-id: 612e92baeb2245449637faa9bc31686353d67033
2021-08-09 15:47:07 -07:00
c4aeecac75 Migrate Embedding thrust sort to cub sort (#62495)
Summary:
This PR only migrates sort. Other thrust operations will be migrated in follow-up PRs.

Benchmark `num_embeddings` pulled from https://github.com/huggingface/transformers/tree/master/examples by
```
grep -P 'vocab_size.*(=|:)\s*[0-9]+' -r transformers/examples/
grep -P 'hidden_size.*(=|:)\s*[0-9]+' -r transformers/examples/
```
to get `vocab_size = 119547, 50265, 32000, 8000, 3052` (similar size omitted) and `hidden_size = 512, 768`

Code:
```python
import torch
import itertools

num_embeddings = (119547, 50265, 32000, 8000, 3052)
num_tokens = (4096, 16384)
hidden_sizes = (512, 768)

for ne, nt, nh in itertools.product(num_embeddings, num_tokens, hidden_sizes):
    print(f"Embedding size: {ne}, Tokens: {nt}, Hidden size: {nh}")
    embedding = torch.nn.Embedding(ne, nh).cuda()
    input_ = torch.randint(ne, (nt,), device='cuda')
    out = embedding(input_)
    torch.cuda.synchronize()
    %timeit out.backward(out, retain_graph=True); torch.cuda.synchronize()
```

## On CUDA 11.3.1

Before:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.43 ms ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.07 ms ± 56.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.61 ms ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.32 ms ± 8.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
738 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.02 ms ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
913 µs ± 3.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.27 ms ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
559 µs ± 860 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
743 µs ± 630 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
713 µs ± 969 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
977 µs ± 884 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
301 µs ± 8.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
383 µs ± 4.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
409 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
515 µs ± 766 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
215 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
250 µs ± 320 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
271 µs ± 888 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
325 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.42 ms ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.05 ms ± 9.93 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.6 ms ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.3 ms ± 3.67 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
730 µs ± 811 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.01 ms ± 2.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
887 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.25 ms ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
556 µs ± 1.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
744 µs ± 4.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
691 µs ± 570 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
957 µs ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
309 µs ± 2.84 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
376 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
381 µs ± 1.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
487 µs ± 2.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
202 µs ± 383 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
239 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
243 µs ± 1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
340 µs ± 2.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

## On CUDA 11.1

Before:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.41 ms ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.05 ms ± 7.61 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.61 ms ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.32 ms ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
743 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.02 ms ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
912 µs ± 5.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.28 ms ± 6.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
555 µs ± 2.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
743 µs ± 655 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
714 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
980 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
312 µs ± 396 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
386 µs ± 2.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
413 µs ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
512 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
209 µs ± 585 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
271 µs ± 776 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
297 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
377 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.46 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.09 ms ± 4.31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.64 ms ± 4.48 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.35 ms ± 2.54 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
782 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.06 ms ± 596 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
945 µs ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.31 ms ± 553 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
603 µs ± 856 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
789 µs ± 500 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
752 µs ± 7.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
1.01 ms ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
323 µs ± 7.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
398 µs ± 765 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
412 µs ± 544 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
519 µs ± 614 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
229 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
263 µs ± 417 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
274 µs ± 576 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
354 µs ± 1.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62495

Reviewed By: gchanan

Differential Revision: D30176833

Pulled By: ngimel

fbshipit-source-id: 44148ebb53a0abfc1e5ab8b986865555bf326ad1
2021-08-09 15:31:55 -07:00
084e92bb76 Use output memory format based on input for cudnn_convolution_relu (#62482)
Summary:
Currently when cudnn_convolution_relu is passed a channels last Tensor it will return a contiguous Tensor. This PR changes this behavior and bases the output format on the input format.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62482

Reviewed By: ngimel

Differential Revision: D30049905

Pulled By: cpuhrsch

fbshipit-source-id: 98521d14ee03466e7128a1912b9f754ffe10b448
2021-08-09 15:31:53 -07:00
4fdb9579fa irange-ify 12 (#62120)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62120

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879713

fbshipit-source-id: 3084a5eacb722f7fb0a630d47bf694f4d6831136
2021-08-09 15:31:51 -07:00
da9958c899 irange-ify 1 (#62193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62193

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879504

fbshipit-source-id: adc86adcd1e7dcdfa2d7adf4d576f081430d52ec
2021-08-09 15:30:43 -07:00
161fb31893 Fix render_test_results if condition on always() (#62997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62997

Fixes #62979; changed the condition to listen on the previous job's result being either 'success' or 'failure'.

Notice that 'skipped' will also skip this job, which is what
we want.

Test Plan: Imported from OSS

Reviewed By: driazati, seemethere

Differential Revision: D30202598

Pulled By: zhouzhuojie

fbshipit-source-id: f3c0f715c39a5c8119b528b66e45f594a54b49d1
2021-08-09 15:27:40 -07:00
39ec1da935 [reland] Gate DistributedOptimizers on RPC availability (#62937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62937

Reland due to a Windows + CUDA failure; fixed by running it on Gloo on Windows even with CUDA.
ghstack-source-id: 135306176

Test Plan: ci

Reviewed By: mrshenli

Differential Revision: D30177734

fbshipit-source-id: 7625746984c8f858648c1b3632394b98bd4518d2
2021-08-09 14:41:06 -07:00
5b8389e536 irange-ify 8d (#62505)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62505

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29971891

fbshipit-source-id: 7dcbe27221788695f320c7238f5fe81e32823802
2021-08-09 13:18:38 -07:00
6286d33878 [fx] store Tracer class on Graph and GraphModule for package deserialization (#62497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62497

Previously named: add support for custom tracer in __reduce_package__

Stores a Tracer class on a Graph created by Tracer, and copies the Tracer class into the GraphModule's state so that when a GraphModule is packaged by torch package, it can be reconstructed with the same Tracer and GraphModule class name.

Reviewed By: suo

Differential Revision: D30019214

fbshipit-source-id: eca09424ad30feb93524d481268b066ea55b892a
2021-08-09 13:07:30 -07:00
f82d4b8957 Mark unused functions with C10_UNUSED (#62929)
Summary:
Which fixes a number of warnings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62929

Reviewed By: walterddr, albanD

Differential Revision: D30171953

Pulled By: malfet

fbshipit-source-id: f82475289ff4aebb0c97794114e94a24d00d2ff4
2021-08-09 13:00:33 -07:00
08f6bc1da6 Stop exporting symbols in anonymous namespaces (#62952)
Summary:
These cases were found by compiling with clang on Windows.
Those functions would otherwise still be exported, which is a waste of space in the symbol table.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62952

Reviewed By: gchanan

Differential Revision: D30191291

Pulled By: ezyang

fbshipit-source-id: 3319b0ec4f5fb02e0fe1b81dbbcedcf12a0c795e
2021-08-09 12:52:12 -07:00
3dcd785cac [Static Runtime] Add tests for all aten ops (#62347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62347

This diff includes tests for all `aten` ops that did not already have test coverage.

Test Plan: `buck test //caffe2/benchmarks/static_runtime/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29968280

fbshipit-source-id: 768655ca535f9e37422711673168dce193de45d2
2021-08-09 12:09:59 -07:00
a01f832329 handle get_attr operations in typechecker (#62682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62682

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30107789

Pulled By: migeed-z

fbshipit-source-id: 0b21b2893e2dc7cfaf5b5f5990f662e051a981b4
2021-08-09 11:49:04 -07:00
3eeaffc7c5 Linker version script to hide LLVM symbols (#62906)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62906

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30193893

Pulled By: bertmaher

fbshipit-source-id: 9b189bfd8d4c52e8dc4296a4bed517ff44994ba0
2021-08-09 11:26:02 -07:00
1b1f1e36b4 Add `allow_empty_param_list` to functional optimizers (#62522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62522

Addresses https://github.com/pytorch/pytorch/issues/62481

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30072074

Pulled By: andwgu

fbshipit-source-id: 1a5da21f9636b8d74a6b00c0f029427f0edff0e3
2021-08-09 11:18:56 -07:00
710c419f11 [Vulkan] Added Hardshrink op (#62870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62870

Added Hardshrink operator for Vulkan
Added tests for Hardshrink op

Reference: [Hardshrink](https://pytorch.org/docs/stable/generated/torch.nn.Hardshrink.html#torch.nn.Hardshrink)

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D30174950

Pulled By: beback4u

fbshipit-source-id: 3e192390eb9f92abecae966e84bbfae356bfd7c8
2021-08-09 10:54:11 -07:00
922710f9b9 Change output node handling for typechecker to deal with tuples (#62582)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62582

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30050004

Pulled By: migeed-z

fbshipit-source-id: 9b81b10d24e1e8165cdc18c820ea314349b463cb
2021-08-09 10:47:12 -07:00
e55f271859 __torch_dispatch__: Populate kwargs dictionary with keyword-only arguments (#62822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62822

This is BC breaking for people who were using the old integration,
although only if you had been writing bindings for functions with
keyword-only arguments (that includes functorch).  Other than that,
the patch was pretty straightforward.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30134552

Pulled By: ezyang

fbshipit-source-id: a47f536fb030994a07c9386069b8f800ac86d731
2021-08-09 10:02:54 -07:00
2b83007ae2 Modify GHA CI to use PYTORCH_IGNORE_DISABLED_ISSUES based on PR body (#62851)
Summary:
Another step forward in fixing https://github.com/pytorch/pytorch/issues/62359

Disclaimer: this only works with GHA for now, as circleci would require changes in probot.

The test plan can be seen in a previous description, where I modified the PR description to include linked issues. I've removed them now since the actual PR doesn't fix any of them.

It works! In the [periodic 11.3 test1](https://github.com/pytorch/pytorch/pull/62851/checks?check_run_id=3263109970), we get this in the logs and we see that PYTORCH_IGNORE_DISABLED_ISSUES is properly set:
```
  test_jit_cuda_extension (__main__.TestCppExtensionJIT) ... Using /var/lib/jenkins/.cache/torch_extensions/py36_cu113 as PyTorch extensions root...
Creating extension directory /var/lib/jenkins/.cache/torch_extensions/py36_cu113/torch_test_cuda_extension...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/lib/jenkins/.cache/torch_extensions/py36_cu113/torch_test_cuda_extension/build.ninja...
Building extension module torch_test_cuda_extension...
Using envvar MAX_JOBS (30) as the number of workers...
[1/3] c++ -MMD -MF cuda_extension.o.d -DTORCH_EXTENSION_NAME=torch_test_cuda_extension -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.6/site-packages/torch/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++14 -c /var/lib/jenkins/workspace/test/cpp_extensions/cuda_extension.cpp -o cuda_extension.o
[2/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=torch_test_cuda_extension -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.6/site-packages/torch/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 --compiler-options '-fPIC' -O2 -std=c++14 -c /var/lib/jenkins/workspace/test/cpp_extensions/cuda_extension.cu -o cuda_extension.cuda.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[3/3] c++ cuda_extension.o cuda_extension.cuda.o -shared -L/opt/conda/lib/python3.6/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o torch_test_cuda_extension.so
Loading extension module torch_test_cuda_extension...
ok (26.161s)
```

whereas on the latest master periodic 11.1 windows [test](https://github.com/pytorch/pytorch/runs/3263762478?check_suite_focus=true), we see
```
test_jit_cuda_extension (__main__.TestCppExtensionJIT) ... skip (0.000s)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62851

Reviewed By: walterddr, tktrungna

Differential Revision: D30192029

Pulled By: janeyx99

fbshipit-source-id: fd2ecc59d2b2bb5c31522a630dd805070d59f584
2021-08-09 09:48:56 -07:00
8b54b14f92 [Static Runtime] Added a cache for NNC generated code across different calls to the same ops (#62921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62921

Added a cache for NNC generated code across different calls to the same ops.

Before this diff:
```
ProcessedNode time 13402.9 ms
Static Module initialization took 30964.8 ms
```

After this diff:
```
ProcessedNode time 85.4195 ms
Static Module initialization took 4348.42 ms
```

There is one global cache for all the ops. It is guarded with a reader-writer lock. This is necessary because we could have multiple threads loading different models in parallel. Note that this locking does not guarantee that there will be exactly one code generated for each op. There could be more than one thread generating code for the same op simultaneously, and all of them will update the cache in some order. But that should be a small number, bounded by the number of threads. Also, there is no correctness issue, since the generated code is always the same; the one generated by the last thread is retained in the cache and reused later while running the model.

Test Plan: Tested inline_cvr model

Reviewed By: hlu1

Differential Revision: D30104017

fbshipit-source-id: 32e9af43d7e724ed54b661dfe58a73a14e443ff7
2021-08-09 09:30:07 -07:00
3782f3eced Enable upper for torch.linalg.cholesky (#62434)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61988
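
A minimal usage sketch of the new flag (matrix is an arbitrary SPD example):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.T + 3 * torch.eye(3, dtype=torch.float64)  # symmetric positive definite

L = torch.linalg.cholesky(A)              # lower-triangular factor (default)
U = torch.linalg.cholesky(A, upper=True)  # upper-triangular factor
assert torch.allclose(U, L.T)             # for real inputs, U == L^T
```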

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62434

Reviewed By: seemethere, tktrungna

Differential Revision: D30079806

Pulled By: walterddr

fbshipit-source-id: 044efb96525155c9bc7953ac4ad47c1b7c12fb20
2021-08-09 09:28:33 -07:00
e54ee9bac1 [nnc] Updated IR cloning to create clones of expressions in addition to statements (#62833)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62833

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30135980

Pulled By: navahgar

fbshipit-source-id: e557eedec7ecf596a4045756276d25a485fa66fb
2021-08-09 09:13:03 -07:00
5deeaab36a minor fixes in c10d for Windows (#62953)
Summary:
Found by triggering builds with clang on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62953

Reviewed By: gchanan

Differential Revision: D30191300

Pulled By: ezyang

fbshipit-source-id: d929119768298084c41d70dbc3a78aacd64fb715
2021-08-09 09:05:09 -07:00
fff83f3f66 Add handling of list write to remove mutation (#62904)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62904

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30168493

Pulled By: eellison

fbshipit-source-id: 3b25982b235938cc7439dd3a5236dfce68254c05
2021-08-09 08:56:06 -07:00
254148ec7d Add tensor-scalar op (#62903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62903

Test Plan: Imported from OSS

Reviewed By: pbelevich, SplitInfinity

Differential Revision: D30168338

Pulled By: eellison

fbshipit-source-id: 7dcb34ddd76c6aad4108a4073d3c8a93d974d0ef
2021-08-09 08:54:47 -07:00
4c4c5b14e4 Port sum.dim_IntList kernel to structured kernels. (#61642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61642

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29783865

Pulled By: ezyang

fbshipit-source-id: 375d4cd5f915812108367601a610a428762e606d
2021-08-09 08:46:16 -07:00
c7db642a72 Adding collective quantization API (#62142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62142

Created a wrapper that takes the collective op and a quantization type as arguments. It quantizes the input, performs the collective op, and performs dequantization.

Test Plan:
Tested through distributed_gloo_fork.
e.g., buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_all_to_all_quantized

Reviewed By: wanchaol

Differential Revision: D29682812

fbshipit-source-id: 79c39105ff11270008caa9f566361452fe82a92e
2021-08-09 08:11:22 -07:00
6ccedc7c1f Set mkl thread locally (#62891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62891

Fixes #60469

We want to land this PR before the next release, so we adopted the idea from raven38 in https://github.com/pytorch/pytorch/pull/60471 and added a corresponding test to verify the result.

- Before this PR, using this test:
![image](https://user-images.githubusercontent.com/68879799/128542334-1b899be5-2b6e-4c03-8ac0-568fb15470b8.png)
- After this PR, the test passes without error.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30161483

Pulled By: ejguan

fbshipit-source-id: 800f7204e0e1a19c492b2e556c92a91115f1b69b
2021-08-09 07:37:18 -07:00
30214aef2d [BE] irangefy (#62928)
Summary:
Replace for loops with `irange` loops. Also fix some unused-variable warnings in range-loop cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62928

Reviewed By: driazati

Differential Revision: D30171904

Pulled By: malfet

fbshipit-source-id: 1b437a0f7e3515f4a2e324f3450e93312f1933ae
2021-08-07 13:34:13 -07:00
9f7aba737b Make IMethod cache mutable so getArgument works on const IMethod (#62834)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62834

Test Plan: existing unit tests

Reviewed By: alanwaketan

Differential Revision: D30135939

fbshipit-source-id: e19c0ac1af6996e065a18318351265b5c4a01e70
2021-08-06 22:58:21 -07:00
b80dffd911 [TensorExpr] Remove more 'const' from IRVisitor methods for *Imm types. (#62932)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62932

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30172961

Pulled By: ZolotukhinM

fbshipit-source-id: 9b7f45880d356f823364135fe29fc08f6565f827
2021-08-06 22:44:09 -07:00
b45cf9b81b Revert D30117838: [WIP] Gate DistributedOptimizers on RPC availability
Test Plan: revert-hammer

Differential Revision:
D30117838 (3f09485d7e)

Original commit changeset: e6365a910a3d

fbshipit-source-id: f276b2b2bdf5f7bd27df473fca0eebaee9f7aef2
2021-08-06 22:10:41 -07:00
e6a3154519 Allow broadcasting along non-reduction dimension for cosine similarity (#62912)
Summary:
Checks introduced by https://github.com/pytorch/pytorch/issues/58559 are too strict and disable correctly working cases that people were relying on.
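
One example of such a case (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# (3, 1, 5) and (1, 4, 5) broadcast to (3, 4, 5) along the non-reduction
# dimensions; the similarity itself reduces over dim=-1.
a = torch.randn(3, 1, 5)
b = torch.randn(1, 4, 5)
out = F.cosine_similarity(a, b, dim=-1)
print(out.shape)  # torch.Size([3, 4])
```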

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62912

Reviewed By: jbschlosser

Differential Revision: D30165827

Pulled By: ngimel

fbshipit-source-id: f9229a9fc70142fe08a42fbf2d18dae12f679646
2021-08-06 19:17:04 -07:00
6630d98ae5 Refactor codegen file sharding (#62184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62184

File sharding is currently implemented twice, once for VariableType and once for
TraceType. This refactors the implementation into `FileManager` and also changes
it so template substitution is only done once and shared between the sharded
file and the "Everything" file.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962050

Pulled By: albanD

fbshipit-source-id: 7858c3ca9f6e674ad036febd2d1a4ed2323a2861
2021-08-06 19:13:42 -07:00
44fad84bca [DDP] Add host-side time to CUDATimer (#62770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62770

Adding timing of forward, backward comp, backward comm, etc will help
detect desynchronization issues.
ghstack-source-id: 135195680

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30115585

fbshipit-source-id: 509bf341c5c92dcc63bdacd3c1e414da4eb4f321
2021-08-06 18:41:40 -07:00
22e3cc21e5 Back out "Enable test_api IMethodTest in OSS" (#62893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62893

Original commit changeset: 50eb3689cf84

Test Plan: Confirm pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 passes in OSS

Reviewed By: seemethere, alanwaketan

Differential Revision: D30159999

fbshipit-source-id: 74ff8975328409a3dc8222d3e2707a1bb0ab930c
2021-08-06 16:43:50 -07:00
bbe2c8e6d2 Fix reshape for the Lazy key (#62846)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62846

Test Plan: CI

Reviewed By: zou3519

Differential Revision: D30162185

Pulled By: asuhan

fbshipit-source-id: d582dcef35ce7e8bebf161a5c93e470339891e29
2021-08-06 15:29:56 -07:00
6e24ce7a46 Revert D30138788: [pytorch][PR] OpInfo for adaptive_avg_pool2d
Test Plan: revert-hammer

Differential Revision:
D30138788 (5c431981b5)

Original commit changeset: 66735ceaa85b

fbshipit-source-id: 75eb241ef82d32d6480db069c035df0abc6753fe
2021-08-06 15:17:05 -07:00
d9154b9b26 [quant] Input-Weight Equalization - allow logical evaluation (#61603)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61603

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D29686878

fbshipit-source-id: 67ca4cab98b3d592ff2bb8db86499789b85bd582
2021-08-06 15:10:32 -07:00
43b087791c .github: Make sure to deep clone on windows (#62907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62907

Deep clones allow us to use git commands on historical commits so that
we can do things like collect test times correctly

Should fix empty `.pytorch-test-times.json` files that walterddr was observing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30166414

Pulled By: seemethere

fbshipit-source-id: 1f9904eeb5a8ebaf0a02d1aa7291fffe1aecd57b
2021-08-06 15:06:56 -07:00
e3944ab00e Revert D30038175: Improve IMethod::getArgumentNames to deal with empty argument names list
Test Plan: revert-hammer

Differential Revision:
D30038175 (64b3ab6407)

Original commit changeset: 46f08dda9418

fbshipit-source-id: 604735d2300487a0b75890b330d7ba5b3e7145b2
2021-08-06 14:58:43 -07:00
7a3f1386ae Add GradBucket::parameters() to ddp_comm_hooks.rst (#62877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62877

as title
ghstack-source-id: 135214612

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30153490

fbshipit-source-id: d4cec434a53ef6e65b60c065804884d1a114aa0d
2021-08-06 14:50:47 -07:00
6d24a075cb Check contiguous to dispatch to NHWC cuda template (#62839)
Summary:
Follow-up to https://github.com/pytorch/pytorch/issues/62773.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62839

Reviewed By: H-Huang

Differential Revision: D30142906

Pulled By: ngimel

fbshipit-source-id: 600a7ad240a4a1827352eab8c8cbc98240d693f0
2021-08-06 14:11:10 -07:00
e6e579ce74 [FX] Add torch.memory_format as a BaseArgumentType (#62593)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62593

Reviewed By: H-Huang

Differential Revision: D30104091

Pulled By: cpuhrsch

fbshipit-source-id: 25b7a4b308219860c969db54d7b1867b1aa4180a
2021-08-06 14:03:41 -07:00
97dc43beeb use test environment for test phase (#62824)
Summary:
Currently all tests generated in the test matrix share the same `BUILD_ENVIRONMENT` variable. We should distinguish them because some test scripts use BUILD_ENVIRONMENT to differentiate what to run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62824

Reviewed By: zhouzhuojie

Differential Revision: D30162250

Pulled By: walterddr

fbshipit-source-id: 3a99a21e91e02ed8638feed102e7966af01dd175
2021-08-06 11:52:41 -07:00
786934902c Adds JOB_BASE_NAME to steps of CircleCI mac workflows (#62892)
Summary:
Upon noticing that we had a job entry named "None" in our S3 stats, I set out to find which test-reporting job had a JOB_BASE_NAME that wasn't set.

It turns out that all workflows other than Windows and Linux did not have JOB_BASE_NAME and instead used CIRCLE_JOB. This remedies the current issue by explicitly setting JOB_BASE_NAME in Mac workflows, but doesn't touch anything else, as those other jobs (like Android) do not report test stats.

This also adds back the CIRCLE_JOB dependency in print_test_stats to be backwards compatible, but the goal is to move off of CIRCLE_JOB dependency to a more CI-platform-agnostic naming of variables.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62892

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

{F639556801}
The "None" entry is now reported as the macOS job!

Reviewed By: walterddr

Differential Revision: D30160234

Pulled By: janeyx99

fbshipit-source-id: df868dec5f9b289d3837e927d2bb95acb2d9185b
2021-08-06 11:34:17 -07:00
c9b5d79d40 [hotfix] fix BC checker direction (#62901)
Summary:
Fixes the https://github.com/pytorch/pytorch/issues/62687 error: the BC checker should allowlist entries whose datetime is newer than today.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62901

Reviewed By: zhouzhuojie

Differential Revision: D30163202

Pulled By: walterddr

fbshipit-source-id: b882975a231249137cb2d252f41e98e133b6f337
2021-08-06 11:29:28 -07:00
59d09b148c BUG Fixes bug in no_batch_dim tests (#62726)
Summary:
The way that Python captures variables for lambdas meant that only the last `input_fn`, etc. were captured. This PR makes sure each local variable is bound by value in its lambda.

REF: https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-loop-with-different-values-all-return-the-same-result
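
A minimal sketch of the pitfall and the fix:

```python
# Every lambda created in the loop sees the *final* value of the loop variable.
fns = [lambda: i for i in range(3)]
print([f() for f in fns])  # [2, 2, 2]

# Binding the current value as a default argument captures it per iteration.
fns = [lambda i=i: i for i in range(3)]
print([f() for f in fns])  # [0, 1, 2]
```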

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62726

Reviewed By: zou3519

Differential Revision: D30159478

Pulled By: jbschlosser

fbshipit-source-id: cfef3d9776d2676b2f5bb6d39d569b8ca07b0fe5
2021-08-06 11:11:25 -07:00
a03604c610 Set JOB_BASE_NAME consistently for bazel (#62886)
Summary:
It was previously set manually (and incorrectly) to pytorch-linux-xenial-py3.6-gcc7-bazel-test-test, which is inconsistent with the rest of our naming scheme.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62886

Reviewed By: driazati

Differential Revision: D30159860

Pulled By: janeyx99

fbshipit-source-id: 4984ec04ee2bcf68b9a57e241ca9f979bfe6398a
2021-08-06 11:07:03 -07:00
3f09485d7e [WIP] Gate DistributedOptimizers on RPC availability (#62774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62774

Gates DistributedOptimizer, which relies on RRef, based on whether RPC is available. This should enable ZeRO to work with Windows, as Windows should not try to import the DistributedOptimizer. If this works as expected, we can enable the Windows tests for functional/local SGD optimizers as well.
ghstack-source-id: 135216642

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D30117838

fbshipit-source-id: e6365a910a3d1ca40d95fa6777a7019c561957db
2021-08-06 10:59:00 -07:00
1dba329d20 Enable step_param for Adam functional optimizer (#62611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62611

Enables optimizer overlap with backwards in DDP for Adam. Additional optimizers, especially Adagrad, will be done in follow-up diffs.

1. Implement a `step_param` method based on `step` in _FunctionalAdam (perf permitting, we can later dedupe `step` to call `step_param`).
2. Modify tests to test all current functional optimizers.
ghstack-source-id: 135207143

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29891783

fbshipit-source-id: 321915982afd5cb0a9c2e43d27550f433bff00d1
2021-08-06 10:53:55 -07:00
836b2431dc [quant] Input-Weight Equalization - selective equalization (#61916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61916

Functions used to run selective equalization based on the SQNR obtained from running the Numeric Suite. After running the Numeric Suite between the equalized and float model, we will get the SQNR between the two models and construct an equalization_qconfig_dict that specifies to only equalize the layers with the highest quantization errors.

How to run:
```
layer_to_sqnr_dict = get_layer_sqnr_dict(float_model, equalized_model, input)
eq_qconfig_dict = get_equalization_qconfig_dict(layer_to_sqnr_dict, equalized_model, num_layers_to_equalize)

prepared = prepare_fx(float_model, qconfig_dict, eq_qconfig_dict)
...
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_selective_equalization`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29796950

fbshipit-source-id: 91f0f8427d751beaea32d8ffc2f3b8aa8ef7ea95
2021-08-06 09:29:03 -07:00
e6ef87001c [BF16] Add BF16 support to _aminmax and _aminmax_all operators (#62767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62767

Add BF16 support to _aminmax_all and _aminmax operators.

Test Plan:
Added unit test:
https://www.internalfb.com/intern/testinfra/testconsole/testrun/2533274857208373/

Reviewed By: anjali411

Differential Revision: D30073837

fbshipit-source-id: 9cb4991e644cfdb2f0674ccaff161d223c174150
2021-08-06 08:56:12 -07:00
56ff996386 [vulkan] Add _reshape_alias (#62858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62858

D29792126 (adb73d3dcf) changed the behaviour of `reshape()` such that it calls `_reshape_alias()` instead of `view()` in order to avoid duplicating some work such as computing strides.

Vulkan has not yet implemented `_reshape_alias()` so `reshape()` would fail with

```
C++ exception with description "Could not run 'aten::_reshape_alias' with arguments from the 'Vulkan' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions.
```

For Vulkan there is no concept of strides so it's fine to just have `_reshape_alias()` point to `view()`.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: kimishpatel

Differential Revision: D30054706

fbshipit-source-id: 770979fa3a0f99bcc2ddaefa4674e5bd79b17c03
2021-08-06 08:44:15 -07:00
5f4207eb91 [vulkan] Throw an exception if device does not support Vulkan (#62859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62859

If the Vulkan instance cannot be initialized successfully (i.e. no `vkPhysicalDevice` could be found due to missing drivers) then Vulkan ops will not be able to execute. However, currently `api::context()` which is used to access the global Vulkan context simply returns a null pointer if there is a problem initializing the Vulkan instance.

This leads to Segmentation Faults later on because Vulkan ops assume that `api::context()` will not return a `nullptr`. For instance: [this line](https://www.internalfb.com/code/fbsource/xplat/caffe2/aten/src/ATen/native/vulkan/ops/Persistent.cpp?lines=14) will frequently cause a Segmentation Fault when drivers are not present.

Instead of having `api::context()` return a nullptr when Vulkan cannot be initialized, it should just throw an exception, since ops cannot be executed anyway. This results in a more graceful failure, as these exceptions can be caught instead of crashing the app with a Seg Fault down the line.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

On an Omni model portal, I can also remove the vulkan drivers in order to test the functionality when Vulkan is not supported.

Reviewed By: kimishpatel

Differential Revision: D30139891

fbshipit-source-id: 47fcc8dcd219cb78ab9bec0b6a85b2aa7320ab50
2021-08-06 08:42:26 -07:00
d3bdf345cb Introducing DataChunk for DataPipes batching (#62768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62768

This is part of TorchArrow DF support preparation, separating it to multiple PRs to simplify review process.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30149090

Pulled By: VitalyFedyunin

fbshipit-source-id: a36b5ff56e2ac6b06060014d4cd41b487754acb8
2021-08-06 08:38:33 -07:00
5e5de75f4d Add getPyInterpreter() API (#62659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62659

It turns out that it is occasionally useful to be able to access the
PyInterpreter object from other Python bindings (see next diff in the
stack).  Make it publicly available.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30074926

Pulled By: ezyang

fbshipit-source-id: 2f745ab7c7a672ed7215231fdf9eef6af9705511
2021-08-06 08:23:24 -07:00
27135f86fd fix docstring default value of last_epoch for SWALR in torch/optim/swa_utils (#62799)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62799

Reviewed By: zou3519

Differential Revision: D30131929

Pulled By: H-Huang

fbshipit-source-id: 741c077073bbe398492dff0761836acdbba7be78
2021-08-06 08:15:10 -07:00
9573e7a644 rename namespace f4d to velox (#61)
Summary:
Pull Request resolved: https://github.com/facebookexternal/torchdata/pull/61

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62860

Pull Request resolved: https://github.com/facebookexternal/presto_cpp/pull/453

Moving all namespace definitions, declarations, and references from 'f4d' to 'velox'.

Test Plan:
```
buck build //f4d/...
buck test //f4d/...
```
Also monitor the signals from Sandcastle.

Reviewed By: pedroerp

Differential Revision: D30140136

fbshipit-source-id: 5b53ac768bb7e5cd07c93a9b04dfd6363080eb52
2021-08-05 21:04:36 -07:00
e1f81c9321 [torchelastic][multiprocessing] Print warning message only when child processes are stuck (#62823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62823

The diff makes sure that the warning message is printed only when the child processes are stuck after the termination code has been sent.

Test Plan:
sandcastle

    buck build mode/dev-nosan //caffe2:run
    buck-out/gen/caffe2/run.par --nnodes 1 --nproc_per_node 1 main.py
P435691445

Differential Revision: D30046695

fbshipit-source-id: c59170b297f4a0e530906fa5069234303deee938
2021-08-05 19:57:31 -07:00
f6c7081a16 Allow FractionalMaxPool 2D and 3D layers to accept 0 dim batch size tensors. (#62083)
Summary:
This issue fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in  https://github.com/pytorch/pytorch/issues/38115.

Allow `FractionalMaxPool` 2D and 3D layers to accept 0-dim batch sizes; see the sketch below. Also make some minor corrections to error messages to make them more informative.
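
As with similar 0-dim-batch fixes, a quick sketch (sizes are arbitrary examples):

```python
import torch
import torch.nn as nn

pool = nn.FractionalMaxPool2d(kernel_size=2, output_size=(4, 4))
x = torch.empty(0, 3, 8, 8)   # zero-sized batch
print(pool(x).shape)          # torch.Size([0, 3, 4, 4])
```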

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62083

Reviewed By: H-Huang

Differential Revision: D30134461

Pulled By: jbschlosser

fbshipit-source-id: 0ec50875d36c2083a7f06d9ca6a110fb3ec4f2e2
2021-08-05 17:40:10 -07:00
8aa12cbf86 Add tutorial link (#62785)
Summary:
Addresses: https://github.com/pytorch/pytorch/pull/62605#discussion_r681380364

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62785

Test Plan: I checked the render, and the link redirects as desired.

Reviewed By: mrshenli

Differential Revision: D30133229

Pulled By: andwgu

fbshipit-source-id: baefe0d1f1b78ece44bb42e67629bc130dbf8e9a
2021-08-05 17:28:02 -07:00
64c54f92ca [opinfo] nn.functional.unfold (#62705)
Summary:
Reference: facebookresearch/functorch#78

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62705

Reviewed By: H-Huang

Differential Revision: D30138807

Pulled By: zou3519

fbshipit-source-id: 1d0b0e58feb13aec7b231c9f632a6d1694b9d272
2021-08-05 17:12:25 -07:00
9ac56ef0fc [DDP] log gradient ready order and bucket indices (#62751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62751

This will help us determine whether the gradient ready order and bucket
indices are aligned across all ranks. This should always be true for rank 0,
since the rebuilt bucket order is determined by the gradient ready order on
rank 0, but we would be interested to see this on different workloads for the other ranks.
ghstack-source-id: 135104369

Test Plan: CI

Reviewed By: SciPioneer, wanchaol

Differential Revision: D30111833

fbshipit-source-id: a0ab38413a45022d953da76384800bee53cbcf9f
2021-08-05 16:36:25 -07:00
80091cb0f7 [DDP] Allow tuning of first bucket (#62748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62748

Previously, after buckets were rebuilt, the first bucket size always
defaulted to 1MB; this diff allows the first bucket to be tuned like the rest
of the bucket sizes.

Setting `dist._DEFAULT_FIRST_BUCKET_BYTES = 1` results in the following logs as
expected:
I0804 12:31:47.592272 246736 reducer.cpp:1694] 3 buckets rebuilt with size
limits: 1, 1048, 1048 bytes.
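
A minimal sketch of how the knob named above might be used. `_DEFAULT_FIRST_BUCKET_BYTES` is a leading-underscore internal, so treat this as illustrative rather than a stable API; it is assumed the value must be set before the model is wrapped:

```
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes the process group has already been initialized via
# dist.init_process_group(...).
dist._DEFAULT_FIRST_BUCKET_BYTES = 4 * 1024 * 1024  # 4MB instead of the 1MB default
model = DDP(torch.nn.Linear(1024, 1024))            # knob applies when buckets are (re)built
```
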
ghstack-source-id: 135074696

Test Plan: CI

Reviewed By: SciPioneer, wanchaol

Differential Revision: D30110041

fbshipit-source-id: 96f76bec012de129d1645e7f50e266d4b255ec66
2021-08-05 16:35:07 -07:00
5c431981b5 OpInfo for adaptive_avg_pool2d (#62704)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

Note regarding sample inputs for this function:

* Checks added for all relevant/interesting cases for `output_size`: `(None, None), (None, width), (height, None), (height, width)`.
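
A small sketch enumerating those four `output_size` cases (`F.adaptive_avg_pool2d` treats a `None` entry as "keep that input dimension"):

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
for output_size in [(None, None), (None, 4), (4, None), (4, 4)]:
    out = F.adaptive_avg_pool2d(x, output_size)
    print(output_size, tuple(out.shape))
# (None, None) -> (1, 3, 8, 8); (4, 4) -> (1, 3, 4, 4); etc.
```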

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62704

Reviewed By: H-Huang

Differential Revision: D30138788

Pulled By: zou3519

fbshipit-source-id: 66735ceaa85b9e6050d4ec27749fc3a8108cf557
2021-08-05 16:11:31 -07:00
eaaceea8d4 Bump protobuf version in CircleCI docker images (#62441)
Summary:
Needed in order to update ONNX to 1.10 (https://github.com/pytorch/pytorch/issues/62039), since that release introduces uses
of the "reserved" protobuf feature.

Also:
* Remove protobuf install code from scripts where it was unused.
* Add `-j` flag to make commands to speed things up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62441

Reviewed By: soulitzer

Differential Revision: D30072381

Pulled By: malfet

fbshipit-source-id: f55a4597baf95e3ed8ed987d6874388cab3426b0
2021-08-05 15:46:12 -07:00
e62189ad69 [jit] Better checking for overload function declarations. (#59956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59956

Issue #50175. Basically, two things need to be checked and are currently lacking:
1. Overload declarations should always have a single `pass` statement as the body.
2. There should always be an implementation provided for declarations that don't
   have the torch.jit._overload decorator. So in this case we need to check
   whether the function body we are compiling has a decorated declaration ahead of it.

Test Plan:
python test/test_jit.py TestScript.test_function_overloads

Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29106555

fbshipit-source-id: 2d9d7df2fb51ab6db0e1b726f9644e4cfbf733d6
2021-08-05 14:21:48 -07:00
63fa53d37a Add batched model to torchdeploy examples (#62836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62836

Used for upcoming diff that adds support for batching to torchdeploy

Test Plan: Models are used by later diffs, but generation script is verified by CI now and locally.

Reviewed By: gunchu

Differential Revision: D30135938

fbshipit-source-id: 566a32a3ede56833e41712025e9d47191dfc5f39
2021-08-05 14:01:40 -07:00
c8eda919a4 test, fix sparse * dense exceptions and corner case (#61723)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59916

This fixes two problems with sparse multiplication
- 0d-dense * sparse was creating a non-sparse output and failing.
- dense * sparse or sparse * dense is not supported, but would emit an unhelpful error message
<details>
<summary> unhelpful error message </summary>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Could not run 'aten::_nnz' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_nnz' is only available for these backends: [SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:961 [kernel]
SparseCUDA: registered at aten/src/ATen/RegisterSparseCUDA.cpp:1092 [kernel]
SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:202 [kernel]
SparseCsrCUDA: registered at aten/src/ATen/RegisterSparseCsrCUDA.cpp:229 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:38 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:118 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:10254 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:446 [backend fallback]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:285 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
</details>

Also added tests.
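
A minimal sketch of the fixed corner case, assuming the post-fix behavior described above:

```
import torch

s = torch.tensor([[0.0, 2.0], [3.0, 0.0]]).to_sparse()
scalar = torch.tensor(2.0)  # 0-dim dense tensor

out = s * scalar            # previously failed; now produces a sparse result
print(out.is_sparse)        # True
print(out.to_dense())       # tensor([[0., 4.], [6., 0.]])
```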

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61723

Reviewed By: ezyang

Differential Revision: D29962639

Pulled By: cpuhrsch

fbshipit-source-id: 5455680ddfa91d5cc9925174d0fd3107c40f5b06
2021-08-05 11:27:12 -07:00
8d7786ada6 Simplify hardswish ONNX export graph. (#60080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58301

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60080

Reviewed By: suo

Differential Revision: D30002939

Pulled By: SplitInfinity

fbshipit-source-id: 8b4ca6f62d51b72e9d86534592e3c82ed6608c9d
2021-08-05 11:15:14 -07:00
7630f407cc add OpInfo for torch.nn.functional.grid_sample (#62311)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62311

Reviewed By: malfet

Differential Revision: D30013388

Pulled By: zou3519

fbshipit-source-id: 0887ae9935923d928bfeb59054afe1aab954b40b
2021-08-05 10:43:54 -07:00
5dbcd5638b OpInfo for nn.functional.avg_pool2d (#62455)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78

cc: mruberry zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62455

Reviewed By: soulitzer

Differential Revision: D30096146

Pulled By: heitorschueroff

fbshipit-source-id: ef09abee9baa5a9aab403201226d1d9db5af100a
2021-08-05 10:28:52 -07:00
878943c64f Preserve memory layout when aten batchnorm is used (#62773)
Summary:
https://github.com/pytorch/pytorch/issues/62594

CC cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62773

Reviewed By: H-Huang

Differential Revision: D30118658

Pulled By: cpuhrsch

fbshipit-source-id: bce9e92f5f8710c876a33cccbd1625155496ddea
2021-08-05 10:21:44 -07:00
d45291613c [pruner] generalize bias hook for conv2d (#62430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62430

The bias hook is a forward hook that is part of the pruning parametrization; it is attached after the activation reconstruction forward hook, so adding the bias occurs after zeros are reinserted into the pruned activation.

This diff/PR amends the bias hook to work for Conv2d layers, in addition to Linear layers. The reshaping of the ._bias parameter ensures that it is added to the right dimension of the output.
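
A rough sketch of the reshaping idea; names like `_bias` stand in for the parametrization's stored parameter and this is not the actual pruner code:

```
import torch

def bias_hook(module, inputs, output):
    # Re-add the detached bias after the pruned activation is reconstructed.
    # For Conv2d outputs of shape (N, C, H, W), the bias must be reshaped so
    # it broadcasts along the channel dimension rather than the last one.
    if isinstance(module, torch.nn.Conv2d):
        return output + module._bias.reshape(1, -1, 1, 1)
    return output + module._bias  # Linear: broadcasts over the last dim
```
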
ghstack-source-id: 135097700

Test Plan:
Added tests for `Conv2dB()`, a model with Conv2d layers that have `bias=True`.

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MfgL

Reviewed By: jerryzh168

Differential Revision: D29979571

fbshipit-source-id: c1a7e9fabc8b3c9d0050bd6b6c6a631ddfdf2a68
2021-08-05 09:27:17 -07:00
b524a1101a ns for fx: add ref_node_target_type (#62685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62685

Adds a `ref_node_target_type` field to hold the string type
of the base node. This is needed because in some cases
the previous node does not match ref_node (if we have observers,
or if we are logging inputs), and it is useful to know the type
of ref_node.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D30082947

fbshipit-source-id: 98ded7b25a5d8d5ea820e0ef62c3799b65c3fc77
2021-08-05 09:26:10 -07:00
b96acb7591 Allow disabled tests to be re-enabled with IGNORE_DISABLED_ISSUES (#62686)
Summary:
Part 1 of fixing https://github.com/pytorch/pytorch/issues/62359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62686

Test Plan:
1. Check out this PR and run `python setup.py install`.
2. The test we will be running requires CUDA. If you don't have CUDA, you can try this on another device or simply comment out the skipIf statement before the `test_jit_cuda_extension` test in `test_cpp_extensions_jit.py`
3. Run: `IN_CI=1 python test/run_test.py -i test_cpp_extensions_jit -- -k test_jit_cuda_extension` and notice that it should skip. If it doesn't skip, edit test/.pytorch-disabled-tests.json: modify the platforms list of the first issue (61655) to include whatever platform you are on (macos or linux), and just run `python test/test_cpp_extensions_jit.py -v -k test_jit_cuda_extension --import-disabled-tests` to make sure it skips.
4. Now `export PYTORCH_IGNORE_DISABLED_ISSUES=61655` or `export PYTORCH_IGNORE_DISABLED_ISSUES=34952,61655`.
5. `rm test/.pytorch-*` to clear the cached files.
6. Run the same command as in step 3 and note that it SHOULDN'T skip. It should run.

Reviewed By: walterddr, samestep

Differential Revision: D30108773

Pulled By: janeyx99

fbshipit-source-id: dbf015a266f57577dc9283b0cdff720083b5c0cb
2021-08-05 09:05:40 -07:00
24a2681358 Revert D30094460: [profiler] Re-enable test on Windows
Test Plan: revert-hammer

Differential Revision:
D30094460 (5a1017be97)

Original commit changeset: 80521f6bc136

fbshipit-source-id: 7c01493ad078be7df1bbb81c08be6364d6ffaa4d
2021-08-05 08:34:15 -07:00
0c8ed042f2 Revert D30095246: [pytorch][PR] Enable ncclAvg for reductions
Test Plan: revert-hammer

Differential Revision:
D30095246 (a749180e4e)

Original commit changeset: d3a3475345fa

fbshipit-source-id: 34b5100b925859461296cae5a717a70e5eca6af6
2021-08-05 07:56:40 -07:00
6d896cb545 Update faq.rst so OOM section mentions checkpoint (#62709)
Summary:
This FAQ has a section on CUDA OOMs with a long list of don'ts, which limits modeling solutions. Deep nets can blow up memory during training due to output caching.
It's a known problem with a known solution: trade off compute for memory via checkpointing.
The FAQ should mention it.
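
For reference, a minimal sketch of that trade-off using `torch.utils.checkpoint`:

```
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
x = torch.randn(32, 1024, requires_grad=True)

# Activations inside `block` are not stored for backward; they are recomputed
# during the backward pass, trading compute for memory.
y = checkpoint(block, x)
y.sum().backward()
```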

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62709

Reviewed By: nairbv

Differential Revision: D30103326

Pulled By: ezyang

fbshipit-source-id: 3a8b465a7fbe19aae88f83cc50fe82ebafcb56c9
2021-08-05 07:40:08 -07:00
b84885cc8b Add support for boxed functors (#62658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62658

Boxed functors, like their unboxed brethren, support operators which
aren't just a function pointer, but a function pointer with some
associated global state that is allocated at registration time.

The use case I have in mind with this implementation is "dispatcher
API from Python", where the extra state kernel registrations need is
the PyObject callable we will invoke to do the actual invocation.
See next PR in this stack.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D30074925

Pulled By: ezyang

fbshipit-source-id: ee040edbbec1e607486d338d1ea78bb5c6b2ece9
2021-08-05 07:26:09 -07:00
e6a227465b Add serialization support for slots and subclass getstate/setstate (#62745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62745

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30113112

Pulled By: albanD

fbshipit-source-id: 6c562d0c060fb0280e5e3d432bb42fb833e6d500
2021-08-05 06:49:44 -07:00
056b147e10 clean torch_function handling in serialization (#62744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62744

The `Tensor._reduce_ex_internal` function can only be called via the `Tensor.__reduce_ex__` function.
And that second function already properly handles the `__torch_function__` overwrites. So no need to handle them again in `Tensor._reduce_ex_internal`.

This PR also updates `Tensor.__reduce_ex__` to use the specialized unary API for `__torch_function__` that makes it nicer to read.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30113113

Pulled By: albanD

fbshipit-source-id: c94f5d2597ee3afe799d9de991f75615c3c172d6
2021-08-05 06:48:26 -07:00
ee82e7a14e [DDP Communication Hook] Renaming C++ calls to match python API closer (#62735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62735

Renamed the following
1. getTensor -> getBuffer
2. getTensorRef -> getBufferRef
3. setTensor -> setBuffer
and all associated private variables as well

Reviewed By: SciPioneer

Differential Revision: D30069124

fbshipit-source-id: fa8f1f8a7f3255e6242973bc37b3f7b2731af55d
2021-08-05 05:06:29 -07:00
64b3ab6407 Improve IMethod::getArgumentNames to deal with empty argument names list (#62782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62782

This diff improves IMethod::getArgumentNames to deal with an empty argument names list.

Test Plan:
buck test mode/dev caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesValidationMode
buck test mode/dev caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesRealMode

Reviewed By: wconstab

Differential Revision: D30038175

fbshipit-source-id: 46f08dda94187160b4d6ee87600d1b46fe934222
2021-08-05 01:32:00 -07:00
019048b3b6 [PyTorch Edge] Simplify Exception Handling (Take-2) (module.cpp) (#62634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62634

Apply the same set of changes as in D27688352 (d728491fc1) to `module.cpp` as instructed by xcheng16.

Basically, this simplifies exception handling and allows propagation of the original message undisturbed to the caller so that we can figure out the lineage of the exception in crash tasks such as t96812652
ghstack-source-id: 134877012

Test Plan: Build/Sandcastle

Reviewed By: raziel

Differential Revision: D30038867

fbshipit-source-id: 8dfd415c510bcd0ab49814f4eb559ec6fc8f72e5
2021-08-04 23:25:30 -07:00
4b68801c69 Enable test_api IMethodTest in OSS (#62521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62521

This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.

Test Plan:
./build/bin/test_api --gtest_filter=IMethodTest.*

To be noted, one needs to run `python torch/csrc/deploy/example/generate_examples.py` before the above command.

Reviewed By: ezyang

Differential Revision: D30055372

Pulled By: alanwaketan

fbshipit-source-id: 50eb3689cf84ed0f48be58cd109afcf61ecca508
2021-08-04 21:14:20 -07:00
a749180e4e Enable ncclAvg for reductions (#62303)
Summary:
[ncclAvg](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/types.html?highlight=ncclavg#c.ncclAvg) is a new `ncclRedOp_t` that fuses a div-by-world-size with ncclAllReduce, Reduce, or ReduceScatter. This PR adds support.

This PR and https://github.com/pytorch/pytorch/pull/62140 lay the foundation for DDP to allreduce+average grad tensors in place with a single nccl call, without additional memory pass(es) to flatten, average, or unflatten. I'll write the necessary DDP changes once this PR and https://github.com/pytorch/pytorch/pull/62140 land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62303

Reviewed By: soulitzer

Differential Revision: D30095246

Pulled By: rohan-varma

fbshipit-source-id: d3a3475345fafb0ab265c11d36db74d7c5613a0a
2021-08-04 19:43:50 -07:00
4bd54cebe0 Refinement types and unification for symbolic shape inference (#61776)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61776

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29772537

Pulled By: migeed-z

fbshipit-source-id: 3555d43152a213087c64faa326432f1628eb3bb1
2021-08-04 17:34:29 -07:00
a27a0b1ef5 [SR] Disable NNC temporarily (#62746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62746

Disable NNC temporarily until a code cache is implemented to reduce the compilation time.

Reviewed By: ajyu

Differential Revision: D30080326

fbshipit-source-id: ef8bb3ac3a6947614f4a03a3d52774b6933d3ea8
2021-08-04 17:33:07 -07:00
afc1d1b3d6 Fix lint errors in cuda_ReportMemoryUsage tests (#62778)
Summary:
Introduced in 8bbcef5096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62778

Reviewed By: chaekit, driazati

Differential Revision: D30120245

Pulled By: malfet

fbshipit-source-id: 2cb5755b870182dd147a6685c74f7defcc10030a
2021-08-04 17:26:23 -07:00
658540f43f remove deprecated is_deterministic and set_deterministic (#62158)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62158

Reviewed By: mruberry

Differential Revision: D29909634

Pulled By: ezyang

fbshipit-source-id: ccffbcf8f378e39bd2c7fbeace7ed1cbbe003981
2021-08-04 16:45:23 -07:00
a705b8f08f OpInfo for nn.functional.relu (#62076)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62076

Reviewed By: soulitzer

Differential Revision: D30013262

Pulled By: zou3519

fbshipit-source-id: 7df5e930d1588146e09cf58c53c8860392da7348
2021-08-04 15:50:18 -07:00
123be6b261 Port addcdiv to structured kernels. (#62319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62319

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29961996

Pulled By: bdhirsh

fbshipit-source-id: d38141476b41dbfd4bf029d631f81a32aff82a5e
2021-08-04 15:35:25 -07:00
693b0af996 Port addcmul kernels to structured kernels. (#62318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62318

Tracking issue: #55070

This PR introduces the method `TensorIteratorBase::build_ternary_op` for building a
`TensorIteratorBase` for a 3-input, 1-output kernel.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29961997

Pulled By: bdhirsh

fbshipit-source-id: 2208d24823bad6e74c8d508f363716d8125b8619
2021-08-04 15:34:01 -07:00
8bbcef5096 Report more information for memory profiling (#61282)
Summary:
Report the pointed-to memory size, total allocated memory, and total reserved size all in one report.

`ptr` and `alloc_size` will be used for associating with op trace.
`allocated_size`, `reserved_size` will be used for memory trace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61282

Reviewed By: ejguan

Differential Revision: D29796282

Pulled By: chaekit

fbshipit-source-id: 5314c867632d3af1fa9a3811b35eaa5e931a5d87
2021-08-04 15:03:14 -07:00
0aee9c0ef8 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30097148

fbshipit-source-id: 514c22ea52f048bb048a53fa6b5ea57f3ac12250
2021-08-04 14:58:29 -07:00
aed01a991d Add hasattr to torch::deploy interface and hasMethod to PredictorContainer (#62669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62669

Useful to avoid having to implement null checking on the application side.

Test Plan: Add unit tests

Reviewed By: suo, houseroad

Differential Revision: D30074406

fbshipit-source-id: 881aec735953b43cb24786c1a2d79e8e724928b8
2021-08-04 14:48:34 -07:00
281737ea6f [DDP Communication Hook] Rename 4 Methods of GradBucket Class
Summary:
1. getPerParameterTensors -> getGradients
2. getModelParamsForBucket -> getParameters
3. isTheLastBucketToAllreduce -> IsLast

Test Plan:
Test results for "buck test mode/dev-nosan caffe2/test/distributed:c10d":
https://pxl.cl/1Mrq8

Test results for "buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork":
https://pxl.cl/1MrtP

Reviewed By: SciPioneer

Differential Revision: D30076436

fbshipit-source-id: 0bd1e410186a318ea6328f4c1e830ea5632f8a47
2021-08-04 14:37:23 -07:00
7f1b672b7a Revert D29952381: [Static Runtime] Ensure that unittests only use out variants or native ops
Test Plan: revert-hammer

Differential Revision:
D29952381 (8737e17af2)

Original commit changeset: e60e70b80ccf

fbshipit-source-id: 59dc2f920b7ceaf94ba8f5f36024e7cc710f6645
2021-08-04 14:25:11 -07:00
491d89da1b .github: Fix --no-build-suffix (#62739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62739

The original flag didn't initially work correctly, so this makes it actually
output the right thing.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30107694

Pulled By: seemethere

fbshipit-source-id: 5ff28d6820b9cf7145dbb617b86a941bf7686b5c
2021-08-04 14:19:38 -07:00
de94034328 Fixes #62636 (#62670)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62636.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62670

Reviewed By: ezyang

Differential Revision: D30102179

Pulled By: soulitzer

fbshipit-source-id: 38480463ef354f2c12ed83e6678aed26b0b96efe
2021-08-04 13:58:21 -07:00
8e35df0bf3 det_backward: return svd path for double backward (so that all ci tests pass) (#62570)
Summary:
Potentially fixes https://github.com/pytorch/pytorch/issues/62327 and fixes https://github.com/pytorch/pytorch/issues/62328.
This PR replaces the double backward of det from eig to svd. The latter is slower but should be more stable.

CC anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62570

Reviewed By: pbelevich

Differential Revision: D30072876

Pulled By: anjali411

fbshipit-source-id: c91b507dbfd6a3ec47dc6d0b0dcfa5f8c8228c30
2021-08-04 13:43:51 -07:00
6f0abba04c [fix] manual_seed{_all}: mem leak (#62534)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/55768

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62534

Reviewed By: nairbv

Differential Revision: D30103294

Pulled By: ezyang

fbshipit-source-id: d871ae869314dfd2d27544a51107ab752abfe452
2021-08-04 13:03:12 -07:00
89f898ebb5 Fix wrong command in README.md (#62472)
Summary:
If it is `[15^,16^)`, 16.10 is not included.
https://github.com/Microsoft/vswhere/wiki/Examples

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62472

Reviewed By: nairbv

Differential Revision: D30103199

Pulled By: ezyang

fbshipit-source-id: 82085627ca53cd5a4e666848d27d4ab062de8352
2021-08-04 12:55:18 -07:00
b454275f47 Support eager mode use of torch.jit.isinstance with multiple types (#60465)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60095
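
A minimal sketch of the eager-mode usage this enables, assuming the tuple-of-types form from the linked issue:

```
from typing import List, Tuple

import torch

def describe(x):
    # Behaves the same whether this function runs eagerly or is scripted.
    if torch.jit.isinstance(x, (List[int], Tuple[int, int])):
        return "container of ints"
    return "something else"

print(describe([1, 2, 3]))  # container of ints
print(describe("hello"))    # something else
```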

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60465

Reviewed By: soulitzer

Differential Revision: D30093110

Pulled By: ansley

fbshipit-source-id: ee9c654bdb031e9eff4837f9f1d489c81e47cc06
2021-08-04 12:45:24 -07:00
5a1017be97 [profiler] Re-enable test on Windows (#62703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62703

Re-enable test on Windows

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D30094460

Pulled By: ilia-cher

fbshipit-source-id: 80521f6bc1365d2c252f20b5d0485fc062c8d9c3
2021-08-04 12:32:24 -07:00
8737e17af2 [Static Runtime] Ensure that unittests only use out variants or native ops (#62335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62335

This change ensures that unittests only use out variants or native ops.

- Our unittests currently assume that a graph fed to the static runtime correctly replaces an interpreter op for its corresponding out variant / native op, but it's not checked by the unittest. This change ensures that.

- We relied on manual inspection of log messages to see if an out variant is used for a specific workload even for unittesting. This change frees us from doing that.

- `aten::add` is excluded from this check since it's only enabled for an internal workload. Also some unittests are excluded by using `expect_interpreter_op  = true` since they are written to use interpreter ops by design.

Test Plan: Ran `buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest` successfully.

Reviewed By: mikeiovine, hlu1

Differential Revision: D29952381

fbshipit-source-id: e60e70b80ccf45e91c6654b4ad53f92ffd5ab702
2021-08-04 11:37:15 -07:00
de77c6a0eb [BE] fix bc check (#62687)
Summary:
A bug was discovered in https://github.com/pytorch/pytorch/issues/62434: for some reason, comparing just the schema name didn't match the allow_list item. So:
1. remove the duplicate regex compile
2. use the full schema string instead of just the name

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62687

Reviewed By: ezyang

Differential Revision: D30102437

Pulled By: walterddr

fbshipit-source-id: 541b2ed77948f24daebb08623cadabb034a241e0
2021-08-04 11:00:22 -07:00
0a66416767 Rename master to main for test-infra references (#62728)
Summary:
Reacting to the main->master switch in test-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62728

Reviewed By: samestep

Differential Revision: D30104777

Pulled By: janeyx99

fbshipit-source-id: a7af7dfc69fd6e02c30ad6c15808a5b32a68c587
2021-08-04 10:45:47 -07:00
90ba71f841 Automated submodule update: FBGEMM (#62688)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 10ec0d3388

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62688

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D30088109

fbshipit-source-id: da8a1e6232e489eac0384faadb71c2dfac5927f7
2021-08-04 10:40:50 -07:00
8bcf01631a [ROCm] update magma (#62502)
Summary:
Update magma to point to magma_ctrl_launch_bounds branch.
When the upstream magma branch is used, cholesky tests in test_ops.py and test_linalg.py
fail due to "Intel MKL ERROR: Parameter 4 was incorrect on entry to DPOTRF."
Suspect commit: [35325212b15c5baadd7493d61b19b2db2635cb68](35325212b1) in magma master.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62502

Reviewed By: malfet

Differential Revision: D30089171

Pulled By: seemethere

fbshipit-source-id: b07234ce66d48e3af113640995f923ee586b3cd9
2021-08-04 10:19:55 -07:00
dfdc3069e7 Revert D30072994: [pytorch][PR] [6/n Update test rpc path
Test Plan: revert-hammer

Differential Revision:
D30072994 (ad4e1f1132)

Original commit changeset: 3217e764bd85

fbshipit-source-id: cf89df78a4e04ef03b04ec3c253c5cbb1a1f5f63
2021-08-04 10:14:31 -07:00
34c9f5a8da [DDP Communication Hook] Update get_tensor and set_tensor to be cleaner naming conventions (buffer() and set_buffer()) (#62662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662

Replaced the methods set_tensor(.) and get_tensor() in the Python-exposed API over the C++ logic with buffer() and set_buffer(.) for a cleaner interface.

Reviewed By: SciPioneer

Differential Revision: D30012869

fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482
2021-08-04 09:27:31 -07:00
4b47ea9446 adding a skip for ROCm for a flaky test (#62664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62664

Skipping a test for ROCm because of issue #62602

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30079534

Pulled By: NivekT

fbshipit-source-id: a9cf35e5d3a8d218edc9c5a704d1f9599d2f38a6
2021-08-04 07:29:06 -07:00
d1c85d2c06 Move ASAN tests to clang-7 (#62663)
Summary:
This should avoid the following false positives:
```
[ RUN      ] ProtoTest.Basic
/var/lib/jenkins/workspace/build/third_party/onnx/onnx/onnx_onnx_torch-ml.pb.h:7060:15: runtime error: member call on address 0x7fffffffdd80 which does not point to an object of type 'google::protobuf::MessageLite'
0x7fffffffdd80: note: object is of type 'onnx_torch::ModelProto'
 00 00 00 00  b0 b9 05 ef ff 7f 00 00  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  00 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'onnx_torch::ModelProto'
 UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/build/third_party/onnx/onnx/onnx_onnx_torch-ml.pb.h:7060:15 in
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62663

Reviewed By: tktrungna

Differential Revision: D30076315

Pulled By: malfet

fbshipit-source-id: 7bfc2c4b417307195e3c3379e4874eaceb4f3134
2021-08-04 06:26:03 -07:00
773a8eede4 [profiler][refactor] Refactor the usage of legacy profiler implementation (#61931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61931

This PR consolidates the profiling code around a new C++ implementation
(profiler_kineto.h/cpp) and uses it unconditionally from
torch.autograd.profiler/torch.profiler:
1. Always use profiler_kineto.h/cpp as the C++ implementation
2. Simplify profiler.py to remove unneeded parts depending on legacy
impl
3. Move some of the legacy logic into profiler_legacy.py (to be fully
deleted later)

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v
USE_KINETO=0 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v

Imported from OSS

Reviewed By: gdankel

Differential Revision: D29801599

fbshipit-source-id: 9794d29f2af38dddbcd90dbce4481fc8575fa29e
2021-08-03 18:51:29 -07:00
5830f122f1 Add docstrings for save_on_cpu hooks (#62410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62410

This PR adds docstrings for CPU hooks introduced in #61928.

Also uncomments the warning about pinned memory in CUDA semantics docs.

Depends on: #62361.

For now docstrings are an orphan page at https://docs-preview.pytorch.org/62410/generated/torch.autograd.graph.set_save_on_cpu_hooks.html#torch-autograd-graph-set-save-on-cpu-hooks

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29990129

Pulled By: Varal7

fbshipit-source-id: 7a98eeee6a0abb11e2c2d9169cd1aa35ad7ba3f4
2021-08-03 17:53:45 -07:00
5542d590d4 [EZ] Fix type of functional.pad default value (#62095)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62095

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29879898

Pulled By: jamesr66a

fbshipit-source-id: 903d32eca0040f176c60ace17cadd36cd710345b
2021-08-03 17:47:20 -07:00
d7d399f3df Exposes _aminmax as aminmax and makes it structured (#62401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62401

This PR exposes the `torch._aminmax` operator as `torch.aminmax`.

**TODO**

- [x] add examples to documentation
- [x] add minmax to rst docs

fixes https://github.com/pytorch/pytorch/issues/62164
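
A minimal sketch of the newly public operator, based on the documented `torch.aminmax` behavior:

```
import torch

x = torch.tensor([[1.0, -2.0], [3.0, 0.0]])

result = torch.aminmax(x)             # min and max over all elements in one pass
print(result.min, result.max)         # tensor(-2.) tensor(3.)

mins, maxs = torch.aminmax(x, dim=0)  # per-column reduction
print(mins, maxs)                     # tensor([1., -2.]) tensor([3., 0.])
```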

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30072246

Pulled By: heitorschueroff

fbshipit-source-id: 557d30af7c28ca6c238c59122367104036429ecd
2021-08-03 16:10:43 -07:00
92f470da08 Revert D30070707: [pytorch][PR] [5/n] Update test distribute path
Test Plan: revert-hammer

Differential Revision:
D30070707 (d8849bdb03)

Original commit changeset: c45f07b7b548

fbshipit-source-id: 867019e95b2898346ba2d918fa7a7291c8125efd
2021-08-03 16:00:56 -07:00
18eeccc7e8 [mypy] Fix Optional type check (#62668)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62668

Test Plan: Imported from OSS

Reviewed By: malfet, 842974287

Differential Revision: D30077960

Pulled By: IvanKobzarev

fbshipit-source-id: 5e423bfb65a65974ed848caa177330d6e61452e6
2021-08-03 16:00:55 -07:00
5a49abfaf1 Revert "Revert D29940705: [fx2trt] Dynamic shape inference support" (#62667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62667

This reverts commit 053e11f1b39b50fcd7aa7cdd272f7775c7a5e1ba.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30077961

Pulled By: IvanKobzarev

fbshipit-source-id: a7e76b2d2fa79e6c42a6a87f0a13f62642591fee
2021-08-03 15:59:40 -07:00
34f50c6e35 [Static Runtime] testStaticRuntime verifies that # of nodes is at least 2 (#62622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62622

This allows us to catch cases where an out variant is being tested but the test author forgot to call `.clone()` in the test script. More than 2 ops does not guarantee that the memory planner is being exercised, but fewer than 2 guarantees that it is not being used.

Reviewed By: hlu1

Differential Revision: D30058050

fbshipit-source-id: 5bc053736f1cc6fd1ffcf8254bf38874ac18c34b
2021-08-03 15:55:57 -07:00
2bddaf6149 Revert D30072859: [pytorch][PR] [4/n] Update vulkan test path
Test Plan: revert-hammer

Differential Revision:
D30072859 (1630b86dd6)

Original commit changeset: bf75faabf6b6

fbshipit-source-id: 3e2672bd19544ed3f1e2a951eb02d58f5c2f9d52
2021-08-03 15:28:04 -07:00
ad4e1f1132 [6/n Update test rpc path (#62526)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* update the `test_rpc` function to use the wheel install folder {sitepackages}/torch instead of the build/ folder
* add IN_WHEEL_TEST to limit the change to linux CI GHA only

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62526

Test Plan: check if all ci workflows pass

Reviewed By: walterddr, seemethere

Differential Revision: D30072994

Pulled By: tktrungna

fbshipit-source-id: 3217e764bd859dc2db597d24a1abb5ec1d0e8c9e
2021-08-03 15:26:54 -07:00
c48dfe0d9f .github: Enable SSH to linux runners (#62280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62280

Enables SSH to linux GHA runners for FB employees while on the FB VPN

SSH keys will be added to runners when the label "with-ssh" is applied to
your pull request.

Depnds on https://github.com/fairinternal/pytorch-gha-infra/pull/8

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99, soulitzer

Differential Revision: D29941681

Pulled By: seemethere

fbshipit-source-id: 9d291f4291eb1d814d4a3473f7daf7f6951ad724
2021-08-03 15:15:39 -07:00
9beb279d84 Add context manager to save tensors on CPU (#61928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61928

Fix #57100.
Creates a function `torch.autograd.graph.set_save_on_cpu_hooks()` which can be used to register default hooks under which all tensors saved during the forward pass are actually copied* to cpu, then copied back to the appropriate device for the backward pass.

*If the tensor was already on cpu, the entire operation is a no op.

If the tensor is on GPU, we copy the tensor to `pin_memory` during packing so that the unpacking can be done asynchronously.

See [benchmark](https://github.com/pytorch/pytorch/pull/61928#issuecomment-885089279) and [note about training large models](https://github.com/pytorch/pytorch/pull/61928#issuecomment-887009448)
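
A minimal sketch using the function name from this PR; later releases may expose this surface differently, so treat the call as illustrative:

```
import torch

# Register default hooks so activations saved for backward live on the CPU.
torch.autograd.graph.set_save_on_cpu_hooks()

a = torch.randn(5, requires_grad=True)
b = (a * a).sum()  # a GPU tensor saved here would be offloaded to pinned CPU memory
b.backward()       # and copied back to the original device on demand
```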

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29848526

Pulled By: Varal7

fbshipit-source-id: 3d289cddd4fa377bd4884ba0d569fa47c777d9e5
2021-08-03 13:08:37 -07:00
91ef19309e [quant] Input-weight equalization - branch support (#62366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62366

In the case of models with branches, we are unable to equalize the branching part in the graph.

For example, given this graph:
```
     conv2
    /     \
x -> conv1 -> add
```

After prepare, we will ignore the branched layers (conv1 and conv2) and will not insert the equalization observers. A warning message will also be printed with the layers that are unable to be equalized.
```
                        conv2 -> out_quant_obs2
                       /                       \
x -> input_quant_obs -> conv1 -> out_quant_obs1 -> add
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`

Imported from OSS

Reviewed By: malfet, supriyar

Differential Revision: D29982585

fbshipit-source-id: 706297e7f1861975998dfa83e7ca59af09d80618
2021-08-03 12:45:25 -07:00
62a90c227f Make _Join, _Joinable, _JoinHook public (#62605)
Summary:
**Overview:**
This removes the preceding `_` from `_Join`, `_Joinable`, and `_JoinHook` in preparation for adding the generic join context manager tutorial (see [here](https://github.com/pytorch/tutorials/pull/1610)). This also adds a docs page, which can be linked from the tutorial. [Here](https://github.com/pytorch/pytorch/files/6919475/render.pdf) is a render of the docs page.
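
A minimal sketch of using the now-public context manager with DDP; it assumes an initialized process group, and `net` and `loader` are placeholders:

```
import torch
from torch.distributed.algorithms import Join
from torch.nn.parallel import DistributedDataParallel as DDP

model = DDP(net)           # `net` is a placeholder nn.Module
with Join([model]):        # shadows collectives for ranks that exhaust their inputs
    for inputs in loader:  # ranks may see uneven numbers of batches
        loss = model(inputs).sum()
        loss.backward()
```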

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62605

Test Plan:
`DistributedDataParallel.join()`:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_ddp_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_inputs_stop_iteration_sync_bn TestDistBackendWithFork.test_ddp_grad_div_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_input_join_disable TestDistBackendWithFork.test_ddp_uneven_input_exception
```

`ZeroRedundancyOptimizer`:
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```
NOTE: DDP overlap tests are failing due to a landing race. See https://github.com/pytorch/pytorch/pull/62592. Once the fix is landed, I will rebase, and tests should be passing.

`Join`:
```
gpurun4 python test/distributed/algorithms/test_join.py
```

Reviewed By: mrshenli

Differential Revision: D30055544

Pulled By: andwgu

fbshipit-source-id: a5ce1f1d9f1904de3bdd4edd0b31b0a612d87026
2021-08-03 12:20:11 -07:00
053e11f1b3 Revert D29940705: [fx2trt] Dynamic shape inference support
Test Plan: revert-hammer

Differential Revision:
D29940705 (6b02ad5f82)

Original commit changeset: 1eab53a8cfd5

fbshipit-source-id: 68150a193df6f11389b14a0e8224e1489b29ff0b
2021-08-03 12:03:42 -07:00
ff31389c21 Cast a few vars to void that are otherwise unused
Summary:
llvm-13 marks this as an error when a variable is set but not used.
Evidently this macro doesn't always expand to using the var.  Work around that
here with void casts.

Test Plan: nfc

Reviewed By: drodriguez

Differential Revision: D30062462

fbshipit-source-id: ff868ec74116da99afd539142996d2ffffd399fb
2021-08-03 11:57:57 -07:00
59dd12042e [nnc] Removed const from all fields in IR. (#62336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336

This PR was generated by removing `const` for all types of nodes in NNC IR, and fixing compilation errors that were the result of this change.

This is the first step in making all NNC mutations in-place.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30049829

Pulled By: navahgar

fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63
2021-08-03 11:44:36 -07:00
474d7ec43b [Pytorch Edge] Black Box Compatibility API (#61477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61477

It would be nice if the compatibility api was just kinda plug and play with no care about the internals of the api at all. Thats what this diff aims to provide.

The general usage would be something like
  < On the Client >
  RuntimeCompatibilityInfo runtime_info = get_runtime_compatibility_info();

  .
  .
  .
  < On the Server >
  ModelCompatibilityInfo model_info = get_model_compatibility_info(<model_path>);
  bool compatible = is_compatible(runtime_info, model_info);

Currently RuntimeCompatibilityInfo and ModelCompatibilityInfo are exactly the same, but it seemed feasible to me that they may end up diverging as more information is added to the API (such as a min supported bytecode version being exposed from the runtime).

Test Plan: unit test and ci

Reviewed By: dhruvbird, raziel

Differential Revision: D29624080

fbshipit-source-id: 43c1ce15531f6f1a92f357f9cde4e6634e561700
2021-08-03 11:27:28 -07:00
b7391f44df cast return of cudaGetLastError() to void when discarding (#62518)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62511.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62518

Reviewed By: walterddr, janeyx99

Differential Revision: D30029858

Pulled By: malfet

fbshipit-source-id: d47ce4e507ac800b4e5a5e0a8d9a6fabdfd28e6d
2021-08-03 11:17:22 -07:00
d6048ecd6b Enable bazel builds on ciflow/default (#62649)
Summary:
Add `regenerate.sh` convenience script
Remove "TODO: Reenable on PR" label from workflows which are enabled on PRs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62649

Reviewed By: seemethere

Differential Revision: D30071905

Pulled By: malfet

fbshipit-source-id: c82134cb676b273d23e225be21166588996a31d3
2021-08-03 11:05:41 -07:00
4d5607bb25 [Reland][DDP] log bucket sizes (#62625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62625

reland of https://github.com/pytorch/pytorch/pull/62232 which ran into a land race.

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D30058217

fbshipit-source-id: 1454dd481e630f3de9ec6111b3f2e18cd8976091
2021-08-03 10:55:46 -07:00
1630b86dd6 [4/n] Update vulkan test path (#62519)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* update the `test_vulkan` function to use the wheel install folder {sitepackages}/torch instead of the build/ folder
* add `IN_WHEEL_TEST` to limit the change to `pytorch_linux_test` only
* add symbolic links for shared libraries which are called by the tests (this is a bit hacky and should be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62519

Test Plan: check if all ci workflows pass

Reviewed By: walterddr

Differential Revision: D30072859

Pulled By: tktrungna

fbshipit-source-id: bf75faabf6b6070c366571a74834a1f58b2549d3
2021-08-03 10:24:47 -07:00
ddd916c210 [quant][refactor] Return the models in checkGraphModeFxOp for further checking (#62487)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62487

checkGraphModeFxOp is our utility test function that quantizes a given model with FX Graph Mode Quantization
and checks whether the resulting model contains the expected ops. Previously it only returned a result on the sample data for the
quantized model; this PR changes it to return the prepared, quantized, and quantized_reference models together with the result
for the quantized model.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053981

fbshipit-source-id: 31fbce48d138261d0b00ba24e1427fd0c6208990
2021-08-03 10:12:16 -07:00
76c447a730 Remove CUDA10.2 + gcc 9 in CI (#62609)
Summary:
This is an invalid combination because CUDA10.2 does not support gcc>8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62609

Reviewed By: iramazanli

Differential Revision: D30057292

Pulled By: seemethere

fbshipit-source-id: 7cb0fa8401e80297846b0fcb5e0ecaa435b101be
2021-08-03 10:05:16 -07:00
d8849bdb03 [5/n] Update test distribute path (#62520)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* update the `test_distributed` function to use the wheel install folder {sitepackages}/torch instead of the build/ folder
* add IN_WHEEL_TEST to limit the change to linux CI GHA only

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62520

Test Plan: check if all ci workflows pass

Reviewed By: soulitzer

Differential Revision: D30070707

Pulled By: tktrungna

fbshipit-source-id: c45f07b7b54857dc8e78405714d6d5b864c30868
2021-08-03 09:52:48 -07:00
6b02ad5f82 [fx2trt] Dynamic shape inference support
Summary:
Add a field called `shape_range` to `inputTensorSpec` which allow user to indicate the range of the input shape.

Make all current converters work with dynamic shape except `layer_norm`. We need to make the layer_norm plugin an `IPluginV2Ext`.

Some ops only have limited dynamic shape support for now:
- "linear": only support at most 1 dynamic dim. We add full support but I'm thinking breaking down linear to matmul + add.
- "adaptive_avgpool`: right now we lower it to trt avgpool which means we need to know the last two dims to calculate parameters like kernel_size, strides, etc. Follow up would be make a plugin for adaptive avgpool. TRTorch already have one, we can borrow it.

Test Plan: Added unit tests for dynamic shape inference for converter tests.

Reviewed By: jackm321

Differential Revision: D29940705

fbshipit-source-id: 1eab53a8cfd5e8db0be57845062e9794578165d1
2021-08-03 09:44:26 -07:00
b7ac286d0e CMake: Add optional precompiled header support (#61940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61940

This adds a `USE_PRECOMPILED_HEADERS` option to the CMake build which
precompiles `ATen.h` and also `CUDAContext.h` for the cuda library.
After making a change in `native_functions.yaml`, this speeds up compilation
time by around 15% on my machine.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29988775

Pulled By: malfet

fbshipit-source-id: a23c468c958a8b74ebaef052a5b2e5fa3836c64b
2021-08-03 09:13:47 -07:00
2cf4d8128d add OpInfo for torch.nn.functional.mse_loss (#62254)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62254

Reviewed By: malfet

Differential Revision: D30013331

Pulled By: zou3519

fbshipit-source-id: e3242cb97d1f061b932e3e0ed589f1ee6a291512
2021-08-03 09:01:09 -07:00
ab8af15545 [Static Runtime] Enabled building Static Runtime tests and benchmarks in OSS CI (#62226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62226

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29923800

Pulled By: navahgar

fbshipit-source-id: 33cfe0e92a34c7140ea762e5715301cfbf401434
2021-08-03 08:52:36 -07:00
43327cc197 Refactor commonalities between two approaches (#62624)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62624

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30058543

Pulled By: andwgu

fbshipit-source-id: 73c794062b75e011868fae264f592549eed67482
2021-08-03 08:43:14 -07:00
e6a3967c2a Add invariant check (bucket indices: 0, 1, ..., k-1) (#62623)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62623

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30058544

Pulled By: andwgu

fbshipit-source-id: a56910f294c6a40118751eebe255b62700f42be9
2021-08-03 08:13:52 -07:00
87465a6e68 adding operator cumulative_trapezoid (#61615)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* **https://github.com/pytorch/pytorch/issues/61615**
* https://github.com/pytorch/pytorch/issues/61475
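
A minimal sketch of what the new operator computes (a running trapezoidal-rule integral along a dimension), assuming the released `torch.cumulative_trapezoid` signature:

```
import torch

y = torch.tensor([1.0, 2.0, 3.0])
print(torch.cumulative_trapezoid(y))          # tensor([1.5000, 4.0000])
print(torch.cumulative_trapezoid(y, dx=0.5))  # tensor([0.7500, 2.0000])
```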

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61615

Reviewed By: malfet, mruberry

Differential Revision: D29975064

Pulled By: NivekT

fbshipit-source-id: 4d4e98f3efb720fdc44eb238ecbf0fa157ac13d7
2021-08-03 08:04:00 -07:00
b37578b3c0 Make bazel output less verbose in CI (#62601)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62600

Adds `bazel --config=no-tty`, which is useful for less verbose output in environments that don't implement a full tty, like CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62601

Reviewed By: soulitzer

Differential Revision: D30070154

Pulled By: malfet

fbshipit-source-id: 5b89af8441c3c6c7ca7e9a0ebdfddee00c9ab576
2021-08-03 07:59:01 -07:00
3bda4ea842 Avoid unnecessary copying data in Saved Variable (#61927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61927

This is a refactor of `SavedVariable.cpp` to prevent ever defining the `data_` tensor if default hooks are set.

Before the refactor:

```c++
data_ = variable.tensor_data(); // this is wasteful if hooks are defined
register_hooks(Engine::get_default_engine().get_default_saved_variable_hooks());
```

After the refactor:
```c++
if (get_default_hooks_()) {
  save_metadata_(variable);
  register_hooks_(get_default_hooks_(), variable);
  return;
}
save_metadata_(variable);
data_ = variable.tensor_data(); // only needed if hooks are not defined
```

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29848524

Pulled By: Varal7

fbshipit-source-id: abca1eee37a17b47841e28d8a576490913fce1ce
2021-08-03 07:09:47 -07:00
7edb4f8761 Port cumprod kernel to structured kernels. (#61899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61899

Tracking issue: #55070

This PR also removes `at::_cumprod`, which was the "backend" for `at::cumprod`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29939489

Pulled By: ezyang

fbshipit-source-id: d5e4a6dfa6c79e4b135508ea13c2d11bd0684f63
2021-08-03 06:58:13 -07:00
e52325655a Port cumprod kernel to structured kernels. (#61899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61899

Tracking issue: #55070

This PR also removes `at::_cumprod`, which was the "backend" for `at::cumprod`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29939152

Pulled By: ezyang

fbshipit-source-id: b3379033a1ffe3c7bc8216d16d089d388ea559ba
2021-08-03 06:57:09 -07:00
c7a7c2b62f Enable Gelu fp32/bf16 in CPU path using Mkldnn implementation (#58525)
Summary:
Enable Gelu bf16/fp32 in the CPU path using the MKL-DNN implementation. Users don't need to call to_mkldnn() explicitly. The new Gelu fp32 performs better than the original one.

Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615.
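
A minimal sketch of the user-visible effect, with no explicit to_mkldnn() conversion needed (the internal MKL-DNN dispatch is as described above):

```
import torch
import torch.nn.functional as F

x = torch.randn(16, dtype=torch.bfloat16)  # plain CPU tensor
y = F.gelu(x)                              # MKL-DNN path is used internally
print(y.dtype)                             # torch.bfloat16
```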

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525

Reviewed By: ejguan

Differential Revision: D29940369

Pulled By: ezyang

fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
2021-08-03 06:52:23 -07:00
fd8004b42e add bfloat16 impl for nextafter (#61829)
Summary:
Add `BFloat16` support for `nextafter`.

* [x] Add OpInfo
* [x] Add Implementation Test (C++ tests)
* [x] Add credit
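
A minimal sketch of the newly supported dtype:

```
import torch

a = torch.tensor([1.0], dtype=torch.bfloat16)
b = torch.tensor([2.0], dtype=torch.bfloat16)
# The smallest bfloat16 value strictly greater than 1.0, toward 2.0
print(torch.nextafter(a, b))
```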

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61829

Reviewed By: ejguan

Differential Revision: D29932498

Pulled By: mruberry

fbshipit-source-id: 89524531a4800569ba1addd08a4ace330a6f72a4
2021-08-02 23:16:58 -07:00
2888b7fec5 Fix sign comparison (#62483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62483

Test Plan: Sandcastle

Reviewed By: albanD

Differential Revision: D30015385

fbshipit-source-id: eefc3208fb8c42ff46b9f4d910eb93c32595fa28
2021-08-02 22:50:39 -07:00
a77be16538 TensorAccessor::bounds_check should be a CPU-only function (#62628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62628

This fixes following errors when ROCm compiler is used
```
caffe2/aten/src/ATen/core/TensorAccessor.h:160:5: error: throw is prohibited in AMP-restricted functions
    TORCH_CHECK_INDEX(
    ^
```

Test Plan: CI

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D30059737

fbshipit-source-id: d094ee608768db41fcc91d044c2c6d7d29f33fe4
2021-08-02 22:46:24 -07:00
e0364ccc33 [caffe2] break one circular dependency between Caffe2 and ATen-cpu (#62632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62632

Update the caffe2/core/context.h to directly use `at::mt19937` instead of the
`at::CPUGeneratorImpl` wrapper class from the ATen-cpu library.

Using `at::CPUGeneratorImpl` causes circular dependencies between the ATen and
caffe2 code.  In particular the `at::CPUGeneratorImpl::get_state()` logic
depends on CPU Tensor functionality that currently depends on code from
caffe2.

Test Plan:
The RNG behavior should be identical to the previous code (perhaps even
faster, since we now avoid virtual function calls).

  buck test //caffe2/caffe2:caffe2_test_cpu \
    //caffe2/caffe2/python: //caffe2/caffe2/fb/operators:

Differential Revision: D29915701

fbshipit-source-id: f9b2eab8d3b21b2224d30bcf52be9c0e7eb7cd0a
2021-08-02 22:40:56 -07:00
88af4d8441 Initialize RRefs only when explicitly asked for. (#62618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62618

ShardedTensor implicitly initialized RRefs to remote shards if the
RPC framework was initialized. However, there are use cases where the RPC
framework might be initialized for a different purpose, and users would
prefer that ShardedTensor not initialize RRefs as well.

As a result, I've made RRef initialization explicit in ShardedTensor APIs.
ghstack-source-id: 134889287

Test Plan:
1) waitforbuildbot
2) unit tests.

Reviewed By: wanchaol

Differential Revision: D30056833

fbshipit-source-id: 9b2433a38dafa1888589c5b72ed93b6f0ee51639
2021-08-02 22:17:17 -07:00
b58e04f156 Make sure FindLAPACK finds the same BLAS library (#49647)
Summary:
BLAS library is found by cmake/Dependencies.cmake and then
LAPACK library is found by FindLAPACK.cmake which in turn calls
FindBLAS.cmake. This means that we are searching for BLAS twice
and they might be different things. By setting a few variables,
this can be avoided.

cc seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49647

Reviewed By: seemethere, ejguan

Differential Revision: D29943680

Pulled By: malfet

fbshipit-source-id: 3cbc350ea645a1a28dd92c19e5ee7f9eecdeff59
2021-08-02 20:41:00 -07:00
2d038b5dc8 Cast a var to void that is unused
Summary: The comment above makes it seem intentional, so just ignore it.

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D30057632

fbshipit-source-id: 45929b4eeeefdf22f5c7c4dd603229635f9da31b
2021-08-02 19:56:41 -07:00
c4196bee93 Save some memory in scatter (#62516)
Summary:
Also removes some redundant parentheses for clarity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62516

Reviewed By: andwgu

Differential Revision: D30030546

Pulled By: SciPioneer

fbshipit-source-id: e106486f70b9590bf3dcffb76d23f5725737542f
2021-08-02 18:41:58 -07:00
10d3a2c13a [tensorexpr] Added logging info for SimplifierUnderContext (#62138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62138

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29891257

Pulled By: huiguoo

fbshipit-source-id: c36b3d615fa2fe971d022111bef61ee843a9dbea
2021-08-02 18:38:55 -07:00
3a592730d5 [nnc] Simplify i%100 to i if i is less than 100; fixed #52580 (#60693)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60693

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29375938

Pulled By: huiguoo

fbshipit-source-id: 1388729c5b93805cb156efa53e8823d5462885bf
2021-08-02 18:38:54 -07:00
8f7ae77040 [nnc] Add context-sensitive simplification for div/mod (#60688)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60688

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D29373313

Pulled By: huiguoo

fbshipit-source-id: 90d7f2fbfce583b0ea3b0f1c7899e22b0210bd62
2021-08-02 18:37:39 -07:00
c07a123b26 Support saving and loading ShardedTensor. (#62242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62242

1) Add a state_dict hook to ensure ShardedTensors are
added to a state_dict.
2) Add a pre load state_dict hook to ensure ShardedTensor are added back to a
module at load time.
3) Add a `with_load_process_group` context manager for load time.
4) Added ser-de capability to ShardedTensor.
ghstack-source-id: 134860967

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D29927881

fbshipit-source-id: b1ef8872ed91e9cb0e2d5dd17d2764678ab89f0c
2021-08-02 18:33:19 -07:00
dd23372aa5 .circleci: Prefix intermediate build image tags (#62610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62610

Prefixes intermediate build image tags with build- so that ECR lifecycle
policies can automatically clean them up

Policy to automatically clean up images prefixed with `build-`: b02dd818f9

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30055952

Pulled By: seemethere

fbshipit-source-id: 328b9c94ffc02877d088d0118a19c732f580838b
2021-08-02 18:17:14 -07:00
525fa2f0b6 [reland] Catch saved tensors default hooks race condition (#62564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62564

If the user runs code that registers default saved tensor hooks from
multiple threads, it will fail with a nice error message most of the
time. This commit handles the very rare case where a race condition
would have made it fail silently.

Relanding previous PR #61957

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30045406

Pulled By: Varal7

fbshipit-source-id: d04f74c99affbbf655e53cfc2acd42f7c5b4e6eb
2021-08-02 18:00:37 -07:00
f5cf24a224 Fix lint in test_deploy_from_python.py (#62626)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62626

Reviewed By: walterddr, zhouzhuojie, seemethere

Differential Revision: D30059119

Pulled By: malfet

fbshipit-source-id: 2aff44c1585091d864ab7e02d69046204e5b5d17
2021-08-02 17:55:24 -07:00
615ac8e573 Added logic for notifying PTE webapp for Nightly and PR builds (#62512)
Summary:
This PR adds the logic to notify the PTE webapp for DevOps PyTorch Nightly and PR builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62512

Reviewed By: iramazanli

Differential Revision: D30046165

Pulled By: malfet

fbshipit-source-id: ef7e4848d4db9f38536a647fcd2d8e26cf64b960
2021-08-02 16:44:35 -07:00
db071ef005 [Reland][DDP Communication Hook] Rename 4 Methods of GradBucket Class (#62592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62592

Reland #62510

`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity (a usage sketch follows the list):
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last
3) get_per_parameter_tensors -> gradients
4) get_model_params_for_bucket -> parameters
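A hedged sketch of a communication hook written against the renamed accessors (the logging and the delegation to the stock allreduce hook are illustrative, not part of this diff):

```
import torch.distributed as dist
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

def logging_allreduce_hook(process_group, bucket):
    # Renamed accessors: index() was get_index(), is_last() was
    # is_the_last_bucket_to_allreduce(), gradients() was
    # get_per_parameter_tensors(), parameters() was get_model_params_for_bucket().
    if bucket.is_last():
        print(f"bucket {bucket.index()}: {len(bucket.gradients())} grads, "
              f"{len(bucket.parameters())} params")
    # Delegate the actual communication to the built-in allreduce hook.
    return default_hooks.allreduce_hook(process_group, bucket)
```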
ghstack-source-id: 134848352

Test Plan: unit test

Reviewed By: andwgu

Differential Revision: D30049431

fbshipit-source-id: 1bcac331aa30e529b7230e3891bc811c531b0ea9
2021-08-02 16:38:09 -07:00
d228a8fc94 [Vulkan] Softmax Along Channel Dim (#62239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62239

Added naive implementation of vulkan softmax (not using shared memory)

Based off of naive implementation of mean, found here:

2565a33c98/aten/src/ATen/native/vulkan/glsl/mean.glsl

Test Plan:
After building:

```
build/bin/vulkan_api_test
```

{F637001190}

```
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (180 ms)
```

Reviewed By: SS-JIA

Differential Revision: D29793150

fbshipit-source-id: 4f9d8e1dae8a43cbcb7063b095fa4726df06c929
2021-08-02 16:20:44 -07:00
940cbbce76 Add BFloat16 support to CPU nansum (#61083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61083

It's already supported on CUDA, so it seems reasonable to support it on CPU as
well. This also changes `test_nansum` to compare against `torch.sum`, since NumPy
doesn't support BFloat16. Note that `test_nansum_vs_numpy` still checks against
NumPy, so that path is still being tested.
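A quick illustration of the newly supported path (values illustrative):

```
import torch

x = torch.tensor([1.0, float("nan"), 2.0], dtype=torch.bfloat16)
torch.nansum(x)  # tensor(3., dtype=torch.bfloat16), now also on CPU
```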

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30006227

Pulled By: heitorschueroff

fbshipit-source-id: 1449730e1936417e7de1f8b3cf8cdcc15518873c
2021-08-02 16:03:57 -07:00
27d3d3a7d7 deploy in python fix to work in @opt mode
Summary: If we let torch_deploy get put in libomnibus, it hides the symbols we need to link against.

Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy_from_python -- --exact 'caffe2/torch/csrc/deploy:test_deploy_from_python - test_deploy_from_python (caffe2.torch.csrc.deploy.test_deploy_from_python.TestDeployFromPython)' --run-disabled

Reviewed By: wconstab

Differential Revision: D30031134

fbshipit-source-id: e5c2f740f17abafec7d01c57c664bd71a00b6f61
2021-08-02 14:47:49 -07:00
a4af91b2fe Cleanup CUDA 10.1 and 10.0 support on CI (#62597)
Summary:
10.1 is removed in https://github.com/pytorch/pytorch/pull/56056

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62597

Reviewed By: walterddr

Differential Revision: D30053902

Pulled By: seemethere

fbshipit-source-id: deb148e5e44c12b08c267a36fbd4a1afa138e6e4
2021-08-02 14:42:25 -07:00
305d5fcc05 [Pytorch Edge] get_model_bytecode int -> uint (#62201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62201

Change int to uint to match the type used by the runtime's bytecode. This only affects C++, since Python doesn't have unsigned ints. Also changed the behavior of the functions from returning -1 with a warning to throwing an exception; I wasn't sure what the proper behavior here would be (returning UINT_MAX seemed gross), so feedback is appreciated.

Test Plan: ci

Reviewed By: raziel

Differential Revision: D29914072

fbshipit-source-id: 1bb08702fc301d7c7612b5ad7205a6dbe855c890
2021-08-02 14:17:44 -07:00
0c4c37b01e Disable libtorch testing on MacOS (#62599)
Summary:
Fixes regression introduced by https://github.com/pytorch/pytorch/issues/62402

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62599

Reviewed By: walterddr, driazati

Differential Revision: D30051914

Pulled By: malfet

fbshipit-source-id: a07184b21cc4b2d0ae31fe385bb58a0f665595c6
2021-08-02 13:41:18 -07:00
093495d3f0 [fx] prevent implicit submodule inlining when submodule is a GraphModule (#62436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62436

## Problem

Given two modules and a tracer that indiscriminately marks all modules as a leaf:
```
import torch
import torch.fx

class InnerModule(torch.nn.Module):
    def forward(self, t):
        return t + t

class MyModule(torch.nn.Module):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, t):
        x = self.inner(t)
        y = self.inner(t)
        return x + y

class MyTracer(torch.fx.Tracer):
    def is_leaf_module(self, module, name):
        return True
```

One might generally expect the following behavior (note call_module nodes):
```
print(">> Outer GraphModule (with inner module as nn.Module):")
inner = InnerModule()
m = MyModule(inner)
gm = torch.fx.GraphModule(m, MyTracer().trace(m))
print(gm.graph.print_tabular())

>> Outer GraphModule (with inner module as nn.Module):
opcode         name     target                   args              kwargs
-------------  -------  -----------------------  ----------------  --------
placeholder    t        t                        ()                {}
call_module    inner    inner                    (t,)              {}
call_module    inner_1  inner                    (t,)              {}
call_function  add      <built-in function add>  (inner, inner_1)  {}
output         output   output                   (add,)            {}
None
```

However, when the inner module is first symbolically traced, the symbolic trace of the outer module ignores `is_leaf_module` entirely, and traces through the whole module (note call_function nodes).
```
print(">> Inner module as GraphModule:")
inner = InnerModule()
inner_gm = torch.fx.GraphModule(inner, MyTracer().trace(inner))
print(inner_gm.graph.print_tabular())

print(">> Outer GraphModule (with inner module as GraphModule):")
m = MyModule(inner_gm)
gm = torch.fx.GraphModule(m, MyTracer().trace(m))
print(gm.graph.print_tabular())

>> Inner module as GraphModule:
opcode         name    target                   args    kwargs
-------------  ------  -----------------------  ------  --------
placeholder    t       t                        ()      {}
call_function  add     <built-in function add>  (t, t)  {}
output         output  output                   (add,)  {}
None

>> Outer GraphModule (with inner module as GraphModule):
opcode         name    target                   args          kwargs
-------------  ------  -----------------------  ------------  --------
placeholder    t       t                        ()            {}
call_function  add     <built-in function add>  (t, t)        {}
call_function  add_1   <built-in function add>  (t, t)        {}
call_function  add_2   <built-in function add>  (add, add_1)  {}
output         output  output                   (add_2,)      {}
None
```

This is surprising behavior and at first glance violates the tracer's intent. As I understand it, `torch.fx.symbolic_trace.Tracer.trace` intends to patch `torch.nn.Module.__call__` with a `module_call_wrapper()` that records a `call_module` node if the module is a leaf, else executes `torch.fx.symbolic_trace._orig_module_call = torch.nn.Module.__call__`, which is set at module load time.

**Every submodule should be a leaf, but no `call_module` nodes are created when that submodule is a `GraphModule`. Why?**

Upon further inspection, I found:

- The constructor for GraphModule includes a path to `GraphModule.recompile()` via the setter for a `fx.Graph`:
```
inner_gm = torch.fx.GraphModule(inner, MyTracer().trace(inner))

File "/torch/fx/graph_module.py", line 252, in __init__
self.graph = graph

File "/torch/nn/modules/module.py", line 1183, in __setattr__
object.__setattr__(self, name, value)

File "/torch/fx/graph_module.py", line 277, in graph
self.recompile()
```
- `recompile()` wraps the `__call__` method by holding a reference to the `__call__` method at the time of recompilation:
```
cls = type(self)
cls_call = cls.__call__
...
def wrapped_call(self, *args, **kwargs):
    try:
        return cls_call(self, *args, **kwargs)
    except Exception as e:
        ...
cls.__call__ = wrapped_call
```
- Recompilation of the inner GraphModule happens on initialization, before creation or tracing of the outer module. Adding some old-fashioned print debug statements gives:
```
Inner Module:
_orig_module_call: <function Module._call_impl at 0x7faaebfee8b0>
recompile: cls.__call__ now wraps _orig_module_call, <function Module._call_impl at 0x7faaebfee8b0>

Outer Module:
_orig_module_call: <function Module._call_impl at 0x7faaebfee8b0>
tracing: patching method <class 'torch.nn.modules.module.Module'>.__call__ <function Module._call_impl at 0x7faaebfee8b0> with <function Module._call_impl at 0x7fa9d42bce50>

outer module MRO before tracing:
(0) <class '__main__.MyModule'>: <function Module._call_impl at 0x7faaebfee8b0>
(1) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7faaebfee8b0>
(2) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

outer module MRO during tracing:
(0) <class '__main__.MyModule'>: <function Module._call_impl at 0x7fa9d42bce50>
(1) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7fa9d42bce50>
(2) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

inner module MRO before tracing:
(0) <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>: <function x.y.z.wrapped_call at 0x7fa9d42a8670>
(1) <class 'torch.fx.graph_module.GraphModule'>: <function Module._call_impl at 0x7faaebfee8b0>
(2) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7faaebfee8b0>
(3) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

inner module MRO during tracing:
(0) <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>: <function x.y.z.wrapped_call at 0x7fa9d42a8670>
(1) <class 'torch.fx.graph_module.GraphModule'>: <function Module._call_impl at 0x7fa9d42bce50>
(2) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7fa9d42bce50>
(3) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>
```

- The outer module is patched correctly, but the inner module's first element in its MRO is the `wrapped_call` from `recompile` that still invokes `<function Module._call_impl at 0x7faaebfee8b0>` directly. Therefore, no call_module nodes are created.

## In Practice

In practice, this behavior affects the ability of `torch.package` to package `GraphModules` whose submodules are `GraphModules`. In our case, the `GraphModule` submodules are not passed through a constructor, but created separately and installed on the root `GraphModule` via `setattr`. This means that prior to packaging, there appear to be no issues with the module, since the root's graph was created before any call_module targets were replaced with `GraphModules`.

When unpackaging such a model with `torch.package`, `torch.fx.graph_module._deserialize_graph_module` uses an inline `KeepModules` tracer that sets all submodules to leaves; the unpackaged module is implicitly and surprisingly inlined in the process.

## Potential Solution

This behavior was previously not understood by us, and so the current workaround is a gnarly process of wrapping all submodules with a `nn.Module` with a manually installed forward method.

Changing `wrapped_call` to return `return super(type(self), self).__call__(*args, **kwargs)` whenever `__call__` is inherited at least appears to solve the issue. Does this seem like an acceptable approach?
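Concretely, a hedged sketch of that change against the `recompile()` fragment quoted above (the inheritance check is spelled out for clarity; the real guard may differ):

```
cls = type(self)
# Capture __call__ only if it is defined on this class itself.
cls_call = cls.__call__ if "__call__" in vars(cls) else None

def wrapped_call(self, *args, **kwargs):
    try:
        if cls_call is not None:
            return cls_call(self, *args, **kwargs)
        # __call__ is inherited: defer to the MRO so that tracer-time
        # patches of torch.nn.Module.__call__ are actually observed.
        return super(type(self), self).__call__(*args, **kwargs)
    except Exception as e:
        raise e

cls.__call__ = wrapped_call
```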

## Other Thoughts
- Repeated calls to `recompile` create nested `wrapped_calls`, all for the purpose of error handling. This seems probably unnecessary ¯\\_(ツ)\_/¯
- If a root module with an overridden `__call__` method is symbolically traced, it is ignored

Test Plan:
```
buck test:
    ✓ ListingSuccess: caffe2/test:fx - main (12.570)
    ✓ Pass: caffe2/test:fx - test_tracing_graphmodules_as_leaf_submodules (test_fx.TestFX) (11.982)
```

Reviewed By: ansley

Differential Revision: D29997935

fbshipit-source-id: 1988fbb025b14188da26a3e73e94fb789c3c1f74
2021-08-02 13:37:08 -07:00
dc1bd6acee Remove PROCESS GROUP rpc backend (#62411)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62411

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29990408

Pulled By: H-Huang

fbshipit-source-id: 183d3b316767b12993cebbe32b73c2850fd1cc42
2021-08-02 12:26:22 -07:00
2ec4f69b48 [DDP Comm Hook] Do not expose hook_then_optimizer as a public method (#62532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62532

This method is not stable at this time, so avoid releasing it when the DDP communication hook feature is released as stable.
ghstack-source-id: 134787831

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_hook_with_optimizer_parity
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_hook_then_optimizer_nccl

Reviewed By: rohan-varma

Differential Revision: D30031222

fbshipit-source-id: e03a8e13fee5116a5ddd724eb76316ee98f2a676
2021-08-02 12:25:01 -07:00
b161ac541d [reland] Add default Saved Variable hooks (#62563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62563

Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.

Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.

For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:

```
import os
import tempfile
import uuid
import torch

tmp_dir = tempfile.mkdtemp()  # scratch directory for offloaded tensors

def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```
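And a hedged end-to-end sketch using those hooks with the new functions (tensor shapes illustrative):

```
import torch

x = torch.randn(5, requires_grad=True)
torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)
y = (x * x).sum()   # the tensor saved for backward is packed to disk
torch.autograd.graph.reset_saved_tensors_default_hooks()
y.backward()        # unpack() reloads it from disk on demand
```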

Relanding previous PR: https://github.com/pytorch/pytorch/pull/61834

Original PR led to timeout error in: https://www.internalfb.com/mast/job/yuguo-release_canary_offline_training-inlinecvrp_a-canary_offline_train_28a7ecfc

Now passing: https://www.internalfb.com/mast/job/quach-release_canary_offline_training-inlinecvrp_a-canary_offline_train_9bb57e98

The difference with the new version is we don't need to acquire the GIL when calling `PyDefaultSavedVariableHooks::get_hooks`.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30045405

Pulled By: Varal7

fbshipit-source-id: 7f6c07af3a56fe8835d5edcc815c15ea4fb4e332
2021-08-02 11:30:26 -07:00
6f95850127 Revert D30024161: [DDP Communication Hook] Rename 4 Methods of GradBucket Class
Test Plan: revert-hammer

Differential Revision:
D30024161 (29c8b1db57)

Original commit changeset: 07e6072a2f7b

fbshipit-source-id: d571c2caadaf7b71fe2aba3c0597bd8074d153de
2021-08-02 10:26:54 -07:00
2e4f566d30 add OpInfo for torch.nn.functional.softplus (#62317)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62317

Reviewed By: malfet

Differential Revision: D30013322

Pulled By: zou3519

fbshipit-source-id: e80affd10b81534234694c9e4326cc68c7efc7fe
2021-08-02 09:46:13 -07:00
cb626da145 [fix] mark non-differentiable ops (#62529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62506
Fixes https://github.com/pytorch/pytorch/issues/62504

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62529

Reviewed By: albanD

Differential Revision: D30032665

Pulled By: malfet

fbshipit-source-id: 90254c50fb4a873e3eda59c8484626137e01cb31
2021-08-02 09:40:45 -07:00
562b555a2b [CUDA] Fix typo in Normalization.cu (#62515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62515

**Summary**
This commit fixes an obvious typo in `Normalization.cu` I found while
working on #62452. Since that PR will not be landed anytime soon, I
thought it would be prudent to land this fix.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: makslevental

Differential Revision: D30027324

Pulled By: SplitInfinity

fbshipit-source-id: 9d368a54c13f8e2bf6f6956dfb2bee974226f726
2021-08-02 09:38:46 -07:00
29c8b1db57 [DDP Communication Hook] Rename 4 Methods of GradBucket Class (#62510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62510

`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last
3) get_per_parameter_tensors -> gradients
4) get_model_params_for_bucket -> parameters

Test Plan:
Ran the comprehensive tests locally with the following results:
https://pxl.cl/1Ml8b
The two timeout failures are most likely environment-related and fail on my devserver.

Reviewed By: SciPioneer

Differential Revision: D30024161

fbshipit-source-id: 07e6072a2f7b81f731425d9b71f8c8b60d383b0f
2021-08-02 09:33:32 -07:00
34cb2b5d04 Update SobolEngine docstring w/ correct behavior (#62548)
Summary:
Sobol was modified to not drop the first point; this update reflects that behavior in the docstring.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62548

Reviewed By: qingfeng10

Differential Revision: D30035627

Pulled By: Balandat

fbshipit-source-id: 64c659ea30c0c929778da3b58041875834e25e9a
2021-08-02 09:04:38 -07:00
2445d5c60a Removed the hypothesis tests for adaptive_avg_pool (#60886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60886

Remove all the hypothesis tests from test_adaptive_avg_pool2d_nhwc, test_adaptive_avg_pool, and test_adaptive_avg_pool3d_ndhwc.

Test Plan: I tested it with buck test //caffe2/test:quantization and all three tests passed. The tests that failed are test_conv2d_api (test_quantized_functional.py) and test_conv3d_api (test_quantized_functional.py).

Reviewed By: wanchaol, jerryzh168

Differential Revision: D29432184

fbshipit-source-id: 2a4c540d2c169aec69cf8d143d5a155394885745
2021-08-02 08:57:14 -07:00
3dc588d577 Fix: no enough space for cu102 debug nightly build (#62465)
Summary:
Fixes #{issue number}
![image](https://user-images.githubusercontent.com/16190118/127632173-783630b7-c644-4239-b1dd-fb12b6bacf83.png)

verification:
https://app.circleci.com/pipelines/github/pytorch/pytorch/358483/workflows/a34a0123-54fe-418f-9211-4af75ee56a70/jobs/15120463

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62465

Reviewed By: iramazanli

Differential Revision: D30045280

Pulled By: janeyx99

fbshipit-source-id: f40090eb02fd1d86033971611d492c7b107cc4bd
2021-08-02 08:44:16 -07:00
51f687fd4b Add overlap with DDP to ZeRO (two approaches) (#62157)
Summary:
**Overview:**
This adds two approaches to overlapping `DistributedDataParallel.backward()` with `ZeroRedundancyOptimizer.step()` by providing two hook constructors: `hook_with_zero_step()` and `hook_with_zero_step_interleaved()`. The former waits for all backward computation to finish before starting optimizer computation, while the latter launches a partial optimizer computation using the contents of a gradient bucket once that bucket's all-reduce completes. The two approaches each suffer from their own weaknesses, and which one to use depends on the specific hardware configuration.

Both approaches can share changes to `ZeroRedundancyOptimizer`. A user should pass `overlap_with_ddp=True` to `ZeroRedundancyOptimizer`, construct a DDP communication hook using either `hook_with_zero_step()` or `hook_with_zero_step_interleaved()`, and register that communication hook. `ZeroRedundancyOptimizer.step()` should still be called in the training loop, though the optimizer computation and communication will be offloaded to originate from the communication hook. Currently, the first two iterations are vacuous, meaning they do not result in parameter updates and the inputs are ignored. This is required to finalize the DDP bucket strategy and to then initialize the `ZeroRedundancyOptimizer`'s local optimizer based on that bucketing.
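A rough usage sketch under the description above (module paths and the warm-up loop are assumptions on my part, not taken from this diff; assumes the process group is already initialized):

```
import torch
from torch.distributed.algorithms.ddp_comm_hooks.ddp_zero_hook import hook_with_zero_step
from torch.distributed.algorithms.ddp_comm_hooks.default_hooks import allreduce_hook
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

model = DDP(torch.nn.Linear(10, 10).cuda(), device_ids=[0])
zero = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.Adam,
    overlap_with_ddp=True,
    lr=1e-3,
)
model.register_comm_hook(None, hook_with_zero_step(allreduce_hook, model, zero))

for _ in range(4):  # the first two iterations are vacuous warm-up steps
    model(torch.randn(8, 10, device="cuda")).sum().backward()
    zero.step()     # optimizer work actually originates from the comm hook
```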

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62157

Test Plan:
The existing `ZeroRedundancyOptimizer` tests pass, and new unit tests for both hooks pass:
- ~~`test_ddp_with_zero_step_parity_cpu`~~ (removed for now due to flakiness in CI -- under investigation, could possibly be similar Gloo issue as with `hook_with_zero_step_interleaved()`)
- `test_ddp_with_zero_step_parity_gpu`
- `test_ddp_with_zero_step_interleaved_parity_gpu`

These were tested on the AI AWS cluster.

An analogous `test_ddp_with_zero_step_interleaved_parity_cpu` is missing due to existing bugs with Gloo. See https://github.com/pytorch/pytorch/pull/62302.

Both approaches have been verified using an internal accuracy benchmark.

Reviewed By: mrshenli

Differential Revision: D29971046

Pulled By: andwgu

fbshipit-source-id: a7234c23c7ea253f144a698fd7e3c0fe039de5e8
2021-08-02 08:33:34 -07:00
ee482edf0a Callable activation function support for Transformer modules (C++) (#62342)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60747

Enhances the C++ versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62342

Reviewed By: malfet

Differential Revision: D30022592

Pulled By: jbschlosser

fbshipit-source-id: d3c62410b84b1bd8c5ed3a1b3a3cce55608390c4
2021-08-02 08:06:39 -07:00
c9d5325c52 [BE] shorten the name part 1 (#62402)
Summary:
This should address part of https://github.com/pytorch/pytorch/issues/62357.

1. rename all files 'generated-*' to make it clear, filename will not be in CI workflow name
2. remove all 'pytorch-' in names
3. make sure the build test shell scripts are adopted to new name

Next change should reduce more device related naming

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62402

Reviewed By: malfet

Differential Revision: D30021959

Pulled By: walterddr

fbshipit-source-id: 64b21a2020e25a507101c09c010cb593d8ac4146
2021-08-02 07:56:55 -07:00
7565039ee9 Support system-provided Intel TBB (#61934)
Summary:
This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb:task_scheduler_init` references since it has been removed from TBB a while ago (3) marks the implementation of `_internal_set_num_threads` with a TODO as it requires a revision that fixes its thread allocation logic.

Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934

Reviewed By: malfet

Differential Revision: D29805416

Pulled By: cbalioglu

fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd
2021-08-02 07:39:00 -07:00
bbf6131159 Add factory kwargs test to test_modules (#62340)
Summary:
Adds a new `ModuleInfo`-based test to `test_modules.py`.

The test passes `device` and `dtype` to each module during instantiation, ensuring that the kwargs are applied to any newly-created parameters or buffers. Note that the `device` and `dtype` kwargs should only be present when a module creates parameters or buffers; the test uses some mock magic to identify this.
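For example, a minimal illustration of the kwargs being tested:

```
import torch

# Parameters and buffers are created directly with the requested
# device/dtype rather than being converted after construction.
linear = torch.nn.Linear(4, 2, device="cpu", dtype=torch.float64)
print(linear.weight.dtype)  # torch.float64
```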

Originally lifted from `test/test_module_init.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62340

Reviewed By: malfet

Differential Revision: D30022543

Pulled By: jbschlosser

fbshipit-source-id: 77e5d46d6b11c16dc39d19a1c650ee48c26c54c1
2021-08-02 06:53:00 -07:00
46b18aa294 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30039182

fbshipit-source-id: 3b38fc89585853bb9a5483a0de9ebd6852154a8d
2021-08-02 04:17:10 -07:00
aa5e3ad705 [quant] Support PerChannel quantization in FusedMovingAvgObsFakeQuantize (#62346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62346

Update the operator code to resize the min/max tensors if per-channel quant is selected. We need to do this because, by default, the observer creates empty tensors for min/max and scale/zero_point values when per-channel quantization is enabled.

Test Plan:
python test/test_quantization.py test_fused_mod_per_channel

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30003835

fbshipit-source-id: b5ec80261cb50ee543f21191a887e979dcde4667
2021-08-01 21:45:11 -07:00
7adb78017a [contbuild][xplat/caffe2] contbuild with sanitizers (#61724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61724

To improve the stability of xplat/caffe2 code, we are enabling sanitizers (asan, tsan, ubsan) on contbuild.
ghstack-source-id: 134339882

Test Plan:
```
buck test --show-output --flagfile fbsource//fbcode/mode/dev-asan --config fbsource.sanitizer=address fbsource//xplat/pytorch_models/build/pytorch_model_test/v13:body_tracking_v124_test
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 0/7 artifacts, 0.00 bytes, 100.0% cache miss
Building: finished in 14.5 sec (100%) 4538/4538 jobs, 4 updated
  Total time: 14.5 sec
Testing: finished in 1.1 sec (1 PASS/0 FAIL)
RESULTS FOR //xplat/pytorch_models/build/pytorch_model_test/v13:body_tracking_v124_test
PASS      1.0s  1 Passed   0 Skipped   0 Failed   //xplat/pytorch_models/build/pytorch_model_test/v13:body_tracking_v124_test
TESTS PASSED
```

```
buck test --show-output --flagfile fbsource//fbcode/mode/dev-tsan --config fbsource.sanitizer=thread fbsource//xplat/pytorch_models/build/ads_mai_test_train/v4:model_test
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 3/19 artifacts, 88.30 Kbytes, 66.7% cache miss
Building: finished in 24.0 sec (100%) 4609/4609 jobs, 9 updated
  Total time: 24.9 sec
Testing: finished in 0.9 sec (1 PASS/0 FAIL)
RESULTS FOR //xplat/pytorch_models/build/ads_mai_test_train/v4:model_test
PASS     808ms  1 Passed   0 Skipped   0 Failed   //xplat/pytorch_models/build/ads_mai_test_train/v4:model_test
TESTS PASSED
````

Reviewed By: dhruvbird, albanD

Differential Revision: D29348099

fbshipit-source-id: 3d3255bff0464129745d2ed729d443f3e7470313
2021-08-01 12:02:30 -07:00
32b37ba246 [DDP Communication Hook] Update the typing info of comm hook output as well as some docstring (#62457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62457

Specify `Future[torch.Tensor]` as the DDP communication hook return type, which should now explicitly be a single tensor. The previous API took a list holding a single tensor.

Note that the typing info no longer accepts the internal type `torch._C.Future`, which does not support TorchScript and hence cannot express `Future[torch.Tensor]`.
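A hedged sketch of a hook conforming to the new typing (`bucket.buffer()` as the flattened-gradient accessor and the averaging step are assumptions for illustration):

```
import torch
import torch.distributed as dist

def allreduce_hook(process_group, bucket) -> torch.futures.Future[torch.Tensor]:
    group = process_group if process_group is not None else dist.group.WORLD
    fut = dist.all_reduce(bucket.buffer(), group=group, async_op=True).get_future()
    # Return a Future holding a single tensor, not a single-element list.
    return fut.then(lambda f: f.value()[0] / dist.get_world_size(group))
```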
ghstack-source-id: 134771419

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_invalid_comm_hook_return_type

Reviewed By: rohan-varma

Differential Revision: D30007390

fbshipit-source-id: 246667c9b575b4c6e617b0a5b373151f1bd81e7f
2021-07-30 20:51:34 -07:00
72295da6c3 Reformat (#62456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62456

as title
ghstack-source-id: 134771417

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30006493

fbshipit-source-id: 1d1dc9cfff69a9b4fa31470177c1f4fa206a94ef
2021-07-30 20:50:19 -07:00
c506769f19 irange-ify 8 (#62422)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62422

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879655

fbshipit-source-id: 69fdf0196091932f866bfaba707ad7643790fdd8
2021-07-30 20:31:58 -07:00
bd9f35313a Revert D29922299: [DDP] log bucket sizes
Test Plan: revert-hammer

Differential Revision:
D29922299 (5429f68f00)

Original commit changeset: 538b331c96e7

fbshipit-source-id: 3595fe04e8dea38bc9d05e8c70f2dcd2ad496ced
2021-07-30 20:27:36 -07:00
9df7ac7a94 Port nll_loss_backward to structured (#62144)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62144

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29945279

Pulled By: SplitInfinity

fbshipit-source-id: 2fee60e8424fc590a81767c9b0a8226a0c2fd69c
2021-07-30 19:43:10 -07:00
5429f68f00 [DDP] log bucket sizes (#62232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62232

Logs the bucket sizes in DDP logging so that we know which workflow ran with what bucket size config. Will be used to verify how changing bucket sizes in DDP affects perf.

Based on the test, we can see an inconsistency in where the "first" bucket actually is (it is the last bucket before rebuild_buckets runs, and the first one after).
ghstack-source-id: 134663867

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29922299

fbshipit-source-id: 538b331c96e77048164ad130b377433be100a761
2021-07-30 18:07:04 -07:00
63d3da7961 Fix sign comparison (#62194)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62194

Reviewed By: albanD

Differential Revision: D29885396

Pulled By: r-barnes

fbshipit-source-id: 8092f3002474a48fc6b349b9e369c8d59e832fcc
2021-07-30 17:18:05 -07:00
2006dc6316 [3/N] Remove unittest.skip from torch/testing/_internal distributed files. (#61991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61991

Continuation of https://github.com/pytorch/pytorch/pull/61887 and
removing unittest.skip as much as possible.
ghstack-source-id: 134759368

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29831860

fbshipit-source-id: fe57a7d56d4423924a2dec10bb670137ace0c9a4
2021-07-30 16:40:43 -07:00
7521addede [deploy] loader cleanup (#62223)
Summary:
Some refactoring of the custom loader logic:

* Make sure we unregister frames when they are deleted so that future exceptions do not attempt to read unallocated memory
* rename linker -> loader to make its name more correct
* move the build of the loader into lib deploy since it can be shared across interpreters
* unify the logic for finding the library symbol across ops and fbcode

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62223

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D29922002

Pulled By: zdevito

fbshipit-source-id: b7f8ee5812e29a5d098fcf1bd9f4cea7d30ecb4c
2021-07-30 16:34:13 -07:00
174433267c [dte] fastpath implementation for broadcast utility function (4/x) (#62493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62493

This diff adds a broadcast fastpath for the caffe2 broadcast utility function, which just copies the contents of a smaller tensor into a larger one. We also update the tests to exercise the new functionality.

Test Plan: unit tests + let CI run

Differential Revision: D29938285

fbshipit-source-id: 543ecc548500380e307be91902696033454964a2
2021-07-30 16:15:10 -07:00
08539ca047 Add non-context manager usage support for profiler (#61690)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60238, https://github.com/pytorch/kineto/issues/329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61690

Reviewed By: malfet

Differential Revision: D30016561

Pulled By: ngimel

fbshipit-source-id: 93a578ffbb556f4b584213ac9cfafcc5cf0a9270
2021-07-30 15:54:36 -07:00
6441caeaa7 Use multi-dimensional cuFFT transforms to improve FFT performance (#61203)
Summary:
Benchmark and numerical accuracy tests on A100 and V100 are available at https://github.com/xwang233/code-snippet/tree/master/fft-61203.

I've checked the FFT results for different shapes/dims and different `dim` arg for `rfftn` and `irfftn` before and after this PR, and they all numerically matched.

With this PR, about 10%~15% performance gain is expected on commonly used shapes and dims.
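For reference, a minimal sketch of the kind of call that now maps to a single multi-dimensional cuFFT plan (shapes illustrative):

```
import torch

x = torch.randn(64, 64, 64, device="cuda")
freq = torch.fft.rfftn(x, dim=(-2, -1))                     # one plan for both dims
out = torch.fft.irfftn(freq, s=x.shape[-2:], dim=(-2, -1))  # numerically matches x
```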

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61203

Reviewed By: heitorschueroff

Differential Revision: D29996244

Pulled By: zou3519

fbshipit-source-id: 02c9862eaa1ad8f2ae9c7f7448aeb9e23bcda276
2021-07-30 14:54:04 -07:00
73c46092f1 [pytorch] sort the output of the model_dump util (#62485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62485

Make it easier to browse the code section by sorting the files by name.

Test Plan: Imported from OSS

Reviewed By: dhruvbird, malfet

Differential Revision: D30016245

Pulled By: ljk53

fbshipit-source-id: c9cb3c1ad9bcaa5337a6ad5c575ac0c240751f6c
2021-07-30 14:40:07 -07:00
49060aa81a Revert D29999785: Reland D29943356: .github: Migrate ecr_gc to github actions
Test Plan: revert-hammer

Differential Revision:
D29999785 (49dc827712)

Original commit changeset: bb9285076551

fbshipit-source-id: c26b39fb2d3c361015ce7f205d3f5f4232845289
2021-07-30 14:33:12 -07:00
43d4fe68cd [Foreach] support implicit broadcasting in slow path (#62167)
Summary:
This PR makes foreach functions support implicit broadcasting via the slow path.
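A hedged sketch of the newly accepted input (shapes illustrative):

```
import torch

xs = [torch.randn(3, 4), torch.randn(2, 1, 4)]
ys = [torch.randn(4), torch.randn(2, 5, 4)]   # broadcastable, not same-shape
res = torch._foreach_add(xs, ys)              # now handled via the slow path
```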

rel: https://github.com/pytorch/pytorch/issues/58833

cc: ptrblck  ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62167

Reviewed By: mruberry

Differential Revision: D30005109

Pulled By: ngimel

fbshipit-source-id: f48c0a13e304411763541ffcfcfc6154adb26bac
2021-07-30 13:29:56 -07:00
70f57bcb1e [PyTorch] Fix quantized Conv1d module parameters (#62356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62356

In `torch/nn/quantized/module/conv.py`, Conv1d is making a scalar `kernel_size` into a tuple of size 2 by repeating the `kernel_size` value. This logic breaks `Conv1d` because internally it unsqueezes the input with shape N, C, L to N, C, 1, L in [`qconv.cpp`](06dfaadfc6/aten/src/ATen/native/quantized/cpu/qconv.cpp (L841)). Applying the aforementioned kernel to this input shape will produce a negative output shape in [`ConvUtils.h`](203f7ff6e0/include/fbgemm/ConvUtils.h (L118-L119)) if kernel_size > 1.

Here I'm modifying the processing logic for `kernel_size` and a few other parameters, to follow the pattern of [`torch/nn/module/conv.py`](aae2a3c95e/torch/nn/modules/conv.py (L284-L287)).
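A small sketch of the pattern being adopted (helper names from `torch.nn.modules.utils`; the values are illustrative):

```
from torch.nn.modules.utils import _pair, _single

_pair(3)    # (3, 3) -- the old expansion, wrong once Conv1d unsqueezes N,C,L to N,C,1,L
_single(3)  # (3,)   -- the 1-D expansion used by torch/nn/modules/conv.py
```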

Test Plan: Rely on unit test

Reviewed By: kimishpatel

Differential Revision: D29957556

fbshipit-source-id: ae13f7ca892d60b82cfffdf972cce422ebfaae8e
2021-07-30 12:27:52 -07:00
eac288ea77 [Pytorch Backend Delegation] Annotate function args with type information (#62433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62433

Without type information, default type is Tensor which may conflict at runtime.

Test Plan: CI

Reviewed By: raziel

Differential Revision: D29990902

fbshipit-source-id: 0a38843d7d0612a458bb38fad7c86bad08c7197b
2021-07-30 11:34:40 -07:00
f16c73b9f3 Improve error messages of torch.testing.assert_close for sparse inputs (#61583)
Summary:
This utilizes the feature introduced in https://github.com/pytorch/pytorch/issues/60091 to modify the header of the error message.

Before:

```python
AssertionError: Tensor-likes are not equal!

Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 1 at index 1
Greatest relative difference: 0.3333333432674408 at index 1

The failure occurred for the values.
```

After:

```python
AssertionError: Sparse COO values of tensor-likes are not equal!

Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 1 at index 1
Greatest relative difference: 0.3333333432674408 at index 1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61583

Reviewed By: malfet

Differential Revision: D30014797

Pulled By: cpuhrsch

fbshipit-source-id: 66e30645e94de5c8c96510822082ff9aabef5329
2021-07-30 11:23:26 -07:00
8a9dfa52e9 Delete an unused variable
Summary: This was set twice but never used. Delete it.

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D30000794

fbshipit-source-id: 084d16da914febec58c4cb5f452c37027275cd08
2021-07-30 11:10:38 -07:00
73ba166e2a fix(elastic-docs): Fix elastic launch doc (#62378)
Summary:
The documentation link should be https://pytorch.org/docs/stable/elastic/run.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62378

Reviewed By: aivanou

Differential Revision: D30002830

Pulled By: kiukchung

fbshipit-source-id: 34b434acaa10222561df43f6397a2420eef02015
2021-07-30 10:58:13 -07:00
635e63c53d irange-ify 15 (#62123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62123

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879765

fbshipit-source-id: eda8e641e9fd06e16ad71b8144332f253537955a
2021-07-30 10:41:33 -07:00
3c0c1c4ecb Fix incorrectly sized tensors for svd when full_matrices=False (#62022)
Summary:
Before this PR, for an m x n input matrix the returned matrices were always allocated as m x m and n x n and then narrowed.
This unnecessarily requires a lot of memory that is then discarded.
With this PR, when `compute_uv=True and full_matrices=False`, correctly sized tensors are allocated. Moreover, if `compute_uv=False`, the U and V matrices are not allocated, as they are not needed. However, cusolver's gesvdj routines fail when these matrices are not allocated, which is a bug, so this allocation is done separately in the cusolver-specific code path.
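A minimal illustration of the allocation change (shapes illustrative):

```
import torch

A = torch.randn(8192, 64, device="cuda")
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
print(U.shape)  # (8192, 64) is allocated directly, not narrowed from (8192, 8192)
```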

MAGMA doesn't work for this input because it tries to allocate a large matrix internally (ROCm doesn't work as it uses MAGMA). Example error:
```
CUBLAS error: memory mapping error (11) in magma_sgelqf at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgelqf.cpp:161
CUBLAS error: out of memory (3) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
CUBLAS error: not initialized (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
MAGMA error: function-specific error, see documentation (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
MAGMA error: function-specific error, see documentation (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
python: /opt/conda/conda-bld/magma-cuda110_1598416697386/work/interface_cuda/interface.cpp:806: void magma_queue_create_internal(magma_device_t, magma_queue**, const char*, const char*, int): Assertion `queue->dAarray__ != __null' failed.
Aborted (core dumped)
```

Fixes https://github.com/pytorch/pytorch/issues/61949.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62022

Reviewed By: heitorschueroff

Differential Revision: D29994429

Pulled By: ngimel

fbshipit-source-id: c3f7744d7adc5fd6787f6cbb1ec41405f89a6d4c
2021-07-30 10:27:13 -07:00
26d2f4acb2 Quick fix to make torch.tensor work with functorch (#62423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62423

Fixes https://github.com/facebookresearch/functorch/issues/7.

functorch uses FuncTorchDynamicLayerBackMode as a mode key to wrap all
tensors returned from operators in special TensorWrapper tensor
extension.

The problem with this is that TensorWrapper does not have storage so
accessing the data_ptr (for recursive_store) internal asserts.

As a quick hack, the guard added prevents functorch from wrapping the
empty tensor in a TensorWrapper and instead when `tensor.to` is called later,
the tensor gets wrapped. This is effectively what Ed proposed in
https://github.com/facebookresearch/functorch/issues/7#issuecomment-847501020

In the long term we probably want some better way of extending
`internal_new_from_data` for cases like this (where there is a
mode-based dispatch key for a C++ tensor extension -- the Python case
may be different).

Test Plan: - Verified that this fixes functorch's problem

Reviewed By: malfet

Differential Revision: D29992607

Pulled By: zou3519

fbshipit-source-id: 82b713156a37d7470f8fc46e3803ee7353689a33
2021-07-30 10:15:23 -07:00
8c4d8c29e4 [2/n] add test ATen to wheel test (#62341)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* This PR introduces the env variable IN_WHEEL_TEST to control the dependency on the `build/` folder
* Updates the `test_aten` function to call the wheel install folder `{sitepackages}/torch` instead of the `build/` folder

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62341

Test Plan: check if all ci workflows pass

Reviewed By: walterddr

Differential Revision: D30004259

Pulled By: tktrungna

fbshipit-source-id: ccebd513a3530f1e5c8c9177d5f2daf14de3e853
2021-07-30 10:09:09 -07:00
d08165dfdf [fx2trt] Add op converters for ads 23x dense arch
Summary:
Adding 4 converters for
1. torch.addmm
2. torch.mul
3. torch.t
4. torch.sigmoid

Test Plan:
fx2trt unittests

Able to lower dense arch with fx2trt locally.

Reviewed By: ajtulloch, yinghai

Differential Revision: D29563962

fbshipit-source-id: 114c4e871efb25379043224f5f0116829cd7dc50
2021-07-30 09:26:11 -07:00
d783617216 enable warnings on cuda synchronization (#62092)
Summary:
This creates a `torch.cuda.set_warn_on_synchronization()` function that warns or errors when a synchronizing operation is performed. We could wrap it in a context manager for ease of use, but that would be a lie, because it sets global, not thread-local, state. Since it's intended for debugging, maybe that's ok though.
Like all `torch.cuda.*` functions, it goes through CPython, not pybind, so the argument is converted to long before being passed to the c10 function. I'll make the Python argument a Python enum class, but without pybind it will still have to go through the long conversion.

For a test script
```
import torch
torch.cuda.set_warn_on_synchronization(1)
x=torch.randn(10, device="cuda")
x.nonzero()
y=torch.randn((), device="cuda")

if y:
    print("something")
torch.multinomial(x.abs(), 10, replacement=False)
torch.randperm(20000, device="cuda")
ind = torch.randint(10, (3,), device="cuda")
mask = torch.randint(2, (10,), device="cuda", dtype=torch.bool)
val = torch.randn((), device="cuda")
x[mask]=1.
x[mask] = val
torch.cuda.synchronize()
```
the output is
```
/../playground/sync_warn_test.py:4: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  x.nonzero()
/../playground/sync_warn_test.py:7: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  if y:
something
/../playground/sync_warn_test.py:9: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  torch.multinomial(x.abs(), 10, replacement=False)
/../playground/sync_warn_test.py:15: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  x[mask] = val
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62092

Reviewed By: mruberry

Differential Revision: D29968792

Pulled By: ngimel

fbshipit-source-id: cc6f817212c164727ed99ecf6ab050dc29631b9e
2021-07-30 09:13:01 -07:00
273188549f pass through *EXITCODE *EXITCODE__TRYRUN_OUTPUT variables (#49646)
Summary:
This is needed to allow cross compiling to work

There are some `try_run` statements in the CMake files used for building PyTorch and its dependencies. Since we are cross-compiling, there's no way to run the compiled executables to get the output of the `try_run` function. CMake solves this by requiring the user to manually provide the exit code and the output of the executable, given by the `*EXITCODE` and `*EXITCODE__TRYRUN_OUTPUT` variables respectively.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49646

Reviewed By: heitorschueroff

Differential Revision: D29960301

Pulled By: malfet

fbshipit-source-id: b10ab9c182d1220f7e1911f922e7db261d521145
2021-07-30 08:22:33 -07:00
b3781f0244 Remove faulty process group agent logic (#62409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62409

This a reland of #61907 because removing process_group_agent.h / cpp broke facebook specific tests. I will remove the files and update the internal test code in a separate PR.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29990001

Pulled By: H-Huang

fbshipit-source-id: 2ee333322247d8b72691152308c3297e8c0c006d
2021-07-30 08:12:48 -07:00
ee7d19ac29 add OpInfo for torch.nn.functional.one_hot (#62253)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62253

Reviewed By: heitorschueroff

Differential Revision: D29992924

Pulled By: zou3519

fbshipit-source-id: 1fc81edf3c8ca0722c5db0b32929a4cb3285f634
2021-07-30 07:05:29 -07:00
09d10c4329 OpInfo for nn.functional.softmax (#62077)
Summary:
This PR:

* Adds OpInfo for `softmax` and `nn.functional.softmax` (alias).
* Skip removal for `test_jit_alias_remapping` test of `log_softmax`.

Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

cc: mruberry zou3519 pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62077

Reviewed By: heitorschueroff

Differential Revision: D29990019

Pulled By: zou3519

fbshipit-source-id: 67476990b54a5dd824eed9d10236e118564f2501
2021-07-30 06:56:03 -07:00
9fdf7ec6a2 [docs] Update sphinx to 3.5.4 (#61601)
Summary:
Sphinx 4.x is out, but it seems that requires many more changes to
adopt. So instead use the latest version of 3.x, which includes
several nice features.

* Add some noindex directives to deal with warnings that would otherwise
  be triggered by this change due to conflicts between the docstrings
  declaring a function and the autodoc extension declaring the
  same function.
* Update distributions.utils.lazy_property to make it look like a
  regular property when sphinx autodoc inspects classes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61601

Reviewed By: ejguan

Differential Revision: D29801876

Pulled By: albanD

fbshipit-source-id: 544d2434a15ceb77bff236e934dbd8e4dbd9d160
2021-07-30 06:23:10 -07:00
e352585f67 Clean up running smoke tests logic for Windows GHA (#62344)
Summary:
Followup to https://github.com/pytorch/pytorch/issues/62288

Front loads the logic and also force smoke tests to run on only one shard.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62344

Test Plan: Note that for the windows cuda10 run on PR, we get only 1 shard with the smoke tests running: https://github.com/pytorch/pytorch/pull/62344/checks?check_run_id=3194294041

Reviewed By: seemethere, heitorschueroff

Differential Revision: D29991573

Pulled By: janeyx99

fbshipit-source-id: 263d7de72c7a82a7205932914c32d39892294cad
2021-07-30 05:00:56 -07:00
329426c249 Fix cppdoc example syntax (#62385)
Summary:
a simple fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62385

Reviewed By: suo

Differential Revision: D30000982

Pulled By: heitorschueroff

fbshipit-source-id: e2e1c9efba3734b58d9b5f358c01d12c2c8c91ff
2021-07-30 04:36:55 -07:00
d57ce8cf89 [Linalg] Add cusolver syevjBatched path for torch.linalg.eigh when cuda >= 11.3 U1 (#62003)
Summary:
This PR adds the `cusolverDn<T>SyevjBatched` function to the backend of `torch.linalg.eigh` (the eigenvalue solver for Hermitian matrices). Using the heuristics from https://github.com/pytorch/pytorch/pull/53040#issuecomment-788264724 and my local tests, the `syevj_batched` path is only used when `batch_size > 1` and `matrix_size <= 32`. This gives us a huge performance boost in those cases.

Since there were known numerical issues with cusolver `syevj_batched` before CUDA 11.3 update 1, this PR only enables the dispatch when the CUDA version is at least that.
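A minimal sketch of an input that should take the new path under the stated heuristics (shapes illustrative):

```
import torch

A = torch.randn(512, 16, 16, device="cuda")
A = A + A.transpose(-2, -1)    # symmetrize so each matrix is Hermitian
w, v = torch.linalg.eigh(A)    # batch_size > 1 and matrix_size <= 32
```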

See also https://github.com/pytorch/pytorch/issues/42666 #47953 https://github.com/pytorch/pytorch/issues/53040

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62003

Reviewed By: heitorschueroff

Differential Revision: D30006316

Pulled By: ngimel

fbshipit-source-id: 3a65c5fc9adbbe776524f8957df5442c3d3aeb8e
2021-07-30 00:35:21 -07:00
956c22b1f9 [dte] fastpath implementations for mulgrad / divgrad (3/x) (#62437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62437

In this diff we add a broadcast fastpath for MulGradient and DivGradient ops, whose tests we update to exercise the new functionality.

Test Plan: Added test cases to elementwise ops (which will exercise the new MulGradient / DivGradient broadcast fastpath functionality) that will be run by CI. It's worth noting there's still no code (outside of the new test cases) that takes the new code paths added -- the user must explicitly request  allow_broadcast_fastpath=True, and nothing outside of the added tests currently does so.

Differential Revision: D29938273

fbshipit-source-id: 281c1a109e38c25b9bf9ff8d832de60ac3c231a9
2021-07-30 00:05:34 -07:00
607d720be1 Remove an unused variable
Summary: This is set but never used

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D30000830

fbshipit-source-id: 702d6f7b844b52bfe696725a6b0a055d494b739a
2021-07-29 23:10:03 -07:00
cfd0f5ebc9 [quant] update per-channel observer min/max_val attribute names (#62345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62345

This PR updates the attribute names from min_vals to min_val. The motivation for this is to keep the attribute name consistent with per-tensor observers so that dependencies (like FusedMovingAvgObsFakeQuantize) don't need to differentiate between the two observer types to access the attributes.

It also adds some BC tests to make sure that observers saved earlier with min_vals/max_vals can be loaded depending on the state_dict version.
Note: Scriptability of the observers isn't fully supported yet, so we aren't testing for that in this PR.

Test Plan:
python test/test_quantization.py TestSerialization

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30003700

fbshipit-source-id: 20e673f1bb15e2b209551b6b9d5f8f3be3f85c0a
2021-07-29 22:28:53 -07:00
d92301dd02 [sharded_tensor] add new init_from_local_shards API (#60479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60479

This adds an `init_from_local_shards` API to construct a ShardedTensor from local_shards and global sharded_tensor_metadata. It also refactors the utils in ShardingSpec so they can be used by sharded_tensor for sanity-checking purposes.

Test Plan:
test_init_from_local_shards
test_init_from_local_shards_invalid_sharding

Reviewed By: pritamdamania87

Differential Revision: D29276777

fbshipit-source-id: 011c1d70426bc560a59b8d858c68f1aa12db8481
2021-07-29 22:04:13 -07:00
bc787f2402 Fix setArgumentNames and make Script/Python consistent (#62442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62442

For PythonMethodWrapper::setArgumentNames, make sure to use the correct method
specified by method_name_ rather than the parent model_ obj, which itself
_is_ callable but whose callable does not have the right signature to extract.

For Python vs Script, unify the behavior to avoid the 'self' parameter, so we only
list the argument names of the unbound arguments, which is what we need in practice.

Test Plan: update unit test and it passes

Reviewed By: alanwaketan

Differential Revision: D29965283

fbshipit-source-id: a4e6a1d0f393f2a41c3afac32285548832da3fb4
2021-07-29 21:29:06 -07:00
725d98bab6 [Prototype] [PyTorch Edge] Speed up model loading by 12% by directly calling the C file API from FileAdapter (#61997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61997

After profiling the model loading latency on AI Bench (Android Galaxy S8 US), it seems like a significant amount of time was spent reading data using FileAdapter, which internally calls IStreamAdapter. However, IStreamAdapter uses `std::istream` under the hood, which is not that efficient. This change reduces the model loading time from [~293ms](https://www.internalfb.com/intern/aibench/details/600870874797229) to [~254ms](https://www.internalfb.com/intern/aibench/details/163731416457694), which is a reduction of ~12%.
ghstack-source-id: 134634610

Test Plan: See the AI Bench links above.

Reviewed By: raziel

Differential Revision: D29812191

fbshipit-source-id: 57810fdc1ac515305f5504f88ac5e9e4319e9d28
2021-07-29 20:14:49 -07:00
693d8f2f07 [PyTorch Edge] Cache operator lambda during model loading [7% faster model loading] (#61996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61996

A recent post https://fb.workplace.com/groups/pytorch.edge.users/posts/2012215235600341/ about slow model loading with an accompanying perf report (report.html) caused me to look at the report and find hot spots during model loading. This suggested that we spend quite a bit of time looking up operators from the dispatcher. This means that we can probably just cache the operator handler functions (instead of computing them every time the operator name shows up, since it potentially shows up multiple times in a given model).

This diff results in an approx 7% speedup in model loading time (from [315ms](https://www.internalfb.com/intern/aibench/details/45077128343028) to [293ms](https://www.internalfb.com/intern/aibench/details/600870874797229)) when run against an 87MB speech model that jiatongzhou provided.

See https://fb.workplace.com/groups/pytorch.dev/posts/855724575006024/ for the previous post from jiatongzhou.
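The actual change lives in the C++ lite interpreter, but the caching idea itself is simple. Here is a minimal Python sketch of it; `resolve_from_dispatcher` is a hypothetical stand-in for the real dispatcher lookup:

```
def resolve_from_dispatcher(name):
    # hypothetical stand-in for the (relatively expensive) dispatcher lookup
    return lambda *args: print(f"running {name}")

op_cache = {}

def resolve_op(name):
    # memoize the handler so repeated occurrences of the same operator
    # name in a model do not pay the lookup cost again
    if name not in op_cache:
        op_cache[name] = resolve_from_dispatcher(name)
    return op_cache[name]
```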
ghstack-source-id: 134634612

Test Plan:
Run using AI Bench.

### Speech Transducer v25 model (87MiB)

Followed up with jiatongzhou and he gave me his speech model. For posterity, here's how to fetch it (you don't need to, since I uploaded it to NMLML and it now has a permanent Everstore Handle):

```
cd /tmp/
mkdir speech_model
cd speech_model
fbpkg fetch speech.stella.neural_transducer.on_device.en_us:25
cp pytorchmodel.pt ~/speech_transducer_v25_pytorchmodel.ptl
```

Here's how to build and run the benchmark using AI Bench:

```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote
```

Reviewed By: raziel

Differential Revision: D29826210

fbshipit-source-id: 134b67eb466e73f0e43447b9b966278f13c4b56f
2021-07-29 20:14:47 -07:00
0b3f42fa4f [PyTorch Edge] Add test for lite interpreter operator caching (#62306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62306

Test to see if caching of operators works as expected. When caching operators during model load we look up using the operator name. This test ensures that even if there are multiple operators with the same name (in the same model), the caching distinguishes between the ones that have a different number of arguments specified during the call in the serialized bytecode.

In this specific test, there's a model with 3 methods, 2 of which return a `float32` tensor and one of which returns an `int64` tensor. Please see the comments in the diff for details.

ghstack-source-id: 134634613

Test Plan:
Test command:

```
cd fbsource/fbcode/
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs'
```

```
cd fbsource/
buck test xplat/caffe2:test_lite_interpreter
```

Reviewed By: raziel

Differential Revision: D29929116

fbshipit-source-id: 1d42bd3e6d33128631e970c477344564b0337325
2021-07-29 20:14:45 -07:00
0bbdf0e1e3 [PyTorch Edge] Add test_lite_interpreter to fbsource xplat BUCK files (#62305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62305

Currently, it's super time consuming to run a lite interpreter test from fbcode since it takes > 10 minutes to build. Recently, I haven't been able to do that either due to low disk space.

Having this test available in fbsource/xplat/ is a great win for productivity since I can re-run it in ~2 minutes even after significant changes!

I've had to disarm some tests that can only run in OSS or fbcode builds (since they need functionality that we don't include for on-device FB builds). They are disarmed using the macro `FB_XPLAT_BUILD`.

ghstack-source-id: 134634611

Test Plan: New test!

Reviewed By: raziel, JacobSzwejbka, cccclai

Differential Revision: D29954943

fbshipit-source-id: e55eab14309472ef6bc9b0afe0af126c561dbdb1
2021-07-29 20:13:06 -07:00
90977e10ed Remove an unused variable
Summary: This is defined and then set once but never actually used. Kill it here.

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D29994983

fbshipit-source-id: 0cb7383b3ec95f1aeed5210974bc95060cf10be5
2021-07-29 18:04:01 -07:00
74291c8347 [quant][graphmode][fx] Fix the calls to load_arg in quantization_patterns.py (#62376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62376

load_arg(quantized=...) accepts a dictionary mapping input index to dtype, not a list of dtypes; the call is just there to make sure the inputs are quantized with the correct dtype.
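As a toy illustration of the corrected contract (`load_arg` here is a self-contained stand-in for the internal FX helper, not the real implementation):

```
import torch

def load_arg(quantized):
    # the argument is a dict mapping input index -> dtype, not a list of dtypes
    assert isinstance(quantized, dict)
    def check(args):
        for idx, dtype in quantized.items():
            print(f"input {idx} is expected to be quantized as {dtype}")
        return args
    return check

load_arg(quantized={0: torch.quint8})(["conv_output"])
```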

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29979711

fbshipit-source-id: 8499976ac5df8eb2019c3beae573dec6c9a56247
2021-07-29 17:28:07 -07:00
eef85f89b9 [dte] broadcast fastpath implementations for reduce utility functions (2/x) (#62428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62428

In this diff we add a broadcast fastpath for reduce utility functions. These functions are used by various elementwise ops, whose tests we update to exercise the new functionality.

Test Plan: Added test cases to elementwise ops (which will exercise the new reducer functionality) that will be run by CI. It's worth noting there's still no code (outside of the new test cases) that takes the new code paths added -- the user must explicitly request `allow_broadcast_fastpath=True`, and nothing outside of the added tests currently does so.

Differential Revision: D29938264

fbshipit-source-id: 5d5542bd93afb85fd9f7a4073f766adc07eb3b65
2021-07-29 17:27:39 -07:00
219917706e [quant][graphmode] Add support for reference pattern for default ops (#62375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62375

Default ops are ops that have one quantized input and one quantized output, e.g. gelu, silu, leaky_relu, etc.; for these we need to insert an observer for the output.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29979712

fbshipit-source-id: ed88210a9d6f1ab5cdb9397b4ff7f1628162ef22
2021-07-29 17:27:37 -07:00
acba9b3104 [DDP Communication Hook] Simplify the implementation of parseHookResult of PythonCommHook (#62389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62389

Simplify the implementation of `parseHookResult` of `PythonCommHook`, by not directly accepting the output of allreduce, which is a tensor list.

Address the comment on https://github.com/pytorch/pytorch/pull/62074#discussion_r675303280

Additionally, formatter is also applied to `OptimizerHookState` and `hook_then_optimizer`.
ghstack-source-id: 134626246

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork

Reviewed By: rohan-varma

Differential Revision: D29982485

fbshipit-source-id: 5b27cc5ef09d2f87c1ade4c0feef7eacc1af3a9a
2021-07-29 17:27:35 -07:00
554daef820 Reformat test_c10d_nccl.py and distributed_test.py (#62388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62388

as title
ghstack-source-id: 134626247

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D29984086

fbshipit-source-id: 0960e5acc379ccdf08813387e11d3fb0a5f0e4b0
2021-07-29 17:27:33 -07:00
9fee176be3 [Model Averaging] Fix docstring of PeriodicModelAverager (#62392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62392

The constructor of `PeriodicModelAverager` does not need to accept parameters.
ghstack-source-id: 134626245

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork --  test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29986446

fbshipit-source-id: 6a8b709e4383a3c44b9e60955fbb067cd2868e76
2021-07-29 17:26:27 -07:00
8f519c5e07 [quant][graphmode] Add support for reference pattern for torch.cat (#62374)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62374

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_cat

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29979713

fbshipit-source-id: 2d38991f96fcca783169ffd306bc2b0fb7debc69
2021-07-29 16:31:09 -07:00
502823c201 Change torch::Tensor to at::Tensor to fix build failure (#62425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62425

Fixes https://github.com/pytorch/pytorch/issues/62416

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D30000948

Pulled By: heitorschueroff

fbshipit-source-id: 07dfc88a01b7718bc32be4342f43bb2cf2842b60
2021-07-29 16:31:08 -07:00
49dc827712 Reland D29943356: .github: Migrate ecr_gc to github actions (#62438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62438

Switches out BASH_ENV for GITHUB_ENV

This reverts commit 1f1d01df3ec06046880d0a92b930fbd051d60606.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29999785

Pulled By: seemethere

fbshipit-source-id: bb92850765518005a3f530264643959e5038e681
2021-07-29 16:31:06 -07:00
dc8b5db5f8 [quant][graphmode] relax the constraint for supported_dtypes for reference option (Linear and Conv) (#62348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62348

Originally we had a supported_dtypes check for linear and conv, but it is only valid for the non-reference option. This PR removes the constraint when is_reference=True and enables producing reference patterns for dtype combinations that are not supported by fbgemm/qnnpack, for example qint8 activation dtypes.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_linear_qint8_activation

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29968675

fbshipit-source-id: 2abe37940eb62e16fcf0cbb700c174de49719223
2021-07-29 16:31:04 -07:00
9f9244aabe [dte] scaffolding for c2 operator broadcasting fastpath (1/x) (#62369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62369

This diff is a big no-op that just sets up scaffolding for passing the "allow_broadcast_fastpath" flag from caffe2 operator protos created in Python down to C++. To facilitate this, we create helper template wrappers that pass a flag for "allow_broadcast_fastpath" down to elementwise functors. This flag will determine whether to try to take the broadcast fastpath, which we will add in subsequent diffs.

Test Plan: sandcastle + let github CI run

Differential Revision: D28154475

fbshipit-source-id: 15750a0bcd2994fbc6a61fb5653d8cae6b0177dd
2021-07-29 16:31:02 -07:00
5c47038d12 Back out D29792193 "Add default Saved Variable hooks" (#62415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62415

test error

Differential Revision: D29990361

fbshipit-source-id: 99c87dec6c5be6496c9db5c9205c3cb72a953dd9
2021-07-29 16:31:00 -07:00
dcfcefcd0b Back out D29848525 "Catch saved tensors default hooks race condition" (#62414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62414

test error

Differential Revision: D29990348

fbshipit-source-id: 1a7c668153ad7ad9e847dd1a74db678e787b6b0e
2021-07-29 16:29:46 -07:00
389380ffcc [reland] Refactor Tensor::to to call a primitive that is not copy_. (#62262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62262

Context
-------
functorch is unable to vmap(grad(f)) when f contains a .to
call. This is because .to (when it is not a no-op) decomposes
to .copy_ under grad and the .copy_ is not compatible with vmap.

Fix
 ---
The fix for this is to have all Tensor::to variants call a new operator,
`_to_copy`, that always copies and is a primitive w.r.t. autograd so
that autograd decomposes Tensor::to into a call to `_to_copy`.
(This is related to https://github.com/pytorch/pytorch/issues/60956,
please let me know if you want to bikeshed the naming).

In order to get this done I had to do a bit of refactoring. All of the
`::to` implementations now call `to_impl` which may call `_to_copy`.
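A small example of the dtype-dependent differentiability that motivates the codegen changes below:

```
import torch

x = torch.randn(3, requires_grad=True)
y = x.to(torch.long)    # integral dtype: the result cannot require grad
z = x.to(torch.double)  # floating-point dtype: differentiable
print(y.requires_grad, z.requires_grad)  # False True
```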

Autograd codegen changes
------------------------

The second thing I had to do was modify the autograd codegen. Right now,
autograd assumes that every output is either statically known to be
differentiable or not differentiable at codegen time. `_to_copy` is a
little special because its differentiability depends on the output
dtype. e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is non
differentiable. To get this to work:
- I changed how `output_differentiability` in derivatives.yaml works.
- output_differentiability can now accept "conditions" for each of the
output arguments. A "condition" is some C++ code.
- We currently only support `output_differentiability` with conditions
if there is a single output. This is for convenience and can be changed
in the future.
- I added a new `output_differentiability_conditions` field to
DifferentiabilityInfo. This gets populated in load_derivatives.yaml
- forward-mode and reverse-mode AD take
`output_differentiability_conditions` into account.

Here's how the generated code for `VariableType::_to_copy`
[looks
like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849)
No other autogenerated code gets modified by this PR.

Performance benchmarking
------------------------
- I benchmarked [three
cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a).
- Case A: No-op .to(). Instruction count went from 50223 to 25623. I
have no clue why but this is a good thing.
- Case B: not-no-op .to(). Instruction count went from 665291 to 671961.
This is expected; `_to_copy` adds an additional dispatch.
- Case C: not-no-op .to() forward pass and backward pass. Instruction count
went from 4022841 to 4030057. This PR adds
an additional dispatch to .to() (so there should be one additional
dispatch in the forward pass) so this number looks reasonable.

Test Plan
---------
- test_torch.py has a test_to
- test_cuda.py has test_to*
- test_autograd has tests (test_type_conversions) that exercise the
reverse-mode path
- test_ops.py has some tests (like log_softmax) that exercise the
reverse-mode and forward-mode AD path.
- test_quantization, test_namedtensor all exercise tensor.to as well.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29934998

Pulled By: zou3519

fbshipit-source-id: 820069acd66fd5af97b98f42edfca68572c9eb1c
2021-07-29 10:49:32 -07:00
7b6d569a2b [jit] Renamed prim::Concat as prim::VarConcat (#61983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61983

Trial #2. The previous PR (https://github.com/pytorch/pytorch/pull/61498) was reverted because this caused a failure in `pytorch_linux_backward_compatibility_check_test`. Fixed that now by adding to the exception list in `check_backward_compatibility.py`.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29828830

Pulled By: navahgar

fbshipit-source-id: 947a7b1622ff6e3e575c051b8f34a789e105bcee
2021-07-29 10:28:59 -07:00
5ede826178 Fix alpine ecr image pull (#62413)
Summary:
Fixes alpine ecr image pull in the render_test_result step

![image](https://user-images.githubusercontent.com/658840/127527503-e88f198d-a8d5-4d3b-a064-096dca07d713.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62413

Reviewed By: malfet

Differential Revision: D29990844

Pulled By: zhouzhuojie

fbshipit-source-id: ff420f57d5e4b80d0ebf73508001a127649e9eb2
2021-07-29 10:20:13 -07:00
a42345adee Support for target with class probs in CrossEntropyLoss (#61044)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11959

Alternative approach to creating a new `CrossEntropyLossWithSoftLabels` class. This PR simply adds support for "soft targets" AKA class probabilities to the existing `CrossEntropyLoss` and `NLLLoss` classes.

Implementation is dumb and simple right now, but future work can add higher performance kernels for this case.
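A short usage sketch of the new target form; the class probabilities have the same shape as the input rather than being a LongTensor of indices:

```
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)
# "soft" targets: a probability distribution over classes per sample
target = torch.softmax(torch.randn(4, 10), dim=1)
loss = F.cross_entropy(logits, target)
loss.backward()
```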

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61044

Reviewed By: zou3519

Differential Revision: D29876894

Pulled By: jbschlosser

fbshipit-source-id: 75629abd432284e10d4640173bc1b9be3c52af00
2021-07-29 10:04:41 -07:00
dd0ef23a85 Delete .clang-tidy-oss (#62373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62373

Internal clang-tidy can handle all the options after  D29863426 was deployed

Test Plan: CI

Reviewed By: 1ntEgr8

Differential Revision: D29978471

fbshipit-source-id: ea531734ab4fc3e0a26552bd24846b22c2e5c745
2021-07-29 09:30:18 -07:00
7157ad44bc Fix windows ci squid env (#62353)
Summary:
This is a re-land of https://github.com/pytorch/pytorch/pull/62244; notable changes are

- Use jinja2 variables to DRY the settings
- Added no_proxy for common destinations that shouldn't go through the proxy (e.g. the magic settings from [aws link](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/http_proxy_config.html#windows-proxy))
- Try to trigger windows GHA CI flows
- Also went through actionlint to fix GitHub Actions linting errors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62353

Reviewed By: driazati

Differential Revision: D29970842

Pulled By: zhouzhuojie

fbshipit-source-id: b9c457b0005bb1a64850949a56679d68fbb281d6
2021-07-29 09:20:30 -07:00
80a662e773 ENH Updates docs and tests for classification modules that already support no batch dims (#61874)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61874

Reviewed By: heitorschueroff

Differential Revision: D29979977

Pulled By: jbschlosser

fbshipit-source-id: 82c19151aa7220e564337b05d7677d52981e0aa2
2021-07-29 09:14:52 -07:00
b9f02778b2 Forward fix mypy for #61820 (#62398)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62398

Test Plan: Imported from OSS

Reviewed By: malfet, anjali411

Differential Revision: D29988610

Pulled By: ejguan

fbshipit-source-id: 700dfa5b1c415bc058390bbe5727a739c8419b0f
2021-07-29 07:43:12 -07:00
2d103025a5 Adding warning on isend about modifying after send (#61875)
Summary:
This is a standard limitation of collective communication libraries. For example:

https://www.open-mpi.org/doc/v4.0/man3/MPI_Isend.3.php
```
A nonblocking send call indicates that the system may start copying data out of the send buffer. The sender should not modify any part of the send buffer after a nonblocking send operation is called, until the send completes.
```

http://openucx.github.io/ucx/api/latest/html/group___u_c_p___c_o_m_m.html#ga8323878b60f426c630d4ff8996ede3cc
```
The user should not modify any part of the buffer after this operation is called, until the operation completes.
```
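A minimal sketch of the safe usage pattern the warning describes, assuming a default process group has already been initialized:

```
import torch
import torch.distributed as dist

buf = torch.ones(4)
if dist.get_rank() == 0:
    work = dist.isend(buf, dst=1)
    work.wait()  # do not modify buf until the send completes
    buf += 1     # safe only after wait()
else:
    work = dist.irecv(buf, src=0)
    work.wait()
```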

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61875

Reviewed By: suo

Differential Revision: D29783720

Pulled By: mrshenli

fbshipit-source-id: 78fd047c74449f77b906f3766a6c2bc29499847d
2021-07-29 07:37:18 -07:00
945d40dca6 Also disable inplace fw AD for acos on windows (#62360)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62360

Reviewed By: malfet, bdhirsh

Differential Revision: D29973310

Pulled By: albanD

fbshipit-source-id: 3b033e779f557724602c5a87f497698f2262a12e
2021-07-29 06:42:25 -07:00
1b147a52f5 Allow FX tracer to trace control flow (if/while) statements when parameter shapes are in the conditionals (#61820)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61733

Allow FX tracer to trace control flow (if/while) statements when parameter shapes are in the condition.
If the user specifies the new "param_shapes_constant" option when constructing a tracer, the model's parameter shape attributes will be evaluated and the resulting constants will be emitted into the IR during tracing.
Also added a new test:

```
python test/fx/test_fx_param_shape_control_flow.py
```

The test also performs somewhat whitebox-style testing to check the generated Python code from the IR.
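A minimal sketch of the option; the module and shapes here are made up for illustration:

```
import torch
import torch.fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.zeros(4, 3))

    def forward(self, x):
        # control flow on a parameter shape, normally untraceable
        if self.w.shape[0] >= 2:
            return x + 1
        return x - 1

tracer = torch.fx.Tracer(param_shapes_constant=True)
graph = tracer.trace(M())
print(torch.fx.GraphModule(M(), graph).code)  # the taken branch is baked in
```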

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61820

Reviewed By: bdhirsh

Differential Revision: D29969299

Pulled By: jerryzhenleicai

fbshipit-source-id: 99aae824bdfec880be69258de7ead5c8cd59eddc
2021-07-28 23:48:44 -07:00
4ed8858817 Exclude time of waiting in queue from gloo communication prof… (#61342)
Summary:
Background:
The gloo communication implementation is as follows:
1. Construct communication workers and push them into a queue.
2. Initialize a thread pool; each thread runs a loop that pops a worker from the queue and executes it.

Issue:
The recorded profiling time span starts at worker construction and ends at completion, so it includes the time the worker spends waiting in the queue. This results in multiple gloo communication time spans overlapping with each other on the same thread in the timeline:
![image](https://user-images.githubusercontent.com/62738430/124867273-5bc95b80-dff0-11eb-8664-6e5d4166fc39.png)
This happens because the next work item is already waiting in the queue while the last one is still running.

Solution:
This PR delays the profiling start time of gloo communication from worker construction to when the worker is actually executed, so the profiling span no longer includes the time spent waiting in the queue. Implementation:
1. First, disable the original record function by passing 'nullptr' as the 'profilingTitle' argument of ProcessGroup::Work.
2. Construct a 'recordFunctionBeforeCallback_' and a 'recordFunctionEndCallback_' and save them as members of the worker.
3. When the worker is executed, invoke 'recordFunctionBeforeCallback_'.
4. 'recordFunctionEndCallback_' is invoked at completion, as before.

After this modification, the gloo profiling spans in the timeline no longer overlap:
![image](https://user-images.githubusercontent.com/62738430/124868716-bb286b00-dff2-11eb-9cf0-d0494a356d0c.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61342

Reviewed By: albanD

Differential Revision: D29811656

Pulled By: gdankel

fbshipit-source-id: ff07e8906d90f21a072049998400b4a48791e441
2021-07-28 22:24:26 -07:00
35307b131d Callable activation function support for Transformer modules (Python) (#61355)
Summary:
Fixes Python part of https://github.com/pytorch/pytorch/issues/60747

Enhances the Python versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well.
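A short usage sketch of both forms:

```
import torch
import torch.nn as nn

# the string form still works ...
layer_str = nn.TransformerEncoderLayer(d_model=32, nhead=4, activation="gelu")
# ... and a callable can now be passed directly
layer_fn = nn.TransformerEncoderLayer(d_model=32, nhead=4,
                                      activation=torch.nn.functional.gelu)
out = layer_fn(torch.randn(10, 2, 32))  # (seq_len, batch, d_model)
```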

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61355

Reviewed By: bdhirsh

Differential Revision: D29967302

Pulled By: jbschlosser

fbshipit-source-id: 8ee6f20083d49dcd3ab432a18e6ad64fe1e05705
2021-07-28 21:42:56 -07:00
1f2b96e7c4 [DDP] Make compute_bucket_assignment_by_size return per bucket sizes (#62231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62231

`compute_bucket_assignment_by_size` is responsible for setting per-bucket size limits; return this information from the function so that we are aware of the size limit for each bucket.

This is currently not being consumed, but will be in the next diff when we log bucket size limits to DDP logging. This will help us run experiments under different bucket size configs and analyze the impact.
ghstack-source-id: 134480575

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29919056

fbshipit-source-id: dd5a096fa23d22e5d9dc1602899270a110db4a19
2021-07-28 20:21:01 -07:00
c76daa6de3 [DDP][ez] Remove misleading comment (#62230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62230

We don't iterate over model replicas anymore.
ghstack-source-id: 134475834

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29918760

fbshipit-source-id: 84bde670b4e91667a49f94f1b597fad364498467
2021-07-28 20:20:59 -07:00
842228fd0d [DDP] Save bucket size limits (#62229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62229

First of a stack of diffs to save and log the bucket size limits to help debug/discover discrepancies and analyze impact of bucket size tuning
ghstack-source-id: 134475835

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29918629

fbshipit-source-id: b9b3f9a5658340a4c7fd72874c2254664e3c52e9
2021-07-28 20:19:56 -07:00
cac4aa71ca Provide option to pass module instance to _load_state_dict_pre_hooks. (#62070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62070

We have a custom Tensor:
https://github.com/pytorch/pytorch/blob/master/torch/distributed/_sharded_tensor/api.py#L67,
which doesn't show up in state_dict for the module. This was resolved by
using the _register_state_dict_hook:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L1196
to parse and add custom tensors to state_dict.

However, the problem is that during load time, _register_load_state_dict_pre_hook:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L1272,
does not pass in the module instance, and as a result a ShardedTensor in the
state_dict cannot be appropriately added to a module at load time.

To resolve this issue, in this PR I've enhanced this hook to support two variations: one which passes in the module instance (for the problem described above), and one which keeps the previous behavior for BC reasons.
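A minimal sketch of the module-passing variant, assuming the `with_module=True` keyword form on the private registration API (the keyword name is an assumption here, taken from the upstream private API, and may differ):

```
import torch.nn as nn

def pre_hook(module, state_dict, prefix, local_metadata, strict,
             missing_keys, unexpected_keys, error_msgs):
    # with the module instance in hand, custom tensors (e.g. ShardedTensor)
    # found in state_dict can be re-attached to the module at load time
    print(type(module).__name__, list(state_dict))

m = nn.Linear(2, 2)
m._register_load_state_dict_pre_hook(pre_hook, with_module=True)
m.load_state_dict(m.state_dict())
```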
ghstack-source-id: 134541391

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: jbschlosser

Differential Revision: D29867142

fbshipit-source-id: bcb136ff51eedd0b508cfb419e8b8a6b7d95539c
2021-07-28 19:22:47 -07:00
2eaf71d749 [Model Averaging] Update model averager API to avoid the redundant params arg needed by post-localSGD optimizer (#62132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62132

as title

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134560541

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_post_localSGD_optimizer_parity

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29887751

fbshipit-source-id: 60dadb04790d800fdcc7cb8a08d060e411718739
2021-07-28 18:43:09 -07:00
55bee44951 [Model Averaging] Post-localSGD optimizer (#62131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62131

Wrap `PeriodicModelAverager` as an optimizer.

Currently both the optimizer and averager require an input `params` arg, where the latter actually can read params from the optimizer wrapper. Will update averager class API in a follow-up PR.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134560248

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D29881465

fbshipit-source-id: b9634972f4d8bffd3b3eb94f5dbbb19db2bcd759
2021-07-28 18:42:06 -07:00
58d45d950b [DDP] Log unused param names under DETAIL debug mode. (#62209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62209

When `TORCH_DISTRIBUTED_DEBUG=DETAIL` is set, log names and indices of unused parameters when searching for them.

Motivation is that we have occasionally seen issues where a parameter is marked as unused when it shouldn't be; this can help narrow down the root cause by explicitly logging the param names that are marked as unused.
ghstack-source-id: 134541461

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29916085

fbshipit-source-id: d84cf637cbbd811521e6264ffd6c50ca8a79595b
2021-07-28 18:10:32 -07:00
24ed6e6b16 Add actionlint (#62364)
Summary:
This adds a linter for our GitHub actions. When a GitHub Actions workflow has an invalid definition, GitHub doesn't queue the job and doesn't report it as failed, so these can be hard to detect with the usual tools. This adds an explicit job to check if our workflow YAMLs are valid using [https://github.com/rhysd/actionlint](https://github.com/rhysd/actionlint). We deployed a similar check in pytorch/test-infra [here](https://github.com/pytorch/test-infra/pull/89).

This PR enables the linter and fixes all the issues it complained about (it did already catch one bug where we were leaving `CIRCLE_BRANCH` blank when uploading binary size)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62364

Reviewed By: zhouzhuojie

Differential Revision: D29973928

Pulled By: driazati

fbshipit-source-id: 83b365e98fd6cbdcd75eeb44daf1be1c89056f8d
2021-07-28 17:10:20 -07:00
fcc7fbe15f Split zeta_kernel out of BinaryMiscOpsKernel.cu (#62261)
Summary:
`BinaryMiscOpsKernel.cu` takes 4 m 30 s to compile on my machine, which is the second slowest after `PowKernel.cu`. Moving the zeta kernel into its own file takes 3 m 30 s, and reduces `BinaryMiscOpsKernel.cu` compile time to 1 m.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62261

Reviewed By: bdhirsh

Differential Revision: D29969350

Pulled By: ngimel

fbshipit-source-id: 37cad5775088b2f7d22948414e4bf0427f88e07d
2021-07-28 16:07:15 -07:00
f6e137598d ns for fx: fix nit in default qlinear weight extraction function (#62334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62334

Removes the assert for node type in default qlinear weight extraction
function. Without the assert, user defined functions can now use
this util function without failing this check.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

// further tests will be in follow-up fb-only diffs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29963501

fbshipit-source-id: a634eabb5165375bde186438318ec52fa29c970f
2021-07-28 16:07:13 -07:00
72c943a2ac ns for fx: fix bug for user function in weight extraction (#62333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62333

We incorrectly ignored any custom relationships the user specified
in the `extract_weights` API.  Fixing this and adding a test case.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29963502

fbshipit-source-id: 33ce3d4df1acb6298b6c7dcb6674015c8d14bdf4
2021-07-28 16:05:51 -07:00
d98b1c400d [pruner] add cuda tests for pruner (#61993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61993

Repeating `test_pruner` unit tests for Linear and Conv2d models with device = 'cuda' to confirm pruner will work on GPU
- set device to cuda
- move model to device
- assert that module.weight.device is cuda
ghstack-source-id: 134554382

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1Md9c

Reviewed By: jerryzh168

Differential Revision: D29829293

fbshipit-source-id: 1f7250e45695d0ad634d0bb7582a34fd1324e765
2021-07-28 14:45:04 -07:00
b39b28ced3 irange-ify 10 (#62122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62122

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879694

fbshipit-source-id: 87cd8ab17061129c835d9f961b67587c84d181d1
2021-07-28 13:35:23 -07:00
88f8f2ab94 irange-ify 6 (#62115)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62115

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879576

fbshipit-source-id: 63cbf0ab5a52325fa2c3dec0e8239e2eac1ecf72
2021-07-28 13:32:11 -07:00
9e77113e85 irange-ify 11 (#62121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62121

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879701

fbshipit-source-id: 5c51879c88fa6a5790db241c8b33ec0dc4b177ca
2021-07-28 13:32:09 -07:00
b5867a1b34 irange-ify 7 (#62117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62117

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879640

fbshipit-source-id: 189578a57301747a3421742e145bbcdf2ad75c49
2021-07-28 13:30:39 -07:00
59bb4f2dab Revert D29928698: [pytorch][PR] Use private squid proxy
Test Plan: revert-hammer

Differential Revision:
D29928698 (6da4a25509)

Original commit changeset: 4ee78be0abe3

fbshipit-source-id: 44679a2b247ba8163f09895d9d36ecf5df4390b8
2021-07-28 12:35:55 -07:00
3a2603bc68 Port slow_conv_transpose2d to structured (#55503)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55503

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29945028

Pulled By: SplitInfinity

fbshipit-source-id: 0b696d104938287444210f1bc926afc13f899991
2021-07-28 12:03:03 -07:00
05b802d4e0 [pytorch] Bring back RemoveInplaceOps() (#62200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62200

This commit brings back the `RemoveInplaceOps` pass removed in D29523283 (dec5aa2260) that apparently had a bunch of internal users.

Test Plan: danthe3rd

Reviewed By: danthe3rd

Differential Revision: D29833316

fbshipit-source-id: 6cf13d463ab0a5e50ba3eb3243f79a9c51623809
2021-07-28 12:00:38 -07:00
b91a917616 [Static Runtime] Fixed another build failure in OSS due to test_utils.h (#62338)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62338

Test Plan: Imported from OSS

Reviewed By: d1jang

Differential Revision: D29965744

Pulled By: navahgar

fbshipit-source-id: cf3e54ac13432ea8afc4b718fac6c9768743d01b
2021-07-28 11:41:33 -07:00
7c588d5d00 ENH Adds no_batch_dim support for pad 2d and 3d (#62183)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62183

Reviewed By: ejguan

Differential Revision: D29942250

Pulled By: jbschlosser

fbshipit-source-id: d1df4ddcb90969332dc1a2a7937e66ecf46f0443
2021-07-28 11:10:44 -07:00
6da4a25509 Use private squid proxy (#62244)
Summary:
This PR adds a **private** squid proxy (note that the internal ELB is only accessible from the private VPC subnets of GitHub Runners) that's deployed and dedicated to PyTorch CI on GitHub runners.

```
dig $SQUID_PROXY

10.0.x.x
10.0.x.x
```

http_proxy and https_proxy are compatible with the following http clients:

- curl
- wget
- python

Existing cache policy:

```
refresh_pattern -i .(7z|deb|rpm|exe|zip|tar|tgz|gz|ram|rar|bin|tiff|bz2|run|csv|sh)$ 1440 80% 2880
```

It uses the standard squid refresh_pattern to cache requests. In our setup, we cache for at least 1440 minutes (1 day) and at most 2880 minutes (2 days), with a last-modified factor of 80% (see the squid docs). Please refer to pytorch/test-infra for details.

Right now, it only applies to the build and test step, to limit the scope and make sure build and test are more reliable with egress cache.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62244

Test Plan:
```
# first time, cache miss (4min20s)
http_proxy=$SQUID_PROXY https_proxy=$SQUID_PROXY curl -v -L http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz --output /tmp/tmp_mnist.zip
100 9680k  100 9680k    0     0  37836      0  0:04:21  0:04:21 --:--:-- 29908

# second time, cache hit (0s)
http_proxy=$SQUID_PROXY https_proxy=$SQUID_PROXY curl -v -L http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz --output /tmp/tmp_mnist.zip
100 9680k  100 9680k    0     0   103M      0 --:--:-- --:--:-- --:--:--  103M
```

Load Test Plan:
```
# ab load test with `-n 100` requests
ab -X $SQUID_PROXY -n 100 http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

Concurrency Level:      1
Time taken for tests:   9.044 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      991326300 bytes
HTML transferred:       991242200 bytes
Requests per second:    11.06 [#/sec] (mean)
Time per request:       90.442 [ms] (mean)
Time per request:       90.442 [ms] (mean, across all concurrent requests)
Transfer rate:          107040.50 [Kbytes/sec] received
```

Reviewed By: malfet

Differential Revision: D29928698

Pulled By: zhouzhuojie

fbshipit-source-id: 4ee78be0abe35411666c6121991b0addded57106
2021-07-28 10:37:42 -07:00
2581dfc249 [Model Averaging] Create a base class for model averaging (#62111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62111

This base class will be passed to the post-localSGD optimizer in the next PR. This way, the same post-localSGD optimizer can choose different model averaging algorithms.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134489187

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29884954

fbshipit-source-id: 1dc5e35c58895902991567f633afd621c7108938
2021-07-28 10:15:36 -07:00
a15fff0a7f Revert D29794666: Remove faulty process group code
Test Plan: revert-hammer

Differential Revision:
D29794666 (afe3644321)

Original commit changeset: 0b35191cc072

fbshipit-source-id: 6467bc5100f4115f2fdb385e205740cd68c89743
2021-07-28 10:15:34 -07:00
71a6ef17a5 ENH Adds no_batch_dim tests/docs for Maxpool1d & MaxUnpool1d (#62206)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62206

Reviewed By: ejguan

Differential Revision: D29942341

Pulled By: jbschlosser

fbshipit-source-id: a3fad774cee30478f7d6cdd49d2eec31be3fc518
2021-07-28 10:15:32 -07:00
cdf85a82ed [quant][graphmode][fx] Add reference pattern support for BatchNorm (#62215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62215

including batchnorm2d, batchnorm3d, batchnormrelu2d and batchnormrelu3d

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29917524

fbshipit-source-id: 3a9520ff659cb21e6e2fe614973b3d08aa0af923
2021-07-28 10:14:16 -07:00
7443c90f15 optimize non lastdim softmax bf16 (#60371)
Summary:
This PR enables softmax calculation with the `bfloat16` data type when the reduction is not along the last dim (a small usage sketch follows the list).
* Use a bf16 specialization for the forward calculation to reduce bf16/fp32 casts in the vec template.
* Release the bf16 limitation for the backward calculation.
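A small usage sketch, assuming a build where this change has landed:

```
import torch

x = torch.randn(64, 128, dtype=torch.bfloat16, requires_grad=True)
y = torch.softmax(x, dim=0)  # reduction along a non-last dim, in bfloat16
y.sum().backward()           # backward no longer requires casting to float32
```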

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60371

Reviewed By: ejguan

Differential Revision: D29563109

Pulled By: cpuhrsch

fbshipit-source-id: f6b439fa3850a6c633f35db65ea3d735b747863e
2021-07-28 10:06:51 -07:00
68efa186cc [static runtime] Implement aten::full (#62227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62227

Test Plan: Added `StaticRuntime.IndividualOps_Full` to cover the newly added code path.

Reviewed By: hlu1

Differential Revision: D29923649

fbshipit-source-id: 722950137c35ae325590a670b97f03b395e8eac3
2021-07-28 09:50:27 -07:00
10c6811a6b [DDP] Run test_ddp_new_tensor_in_fwd with static graph (#61992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61992

This test previously was not enabled for static graph but to ensure
this feature is supported with DDPSink, enable it for static graph which
currently passes outputs to DDPSink.
ghstack-source-id: 134471406

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29830887

fbshipit-source-id: 2d3f750d9eb4289558ed21acccd172d83d9b82cc
2021-07-28 09:49:12 -07:00
acf8907e94 These should be equivalent per the previous formula but breaks xla (#62329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62329

Reviewed By: ejguan

Differential Revision: D29961527

Pulled By: albanD

fbshipit-source-id: 46e46726591f4c0c8faf6ec0d7136a2d4ca976ea
2021-07-28 09:23:51 -07:00
f4baa83eae [bc-breaking] reference option for conv produce a pattern instead of reference conv module (#61942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61942

This PR changes is_reference=True for conv to produce a pattern consisting of dequant - float conv - quant instead of a reference conv module. This is useful for future transformations to custom backends and also helps simplify the implementation of convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810656

fbshipit-source-id: 549237a62bfda4341a2a7474c124f5e33350e267
2021-07-28 09:13:40 -07:00
52d1ffb789 Teach pytrees about namedtuple (#62292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62292

This PR adds pytree support for namedtuples (see the short sketch after this list). The challenge with namedtuples is that each namedtuple class is actually a different type. This PR does the following:
- it adds a namedtuple flatten/unflatten. The flatten function returns
a context that is the actual type of the namedtuple subclass. The
unflatten function uses that type to reconstruct the namedtuple
- Special cases all pytree logic to consider all namedtuples the same.
This is done by creating a `_get_node_type(pytree)` helper function that
returns `namedtuple` if `pytree` is any namedtuple subclass. The effect
of this is that all namedtuple subclasses will go through the namedtuple
flatten/unflatten functions
- Adds a `_namedtuple_flatten_spec` function for FX pytrees. This function
flattens the namedtuple based on the spec and is equivalent to the
`_tuple_flatten_spec`.
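A short sketch of the resulting behavior using the internal pytree module (`torch.utils._pytree` is a private API):

```
import collections
import torch.utils._pytree as pytree

Point = collections.namedtuple("Point", ["x", "y"])

leaves, spec = pytree.tree_flatten(Point(1, 2))
print(leaves)                               # [1, 2]
print(pytree.tree_unflatten(leaves, spec))  # Point(x=1, y=2), type preserved
```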

Test Plan
- new tests in test/test_pytree.py and test/test_fx.py

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29947302

Pulled By: zou3519

fbshipit-source-id: 19c00665b13546642c315df0f243ad99b8e7ff7c
2021-07-28 06:27:44 -07:00
c06b6e445f Build M1 binaries with PocketFFT (#62222)
Summary:
As MKL is only available on x86_64 platform, clone header-only PocketFFT
library and use it as FFT provider

Fixes https://github.com/pytorch/pytorch/issues/62107

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62222

Reviewed By: ejguan

Differential Revision: D29938718

Pulled By: malfet

fbshipit-source-id: ac0bd98b5090d6c8a26c36c4e34a4d6e1d9f1a92
2021-07-27 22:41:29 -07:00
cb2b5f06c9 Revert D29816592: [pytorch][PR] [fix] polygamma n>=1
Test Plan: revert-hammer

Differential Revision:
D29816592 (b73d759708)

Original commit changeset: 2c020a6e4c32

fbshipit-source-id: 310c93ade300966366ef04f206a5908fb27745db
2021-07-27 22:14:10 -07:00
73f1e2d1dc [8/N] Nnapi backend delegation preprocess: New refactored design (#62225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62225

Rewrote the preprocess function for Android NNAPI delegate.
Previously, `preprocess()` called `convert_model_to_nnapi()` using Pybind and returned a NnapiModule that is serialized for mobile. Now, `preprocess()` calls a sub-function of `convert_model_to_nnapi()` and returns several preprocessed items (that were previously components of NnapiModule).

Dictionary returned contains:
   "shape_compute_module": torch::jit::Module,
   "ser_model": torch::Tensor,
   "weights": List[torch.Tensor],
   "inp_mem_fmts": List[int],
   "out_mem_fmts": List[int]

**Purpose and Future:**
The purpose of these changes is to move more implementation from bytecode and TorchScript to the delegate API, since bytecode is less efficient.
Now, only the shape computation uses bytecode. In the future, shape computation will be moved out of TorchScript as well.

**nnapi_backend_preprocess.cpp:** preprocess implementation
**prepare.py**: refactored a portion of `convert_model_to_nnapi()` to `process_for_nnapi()`, so preprocess can get components of NnapiModule

**Test:**
Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
ghstack-source-id: 134444190

Test Plan: Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully

Reviewed By: raziel

Differential Revision: D29922279

fbshipit-source-id: cadcf8908d8a745dc7abbe286e97d6ead937d4ab
2021-07-27 18:52:48 -07:00
7aabda6d5d Update nccl to v2.10.3-1 (#62276)
Summary:
Which, at the time of creating this PR, points to 7e51592129

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62276

Reviewed By: ngimel

Differential Revision: D29940950

Pulled By: malfet

fbshipit-source-id: 59c6fda76a9023af3adbfb5a96b83ca50950df6c
2021-07-27 18:32:53 -07:00
1f1d01df3e Revert D29943356: .github: Migrate ecr_gc to github actions
Test Plan: revert-hammer

Differential Revision:
D29943356 (8e0622abf1)

Original commit changeset: 493592baf2f7

fbshipit-source-id: f0e604aab2b828561adc3e8fabf0f39221e15615
2021-07-27 18:14:31 -07:00
af0f083d42 [dist_optim] fix the bug of none grads on functional optimizers (#62249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62249

Parameters and grads passed to torch.optim.functional optimizers should always match; we should skip the parameters that have None gradients to avoid a size mismatch.
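A minimal sketch of the idea behind the fix (pure Python, not the actual distributed-optimizer code):

```
import torch

module = torch.nn.Linear(4, 2)
module(torch.randn(1, 4)).sum().backward()
module.bias.grad = None  # simulate a parameter with no gradient

# keep params and grads aligned by skipping None gradients, so the lists
# handed to the functional optimizer always match in size
params, grads = [], []
for p in module.parameters():
    if p.grad is not None:
        params.append(p)
        grads.append(p.grad)
```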
ghstack-source-id: 134452467

Test Plan: test_dist_optim_none_grads

Reviewed By: mrshenli

Differential Revision: D29929653

fbshipit-source-id: 4ca6167fecdfe1db422236655edee3aa59b8b044
2021-07-27 18:10:51 -07:00
c0b806694f Do not use deprecated data accessor in IndexKernel.cu (#62268)
Summary:
Fixes repeated warnings like:
```
/var/lib/jenkins/workspace/aten/src/ATen/native/cuda/IndexKernel.cu: In lambda function:
/var/lib/jenkins/workspace/aten/src/ATen/native/cuda/IndexKernel.cu:354:683: warning: 'T* at::Tensor::data() const [with T = c10::BFloat16]' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
   AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3 (e23ddf06e9)(at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, iter.dtype(), "take_cuda", [&] {
   ^
/var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:559:1: note: declared here
   T * data() const {
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62268

Reviewed By: walterddr

Differential Revision: D29937267

Pulled By: malfet

fbshipit-source-id: 6413deb9762b973880f4a7db47652eacd013214f
2021-07-27 17:58:19 -07:00
e3be185069 [PyTorch] Add KWargs support to script module forward (#62224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62224

The underlying operator allows both args and kwargs, but we only expose args in this convenience method. This brings them in line while not changing any existing programs.

Test Plan: CI

Reviewed By: gunchu

Differential Revision: D29920830

fbshipit-source-id: f4b2aa88d4a679e33595625b7ef355e4d14e54c4
2021-07-27 17:02:57 -07:00
9776e1ff2f Migrate thnn_conv_depthwise2d from THC to ATen (#62281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62281

Closes gh-24646, Closes gh-24647

There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.

I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')

for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```

and similarly for backwards. I see these as the same to within measurement error.

|                   | Master (us) | This PR (us) |
|------------------:|:-----------:|:------------:|
|           Forward |    133.5    |     133.6    |
|  Backward (input) |    1,102    |     1,119    |
| Backward (weight) |    2,220    |     2,217    |

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29943062

Pulled By: ngimel

fbshipit-source-id: fc5d16496eb733743face7c5a14e532d7b8ee26a
2021-07-27 16:51:23 -07:00
ba9423aa93 Fix forward ad for matrix power land race (#62291)
Summary:
Fix land race from https://github.com/pytorch/pytorch/pull/59993

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62291

Reviewed By: driazati, seemethere

Differential Revision: D29946599

Pulled By: albanD

fbshipit-source-id: 16411e1a0c298fad12a6a6788ec2427923b0112a
2021-07-27 16:17:51 -07:00
171e13fde9 Rework PowKernel.cu (#62260)
Summary:
PowKernel.cu is the single slowest file to compile in all of pytorch, taking
7 m 34 s on my machine. After investigating, I discovered that the case with
complex inputs and a cpu scalar for the first argument takes more than half that
time just on its own.

Noting that [`thrust::pow`] for complex is just `exp(log(base) * exponent)`,
we can improve this kernel by precomputing `log(base)` on cpu and computing
only the `exp` on CUDA. This is faster in both runtime and compile time.
For 1 million elements, master takes 61.6 us vs 56.9 us with this PR.
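A quick check of the identity the rework exploits, shown on CPU for simplicity:

```
import torch

base = torch.tensor(2.0 + 1.0j)               # cpu scalar base
exponent = torch.randn(8, dtype=torch.cfloat)
lhs = base ** exponent
rhs = torch.exp(torch.log(base) * exponent)   # log(base) computed only once
print(torch.allclose(lhs, rhs))               # True
```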

I also noticed that the constant exponent case is implemented twice, once in
`gpu_kernel_with_scalars` and again in `pow_tensor_scalar_kernel`. Further, the
`Pow.cpp` code detects cpu-scalar exponents and redispatches to the `tensor_scalar`
overload, making the `gpu_kernel_with_scalars` version dead code. Now instead,
we unconditionally run `tensor_tensor` and it will call into `tensor_scalar` if appropriate.

With these changes, PowKernel.cu takes just 2 m 30 s to compile.

[`thrust::pow`]: 368266e80e/thrust/detail/complex/cpow.h (L33)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62260

Reviewed By: ejguan

Differential Revision: D29938789

Pulled By: ngimel

fbshipit-source-id: 7ab7d81ececc92a9e6e62e60b0a4f2e6e3146df8
2021-07-27 16:16:20 -07:00
7507aeded5 [reland][bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892) (#62277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62277

This PR changes is_reference=True for linear to produce a pattern consists of dequant - float linear - quant instead of reference linear module, this is useful for future transformations to custom backends, it is also helpful to simplify the implementation for
convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Imported from OSS

Reviewed By: ejguan

Differential Revision: D29941079

fbshipit-source-id: 84bdfc0bb872c34fc345875e545c8b323e77c41e
2021-07-27 15:46:44 -07:00
24d94f5102 Limit smoke tests on PRs to just one config (#62288)
Summary:
When I came across the short runtime of a periodic job on this PR, I realized the current smoke-tests-on-PRs setup was flawed. Previously, in an attempt at better future compatibility, our conditional ran smoke tests only when USE_CUDA=1 on Windows.

This is BAD and has unintended consequences, such as misleading results when a ci/scheduled workflow is triggered but fails to test the full test suite. e.g., with PR https://github.com/pytorch/pytorch/issues/62266 https://github.com/pytorch/pytorch/actions/runs/1071698069

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62288

Reviewed By: seemethere, ejguan

Differential Revision: D29945540

Pulled By: janeyx99

fbshipit-source-id: 3cc91511c151f7348872b039c94d7752b6ea4692
2021-07-27 15:33:37 -07:00
8e0622abf1 .github: Migrate ecr_gc to github actions (#62284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62284

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D29943356

Pulled By: seemethere

fbshipit-source-id: 493592baf2f7abe206e1fb17438bac4e908b1251
2021-07-27 15:11:01 -07:00
d0e5ef5eba .circleci: Remove conda-package-handling pin (#62290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62290

No longer needed.

Fixes nightly failures that we're observing as well:

```
Jul 27 07:33:02 Found conflicts! Looking for incompatible packages.
Jul 27 07:33:02 This can take several minutes.  Press CTRL-C to abort.
Jul 27 07:33:02 failed
Jul 27 07:33:02
Jul 27 07:33:02 UnsatisfiableError: The following specifications were found
Jul 27 07:33:02 to be incompatible with the existing python installation in your environment:
Jul 27 07:33:02
Jul 27 07:33:02 Specifications:
Jul 27 07:33:02
Jul 27 07:33:02   - conda-package-handling=1.6.0 -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0']
Jul 27 07:33:02
Jul 27 07:33:02 Your python: python=3.9
```

From: https://app.circleci.com/pipelines/github/pytorch/pytorch/356478/workflows/2102acf1-c92a-4a59-919c-61d32d3bcd71/jobs/15027876

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29946501

Pulled By: seemethere

fbshipit-source-id: 3e9182f4cbcf2aab185dbbc21b7a6171746e2281
2021-07-27 14:59:41 -07:00
8fe32c9c13 fix test-report uploading uniqueness issue (#62217)
Summary:
Should fix: https://github.com/pytorch/pytorch/issues/61978.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62217

Reviewed By: seemethere, ejguan

Differential Revision: D29944444

Pulled By: walterddr

fbshipit-source-id: 4b737d1535fd5cbfafb24245fad9ef67285f1dc0
2021-07-27 14:17:50 -07:00
190cdcb08c remove print for status on scribe sending (#62285)
Summary:
Following up on https://github.com/pytorch/pytorch/issues/61768.

Currently the printout is hugely long because each test case returns an OK status code without an exception. This should be avoided when no exception was raised from send_to_scribe.

Remove the log printing when the response has no error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62285

Reviewed By: zhouzhuojie

Differential Revision: D29944461

Pulled By: walterddr

fbshipit-source-id: fc3c2b88bba27c68521cef7079ca2b6197d2d58b
2021-07-27 14:16:32 -07:00
e1bee3eb30 [Static Runtime] Add missing unit tests for static runtime ops (#62238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62238

Added tests for the following ops:

* `aten::mul`
* `aten::nan_to_num`
* `aten::stack`
* `aten::relu`
* `aten::tanh`

Reviewed By: hlu1

Differential Revision: D29914217

fbshipit-source-id: 6a6c39629310e7131127e24fdce7253ccdf80340
2021-07-27 14:12:21 -07:00
4a15f4a902 Allow 0-dim batch sizes in Bilinear NN layer. (#47106)
Summary:
Part of the fix for https://github.com/pytorch/pytorch/issues/12013

Checks if the inputs and outputs are non-zero in order to allow the Bilinear layer to accept 0-dim batch sizes. The if-check for this checks for both input and output dim sizes since the `_trilinear` function is written to work with both forward and backward for Bilinear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47106

Reviewed By: ejguan

Differential Revision: D29935589

Pulled By: jbschlosser

fbshipit-source-id: 607d3352bd4f88e2528c64408f04999960be049d
2021-07-27 13:59:42 -07:00
ab0354b650 All remaining linear/element-wise formulas (#59993)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59993

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914594

Pulled By: albanD

fbshipit-source-id: 2ffc5993cb66586e1458d7016774a03dfe786863
2021-07-27 13:06:46 -07:00
4c3eea26bd Fix out= variant forward grad detection (#60499)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60499

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914595

Pulled By: albanD

fbshipit-source-id: c51bb3aed91ab1f6ebc57936143b249590a43bd5
2021-07-27 13:06:45 -07:00
4a36e2a223 Add forward AD inplace check and fix codegen (#60498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60498

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914593

Pulled By: albanD

fbshipit-source-id: bde649d5a03639a240dfe5fe027c6a3f758428a4
2021-07-27 13:04:55 -07:00
df18d05429 Make bytes_read available for OperatorCost (#62059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62059

GetOperatorCost in Workspace exposes only flops and bytes_written. Make an additional piece, bytes_read, available from OperatorSchema::Cost.

Test Plan:
Added the two additional pieces in the unit test testGetOperatorCost in workspace_test

buck test caffe2/caffe2/python:workspace_test -- testGetOperatorCost

buck test //aml/ml_foundation/exp_platform/large_scale_training/distributed_hogwild/auto_device_placement/tests/...

buck test //aiplatform/training/autotuning/tests/...

buck test //aiplatform/training/pipelining/tests/...

buck test //deeplearning/fblsim/tests/...

Flow tests:

ADP Greedy: f288078287
ADP MILP: f288079278

Reviewed By: CrazySherman, xtaofb

Differential Revision: D29860676

fbshipit-source-id: 8b3a9f2bf17c0dae48cfe2800e8821bf441e0b03
2021-07-27 12:48:36 -07:00
bba7800933 Add logical op symbol (#62063)
Summary:
This is for the XLA-side [PR](https://github.com/pytorch/xla/pull/3054) to add logical op lowering.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62063

Reviewed By: ejguan

Differential Revision: D29937449

Pulled By: bdhirsh

fbshipit-source-id: ba421f6c2dad67395a383b5ed0b81ad9d59abe86
2021-07-27 12:19:56 -07:00
3bdee2bbed [jit] Rewrote DFS graph iterator to remove unnecessary local state (#61326) (#61980)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61980

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29917766

Pulled By: laurencer

fbshipit-source-id: 536c4806636fe9e709e8bffdefa9320127064dea
2021-07-27 11:50:20 -07:00
fa52b4b922 .github: chown workspace for render_test_results (#62207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62207

Workspace was getting held back due to permission-denied errors; let's
ensure we have a chown'd, clean workspace for all render_test_results
runs.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr, janeyx99

Differential Revision: D29915232

Pulled By: seemethere

fbshipit-source-id: dd9fcc9c00d9665569bd8cfa57e5d2d8da965aac
2021-07-27 11:44:15 -07:00
acaac70f63 Revert D29883676: Migrate thnn_conv_depthwise2d from THC to ATen
Test Plan: revert-hammer

Differential Revision:
D29883676 (de3a4eb583)

Original commit changeset: 9b2ac62cdd8a

fbshipit-source-id: d211d3cb7723b5d2e73de6941a7e649e5f78864f
2021-07-27 11:28:52 -07:00
82d81455ae [2/N] Remove unittest.skip across all of torch.distributed. (#61887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61887

1) Introduced a `sandcastle_skip_if` decorator that ensures these
tests simply pass on Sandcastle.
2) Fixed all test files under `test/distributed` to not use `unittest.skip`

The overall goal is to avoid using skips, since Sandcastle tags these tests as
continuously skipping.
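
A minimal sketch of such a decorator (assuming an environment flag like `IS_SANDCASTLE`; the real helper lives in the internal test utilities and may differ):

```python
import unittest
from functools import wraps

IS_SANDCASTLE = False  # stand-in for however the environment is detected

def sandcastle_skip_if(condition, reason):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if condition:
                if IS_SANDCASTLE:
                    return  # report a pass rather than a skip
                raise unittest.SkipTest(reason)
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```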
ghstack-source-id: 134382237

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29784152

fbshipit-source-id: 17b4df6c5a55ff1d1e8e1de128fa679c3dfbcb7d
2021-07-27 10:53:23 -07:00
7fc96db45d fix typo errors in quantization-support.rst Line320 (#44447)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44379

change
"`torch.per_channel_symmetric` — per tensor, symmetric"
to
 "`torch.per_channel_symmetric` — per channel, symmetric"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44447

Reviewed By: mruberry

Differential Revision: D29909645

Pulled By: ezyang

fbshipit-source-id: e1505d070ec2b335dd6503b528e6a2f3bda2f1e3
2021-07-27 10:42:29 -07:00
5f7f08f498 Reenable AMP on XLA (#61861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61861

Fixes https://github.com/pytorch/pytorch/issues/61804

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29881903

Pulled By: ezyang

fbshipit-source-id: 91530c10fa37715bec33f477285da119415a9da9
2021-07-27 10:32:01 -07:00
a0c1c7e5d4 Fixing the case when starter nodes depend on get_attr node (#62234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62234

There was a typo that we didn't catch until recently, hence this fix.

Reviewed By: 842974287

Differential Revision: D29924190

fbshipit-source-id: ee6259fcd41358aefe9680b419acc87c0c2821cb
2021-07-27 10:29:53 -07:00
8cdf16d1de Revert D29810657: [bc-breaking] reference option for linear produce a pattern instead of reference linear module
Test Plan: revert-hammer

Differential Revision:
D29810657 (9df605133e)

Original commit changeset: 949615bbc017

fbshipit-source-id: 54597d1f9636b0f94ae01c66018ff2592e5c39fc
2021-07-27 10:10:13 -07:00
d7ddae8e4f det_backward: correct, more robust and with complex support [clone] (#61905)
Summary:
Clone of https://github.com/pytorch/pytorch/pull/58195 to ease the import. Done by request from anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61905

Reviewed By: albanD

Differential Revision: D29937920

Pulled By: anjali411

fbshipit-source-id: 025892a8e6147790825b20458986730ad8c5bb0f
2021-07-27 10:08:26 -07:00
de3a4eb583 Migrate thnn_conv_depthwise2d from THC to ATen (#62006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62006

Closes gh-24646, gh-24647

There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.

I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')

for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```

and similarly for backwards. I see these as the same to within measurement error.

|                   | Master (us) | This PR (us) |
|------------------:|:-----------:|:------------:|
|           Forward |    133.5    |     133.6    |
|  Backward (input) |    1,102    |     1,119    |
| Backward (weight) |    2,220    |     2,217    |

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29883676

Pulled By: ngimel

fbshipit-source-id: 9b2ac62cdd8a84e1a23ffcd66035b2b2fe2374d8
2021-07-27 10:00:25 -07:00
9df605133e [bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61892

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810657

fbshipit-source-id: 949615bbc017bc454d81c8a6b2bdec53badaab19
2021-07-27 09:49:20 -07:00
6c6a9c73f2 [7/N] Nnapi backend delegation preprocess: compile_spec sanity check (#62213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62213

Added sanity checks in preprocess function for Android NNAPI delegate.
`preprocess()` requires some input metadata passed through its `method_compile_spec` function argument.

`preprocess()` now throws specific error messages if it cannot find the correct input arguments.
Example error message:
```
RuntimeError: method_compile_spec does not contain the "forward" key.
method_compile_spec should contain a Tensor or Tensor List which bundles input parameters: shape, dtype, quantization, and dimorder.
For input shapes, use 0 for run/load time flexible input.
method_compile_spec must use the following format: {"forward": {"inputs": at::Tensor}} OR {"forward": {"inputs": c10::List<at::Tensor>}}
```

nnapi_backend_preprocess.cpp: contains sanity check implementation
test_backend_nnapi.py: sanity check unit tests

Test: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.

TODO: Using Tensors to pass input parameters is a temporary hack. When a dedicated object is implemented, update the sanity check error message.
ghstack-source-id: 134339282

Test Plan: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.

Reviewed By: raziel, iseeyuan

Differential Revision: D29917004

fbshipit-source-id: 0d5c6b35889c556cda905ffc29c25c5422ae9ee4
2021-07-27 09:31:35 -07:00
2cbc0ede7d [DDP] Log if graph is static at end of training (#61871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61871

When set_static_graph=False, the only type of dynamism we really
support in DDP is a dynamic set of unused parameters, which must be explicitly
enabled with find_unused_parameters=True. However, some workflows have a static
set of unused parameters; it would be good to detect this and add it to logging to
identify workflows that are candidates for static graph optimization.
ghstack-source-id: 134371429

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29773962

fbshipit-source-id: 1f741984c6e6f8e3e55cf69ca719b1e25a485b13
2021-07-27 09:23:43 -07:00
79eb8bb299 [Static Runtime] Enforce proper output dtype for many ops (re-land) (#62267)
Summary:
Re-land of D29935444
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.

This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs)
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62267

Reviewed By: ejguan

Differential Revision: D29937253

Pulled By: malfet

fbshipit-source-id: d91ca5d5703490d7d349a1de2ad3bb09b0c33967
2021-07-27 08:54:09 -07:00
2eef1f27f8 Disable ccache for nccl builds (#62208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62208

reverts
https://github.com/pytorch/pytorch/pull/55814
which removed a workaround for:
https://github.com/pytorch/pytorch/issues/13362

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29935472

Pulled By: nairbv

fbshipit-source-id: 7ce9cde1408f17153632036fd128814032739746
2021-07-27 08:07:26 -07:00
dc55d511d9 Forward fix mypy (#62263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62263

Fixes current HUD Error: https://github.com/pytorch/pytorch/runs/3170342799

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29935265

Pulled By: ejguan

fbshipit-source-id: 6f247833d24bff7aea42f6287493a85d62d73b96
2021-07-27 07:52:31 -07:00
3cd12448b4 Add forward mode differentiation for inverse and solve (#62160)
Summary:
This PR adds forward mode differentiation for `torch.linalg.inv`, `torch.linalg.inv_ex`, and `torch.linalg.solve` functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62160

Reviewed By: mruberry

Differential Revision: D29917213

Pulled By: albanD

fbshipit-source-id: b08bbc830f77f342cc7ca5b823d7ea4380f2aaa8
2021-07-27 07:51:22 -07:00
a0309f89f4 Initial ModuleInfo implementation (#61935)
Summary:
This PR contains the initial version of `ModuleInfo` for use in testing modules. The design philosophy taken here is to start small and simple and build out / refactor as needed when more test coverage or `ModuleInfo` entries are added. As such, it's not intended for general usage yet. The PR contains the following:

* (new file) `torch/testing/_internal/common_modules.py`
  * `ModuleInfo` definition - metadata for each module to use in testing
  * `module_db` - the actual `ModuleInfo` database; currently contains entries for two modules
  * `ModuleInput` - analogous to `SampleInput` from OpInfo; contains `FunctionInput`s for both constructor and forward pass inputs
      * Constructor and forward pass inputs are tied together within a `ModuleInput` because they are likely correlated
  * `FunctionInput` - just contains args and kwargs to pass to a function (is there a nicer way to do this?)
  * `modules` decorator - analogous to `ops`; specifies a set of modules to run a test over
  * Some constants used to keep track of all modules under torch.nn:
      * `MODULE_NAMESPACES` - list of all namespaces containing modules
      * `MODULE_CLASSES` - list of all module class objects
      * `MODULE_CLASS_NAMES` - dict from module class object to nice name (e.g. torch.nn.Linear -> "nn.Linear")
* (new file) `test/test_modules.py`
    * Uses the above to define tests over modules
    * Currently, there is one test for demonstration, `test_forward`, which instantiates a module, runs its forward pass, and compares it to a reference, if one is defined (see the sketch after this list)
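
A hedged sketch of how a test might consume these pieces; the decorator signature and attribute names are inferred from the description above and may not match the file exactly:

```python
from torch.testing._internal.common_utils import TestCase
from torch.testing._internal.common_modules import module_db, modules

class TestModules(TestCase):
    @modules(module_db)
    def test_forward(self, device, dtype, module_info):
        # hypothetical attribute names: each entry carries the module
        # class plus ModuleInputs pairing constructor and forward args
        for module_input in module_info.module_inputs(device, dtype):
            ctor = module_input.constructor_input
            m = module_info.module_cls(*ctor.args, **ctor.kwargs).to(device)
            fwd = module_input.forward_input
            out = m(*fwd.args, **fwd.kwargs)
```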

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61935

Reviewed By: mruberry

Differential Revision: D29881832

Pulled By: jbschlosser

fbshipit-source-id: cc05c7d85f190a3aa42d55d4c8b01847d1efd57f
2021-07-27 07:42:07 -07:00
afe3644321 Remove faulty process group code (#61907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61907

Removing the code for the faulty process group agent since it was replaced by the faulty TensorPipe agent.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29794666

Pulled By: H-Huang

fbshipit-source-id: 0b35191cc07220b6774ecacc8d004f25fd2e87f0
2021-07-27 07:37:40 -07:00
a3be2ecc3a Revert D29887367: [Static Runtime] Enforce proper output dtype for many ops
Test Plan: revert-hammer

Differential Revision:
D29887367 (f4136c5efc)

Original commit changeset: cef04bfa52ec

fbshipit-source-id: 32e89f2b6381930559dd746b535904c3e90fd52b
2021-07-27 07:29:09 -07:00
b599c1e794 Create linalg and parametrizations codeowners (#62086)
Summary:
Added myself, nikitaved, and IvanYashchuk.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62086

Reviewed By: mruberry

Differential Revision: D29920798

Pulled By: albanD

fbshipit-source-id: dcbd57bb2a438a1f04d4651447710fced83264d3
2021-07-27 06:50:41 -07:00
228b50e053 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29930232

fbshipit-source-id: e36dbc59a25d7f36d3bb7a02ad76696f299712cf
2021-07-27 04:13:15 -07:00
2d7c1e3fa8 [bc-breaking] Produce quantization pattern for add_scalar and mul_scalar (#61859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61859

BC-breaking note:
Previously we did not add an observer/fake_quant for the output of add/mul for tensor-scalar operations;
in this PR we add the observer/fake_quant instance (the same as the input's) to correctly model
the behavior of the quantized add_scalar and mul_scalar ops (since quantized add/mul scalar assumes the
output quantized tensor has the same quantization parameters as the input).

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_add
python test/test_quantization.py TestQuantizeFxOps.test_mul

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29770859

fbshipit-source-id: f43fcbfecd04c392467770b22c481bbbdaf43c25
2021-07-27 02:46:00 -07:00
b176feec1e Add device and key for lazy tensors (#61621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61621

Test Plan: CI

Reviewed By: mruberry

Differential Revision: D29912934

Pulled By: asuhan

fbshipit-source-id: 493c32063a3e756d93cbf1d876563a35eaafb537
2021-07-26 23:00:22 -07:00
2945a73d90 Add option to skip GH validation for torch.hub (#62139)
Summary:
Split from https://github.com/pytorch/pytorch/pull/62072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62139

Reviewed By: mthrok

Differential Revision: D29891497

Pulled By: malfet

fbshipit-source-id: 5c0baf53a2acf8f95062bd001457e1f936011529
2021-07-26 22:44:12 -07:00
64283fe146 [DDP/Functional Optim] Support kwarg arguments (#62079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62079

Adds support for kwarg arguments into functional optimizer running as
hook.
ghstack-source-id: 134330379

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29838127

fbshipit-source-id: 2ab051ef5f0dff19c145ebe2260668b927ba47b2
2021-07-26 22:12:50 -07:00
c0ebeca1a8 [Functional Optim] Test kwargs parity for SGD (#62078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62078

Ensure that kwarg arguments such as momentum and weight decay maintain
parity between optimizer.step and step_param.
ghstack-source-id: 134330377

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29837942

fbshipit-source-id: 1ae39648fc26aebd8aaef1a7ac0e03b598a8ed60
2021-07-26 22:11:40 -07:00
478098aaac Revert D29801652: Refactor Tensor::to to call a primitive that is not copy_.
Test Plan: revert-hammer

Differential Revision:
D29801652 (29bb3f4647)

Original commit changeset: bb01eb1acf3d

fbshipit-source-id: 93693bad8068d47a3a4c16f34f300e03ea573897
2021-07-26 19:37:17 -07:00
69adb21940 Parity tests for functional optimizer step_param (#61756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756

DDP will support running the optimizer as a communication hook with
optimizers that support a per-parameter/gradient step function, `step_param`.
Add parity tests as we implement more optimizers that support step_param, to
ensure parity with regular optimizers.
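
As a hedged illustration of what a per-parameter step looks like for SGD (a minimal sketch, not the actual `_FunctionalSGD` implementation):

```python
import torch

def step_param(param, grad, lr=0.01, weight_decay=0.0):
    # apply one SGD update to a single parameter given its gradient
    with torch.no_grad():
        if weight_decay != 0.0:
            grad = grad.add(param, alpha=weight_decay)
        param.add_(grad, alpha=-lr)
```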
ghstack-source-id: 134330378

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29727549

fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f
2021-07-26 19:03:22 -07:00
b6d10a3a27 Fix infinite loop in _validate_not_a_forked_repo() (#62072)
Summary:
Increase `page_idx` inside the loop rather than outside of it.
Break from the loop when an empty response is received, as that means there are no more items to fetch via pagination.
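
A minimal sketch of the corrected pagination pattern (the URL and helper names are illustrative, not the actual torch.hub internals):

```python
import json
from urllib.request import Request, urlopen

def fetch_all_pages(base_url, token=None):
    headers = {"Authorization": f"token {token}"} if token else {}
    page_idx = 1
    results = []
    while True:
        url = f"{base_url}?per_page=100&page={page_idx}"
        with urlopen(Request(url, headers=headers)) as resp:
            page = json.loads(resp.read().decode("utf-8"))
        if not page:      # empty response: no more pages to fetch
            break
        results.extend(page)
        page_idx += 1     # increment inside the loop
    return results
```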

Also, add an option to use a provided GitHub token (via the `GITHUB_TOKEN` environment variable)

Fixes failure with "Rate Limit Exceeded" when doing something like `torch.hub.list("pytorch/test-infra:dsf")`

Fixes https://github.com/pytorch/pytorch/issues/61755

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62072

Reviewed By: jbschlosser

Differential Revision: D29868539

Pulled By: malfet

fbshipit-source-id: 206082a0ba1208e9b15ff6c9c6cb71d2da74f1c3
2021-07-26 17:54:07 -07:00
d0f430927b [PyTorch][Edge] Serializing sub modules with same names (#61933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61933

### Issue:

Submodules with the same name are not serialized correctly in bytecode format when using `_save_for_mobile`. These submodules are not distinguished as different modules, even though they have different forward, setstate, etc., if they share a name.

### Fix:
The mangler creates unique names so that modules and submodules that have the same names can be uniquely identified while saving the module. iseeyuan rightly pointed out the underlying issue: the mangler is not used in the process of saving bytecode, and hence unique references for the submodules are not created. Please refer to the notebook to repro the issue: N777224

### Diff:
The fix described above is implemented. The mangled names are used in the bytecode, so the files in the `code/` directory now have the right references to `bytecode.pkl`.

Will this be backward compatible?
iseeyuan please feel free to correct or update this.
Yes. This fix impacts only modules with same-name submodules, which were not serialized correctly before. Existing modules should have correct references, and `_load_for_mobile` must not see any change. To confirm this, the existing test cases need to pass for the diff to be approved and shipped.
ghstack-source-id: 134242696

Test Plan:
```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestCompositeWithSetStates
Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 19.2 sec (100%) 17619/17619 jobs, 3/17619 updated
  Total time: 19.5 sec
More details at https://www.internalfb.com/intern/buck/build/91542d50-25f2-434d-9e1a-b93117f4efe1
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: de9e27cf-4c6c-4980-8bc5-b830b7c9c534
Trace available for this run at /tmp/tpx-20210719-161607.659665/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (8.140)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.528)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
```

```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestConsistencyOfCompositeWithSetStates
Building: finished in 4.7 sec (100%) 6787/6787 jobs, 0/6787 updated
  Total time: 5.0 sec
More details at https://www.internalfb.com/intern/buck/build/63d6d871-1dd9-4c72-a63b-ed91900c4dc9
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 81023cd2-c1a2-498b-81b8-86383d73d23b
Trace available for this run at /tmp/tpx-20210722-160818.436635/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (7.867)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestConsistencyOfCompositeWithSetStates (0.607)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
```

To check the `bytecode.pkl` using module inspector please check:
N1007089

Reviewed By: iseeyuan

Differential Revision: D29669831

fbshipit-source-id: 504dfcb5f7446be5e1c9bd31f0bd9c986ce1a647
2021-07-26 16:31:48 -07:00
a13f714b6d DOC: remove git stamp from release documentation version (#58486)
Summary:
CI built the documentation for the recent 1.9.0rc1 tag, but left the git suffix in the `version` string, so (as of now) going to https://pytorch.org/docs/1.9.0/index.html and looking at the version in the upper-left corner shows "1.9.0a0+git5f0bbb3", not "1.9.0". This PR should change that to cut off everything after and including the "a".

It should be cherry-picked to the release/1.9 branch so that the next rc will override the current documentation with a "cleaner" version.
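
One way to do the trimming described (a sketch, not necessarily the exact conf.py change):

```python
version = "1.9.0a0+git5f0bbb3"
release_version = version.partition("a")[0]   # -> "1.9.0"
```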

brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58486

Reviewed By: zou3519

Differential Revision: D28640476

Pulled By: malfet

fbshipit-source-id: 9fd1063f4a2bc90fa8c1d12666e8c0de3d324b5c
2021-07-26 16:28:59 -07:00
60070982d2 [Static Runtime] Fixed build failure in OSS due to test_utils (#62216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62216

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D29917514

Pulled By: navahgar

fbshipit-source-id: 379863e6cd0b157de3bfa1482f5519b26654b3d2
2021-07-26 16:10:10 -07:00
962841b532 Fix subnet counting and re-enable check for multiple onnxifi ops in AOT (#62033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62033

Count the number of onnxifi ops rather than just the number of subnets, since when the subnet size < min_ops, it isn't turned into an onnxifi op.

Test Plan:
Runs which ran into the "Did not find a partition with an SLS node" error now report "multiple onnxifi ops found"
From https://fb.workplace.com/groups/527892364588452/permalink/807802049930814/:
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-06-30/onnxifi_caffe2_net_aot_input_arguments_01-55-32_711d9476?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"

```
Reran some failures from last week which now pass AOT:
From https://fb.workplace.com/groups/527892364588452/permalink/807802049930814/,
https://fb.workplace.com/groups/243933520351820/permalink/572715897473579/

```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-07-09/onnxifi_caffe2_net_aot_input_arguments_05-31-08_ef5393a6?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"
```
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-07-12/onnxifi_caffe2_net_aot_input_arguments_14-44-34_cfdf3053?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"
```
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-07-13/onnxifi_caffe2_net_aot_input_arguments_04-03-30_162e7e53?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"
```

Reviewed By: khabinov

Differential Revision: D29796893

fbshipit-source-id: e9de7529ef86745207d41643d0fbe932fa166437
2021-07-26 16:08:51 -07:00
037c4aa1d1 [fx2trt] flatten converter (#62202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62202

Add acc_ops.flatten converter. Also migrate to the OSS acc tracer for the TRT interpreter.

Test Plan: unit test

Reviewed By: khabinov

Differential Revision: D29861555

fbshipit-source-id: dac88a703fdbf386f3f7fb27674a67951f3add49
2021-07-26 15:49:01 -07:00
f883ed9095 irange-ify 8b (#62195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62195

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29887946

fbshipit-source-id: e3bd44721cf06a34ced47994810212be8460a2bb
2021-07-26 15:38:54 -07:00
f7743e92bf irange-ify 9 (#62118)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62118

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879670

fbshipit-source-id: 99b86ac7d65dfa2a47d0e6b7d65433200d18081e
2021-07-26 15:13:02 -07:00
026cfe85b4 Fix InlinedCallStack annotation to account for module calling its own (#61791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61791

methods from forward

During inlining we attach an InlinedCallStack to the nodes being inlined. In
the process we attach module information as well, such that if a
CallMethod is being inlined we know which class instance and class type
the method belongs to. However, CallMethod can be calling a method of
the same object to which the graph belongs, e.g.:

```
def forward(self, input):
  x = input + 10
  return forward_impl_(x, input)
```
Here forward_impl_ is a method defined on the same class in which forward
is defined. The existing module hierarchy annotation will mislabel this as
an unknown instance since the method is not associated with the output of
a GetAttr node (it would be if we had called self.conv.forward_impl_, for
example).
The change in this PR reconciles this by creating a placeholder name "SELF"
for the module instance, indicating that you can traverse the InlinedCallStack
backwards to find the first node with name != SELF, which is the name
of the object.
e.g.:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward

Test Plan:
Add test

Imported from OSS

Reviewed By: larryliu0820

Differential Revision: D29745443

fbshipit-source-id: 1525e41df53913341c4c36a56772454782a0ba93
2021-07-26 15:00:57 -07:00
f16102f72a Revert D29892919: Add squid proxy as egress cache
Test Plan: revert-hammer

Differential Revision:
D29892919 (e63160d735)

Original commit changeset: ac17227f2553

fbshipit-source-id: b78313147d60f26c1df68a25293e6b571ba66919
2021-07-26 14:42:28 -07:00
cf1f59452b Hacky support for meta tensor serialization. (#62192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62192

This support is hacky because it doesn't preserve meta tensor storage
sharing (e.g., if you serialize a model with shared storage, such as a
tensor and a view on that tensor, the viewing relationship will be broken
when I deserialize, and these become just different tensors). The
hack is also durable, in the sense that we will be on the hook for
supporting `_rebuild_meta_tensor_no_storage` in perpetuity in the
future, even if we change our mind about the serialization format.
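
A minimal illustration of the limitation described (assuming this patch is applied):

```python
import torch

t = torch.empty(4, device="meta")
v = t.view(2, 2)                      # v shares (meta) storage with t
torch.save({"t": t, "v": v}, "meta.pt")
loaded = torch.load("meta.pt")
# loaded["t"] and loaded["v"] no longer share storage: the viewing
# relationship is not preserved by this serialization path
```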

This unblocks an FB production use case. I didn't add C++ support to minimize
blast area of this patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29910535

Pulled By: ezyang

fbshipit-source-id: d98dcdd0108dfc3ae730a071d3c583b6d0281d21
2021-07-26 14:33:45 -07:00
f0140a8c5f Disable cppcoreguidelines-non-private-member-variables-in-classes (#62212)
Summary:
This PR disables the `cppcoreguidelines-non-private-member-variables-in-classes` check. PyTorch makes use of `protected` members throughout the codebase, so we do not want to perform this clang-tidy check in CI; disabling it improves the signal-to-noise ratio.

Relevant failure: https://github.com/pytorch/pytorch/pull/61871/checks?check_run_id=3146453417

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62212

Reviewed By: driazati

Differential Revision: D29917882

Pulled By: 1ntEgr8

fbshipit-source-id: f607c3d050a122e95136f9915060c4cda6694c9d
2021-07-26 14:14:05 -07:00
1343eea037 Fix clang-tidy line filtering logic (#62210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62210

Fixes #62204

Test Plan: #62211 clang-tidy should only error on the added lines (and not on context/removals)

Reviewed By: driazati

Differential Revision: D29917897

Pulled By: 1ntEgr8

fbshipit-source-id: de91dbf34c1ad8405507cad91ab3dd0d6c61d82e
2021-07-26 14:12:53 -07:00
2a83f24027 Enable macos clang-tidy installs (#62214)
Summary:
This PR enables installing our custom MacOS clang-tidy binaries. It also updates related documentation.

The binaries are produced by [this CI job](https://github.com/pytorch/test-infra/blob/master/.github/workflows/clang-tidy-macos.yml), and are published to S3.

This PR does not handle versioning of the downloaded binaries as this is being worked on separately. See https://github.com/pytorch/test-infra/issues/73

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62214

Test Plan:
On a MacOS machine, run
```bash
python3 -m tools.linter.install.clang_tidy
.clang-tidy-bin/clang-tidy --checks="*" --list-checks | grep "misc-max-tokens"
```

Reviewed By: janeyx99, mruberry

Differential Revision: D29917728

Pulled By: 1ntEgr8

fbshipit-source-id: 98d0d8b7a57bdebf0ebcdc83228ef391e8c6629e
2021-07-26 13:43:29 -07:00
f4136c5efc [Static Runtime] Enforce proper output dtype for many ops
Summary:
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.

This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs)
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29887367

fbshipit-source-id: cef04bfa52ec082ad3a9a32aa27c44e275c6b24c
2021-07-26 13:27:02 -07:00
29bb3f4647 Refactor Tensor::to to call a primitive that is not copy_. (#61458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61458

Context
-------
functorch is unable to vmap(grad(f)) when f contains a .to
call. This is because .to (when it is not a no-op) decomposes
to .copy_ under grad and the .copy_ is not compatible with vmap.

Fix
 ---
The fix for this is to have all Tensor::to variants call a new operator,
`_to_copy`, that always copies and is a primitive w.r.t. autograd so
that autograd decomposes Tensor::to into a call to `_to_copy`.
(This is related to https://github.com/pytorch/pytorch/issues/60956,
please let me know if you want to bikeshed the naming).

In order to get this done I had to do a bit of refactoring. All of the
`::to` implementations now call `to_impl` which may call `_to_copy`.
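
A small example of the dtype-dependent differentiability discussed in the next section:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.to(torch.long)      # routed through _to_copy under autograd
print(y.requires_grad)    # False: integral outputs are not differentiable
z = x.to(torch.float64)
print(z.requires_grad)    # True
```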

Autograd codegen changes
------------------------

The second thing I had to do was modify the autograd codegen. Right now,
autograd assumes that every output is either statically known to be
differentiable or not differentiable at codegen time. `_to_copy` is a
little special because its differentiability depends on the output
dtype. e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is non
differentiable. To get this to work:
- I changed how `output_differentiability` in derivatives.yaml work.
- output_differentiability can now accept "conditions" for each of the
output arguments. A "condition" is some C++ code.
- We currently only support `output_differentiability` with conditions
if there is a single output. This is for convenience and can be changed
in the future.
- I added a new `output_differentiability_conditions` field to
DifferentiabilityInfo. This gets populated in load_derivatives.yaml
- forward-mode and reverse-mode AD take
`output_differentiability_conditions` into account.

Here's how the generated code for `VariableType::_to_copy`
[looks
like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849)
No other autogenerated code gets modified by this PR.

Performance benchmarking
------------------------
- I benchmarked [three
cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a).
- Case A: No-op .to(). Instruction count went from 50223 to 25623. I
have no clue why but this is a good thing.
- Case B: not-no-op .to(). Instruction count went from 665291 to 671961.
This is expected; `_to_copy` adds an additional dispatch.
- Case C: not-no-op .to() forward pass and backward pass. Instruction count
went from 4022841 to 4030057. This PR adds
an additional dispatch to .to() (so there should be one additional
dispatch in the forward pass) so this number looks reasonable.

Test Plan
---------
- test_torch.py has a test_to
- test_cuda.py has test_to*
- test_autograd has tests (test_type_conversions) that exercise the
reverse-mode path
- test_ops.py has some tests (like log_softmax) that exercise the
reverse-mode and forward-mode AD path.
- test_quantization, test_namedtensor all exercise tensor.to as well.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29801652

Pulled By: zou3519

fbshipit-source-id: bb01eb1acf3d79d84f284150d1be4be3b4ace351
2021-07-26 13:02:39 -07:00
e63160d735 Add squid proxy as egress cache (#62103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62103

This PR adds a squid proxy that's deployed dedicated for PyTorch CI. Initially we only roll out to GHA, and if things are ok we will extend this to circleci tests if necessary.

`http_proxy` and `https_proxy` are compatible with the following http clients:

- curl
- wget
- python

Existing cache policy:

```
refresh_pattern -i .(7z|deb|rpm|exe|zip|tar|tgz|gz|ram|rar|bin|tiff|bz2|run|csv|sh)$ 1440 80% 2880
```

It uses the standard squid refresh_pattern for cache requests. In our setup, we tried
to cache at least (1440 minutes - 1 day) and at max (2880 minutes - 2 days), with
last-modified factor 80% ([squid doc](http://www.squid-cache.org/Doc/config/refresh_pattern/)). Please refer to [pytorch/test-infra](https://github.com/pytorch/test-infra/tree/master/aws/websites/squid-proxy) for details.

Right now, it only applies to the `build` and `test` steps, to limit the scope and make sure build and test are more reliable with the egress cache.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, malfet, seemethere, janeyx99

Differential Revision: D29892919

Pulled By: zhouzhuojie

fbshipit-source-id: ac17227f2553ca62881711b3e9943488dfd8defd
2021-07-26 13:01:34 -07:00
d2594fa538 irange-ify 3 (#62112)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62112

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879513

fbshipit-source-id: c01d18d34bb19014bf28d92c4d04b07e50a2770a
2021-07-26 12:56:58 -07:00
f5c6c3947e Remove Input Pointer Caching for XNNPack (#61959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61959

We no longer need to cache the input pointer, as XNNPACK has implemented a more robust approach where the indirection buffer does not need to be recalculated even if the activation tensor pointer changes, as long as the tensor dimensions are the same.

This reverses the changes in https://github.com/pytorch/pytorch/pull/42840/files

Reviewed By: kimishpatel

Differential Revision: D29777605

fbshipit-source-id: c1750538c17bce34f885c6f1bbb1f7164ebba25b
2021-07-26 12:02:15 -07:00
7ec6d1e857 irange-ify 2 (#62113)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62113

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879507

fbshipit-source-id: 1fb114e44afe8c1407f648b705db7fd4edb9d6e3
2021-07-26 12:00:52 -07:00
6dc2c07304 [Reland] [DDP] Implement a hook which performs FunctionalSGD step. (#62177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62177

Reland of https://github.com/pytorch/pytorch/pull/61678
Fix CI failure by gating including torchvision model on whether torchvision is available or not.
ghstack-source-id: 134282165

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29904101

fbshipit-source-id: 47e799eb4a90acbbda91c5857ea00de3045d49f5
2021-07-26 11:56:56 -07:00
1dfb687f3c Fixed off-by-one bug in Adam Smart Decay (#62135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62135

The initial implementation of Adam with Smart Decay had an off-by-one error.  This was in the summation of the geometric series used to calculate how much built-up momentum would have been discharged in skipped minibatches.

The unit tests should have caught this, but the testing strategy missed it because k, the "number of skipped minibatches," was always either 0 or so high that the impact of the bug was too small. The impact of the bug was proportional to 1/k. The testing strategy has also been adjusted to cover this bug.
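
A hypothetical sketch of the kind of geometric-series summation in question (the actual Smart Decay formula is not reproduced here):

```python
def discharged_momentum(beta, k):
    # sum_{i=1}^{k} beta**i = beta * (1 - beta**k) / (1 - beta);
    # an off-by-one version would sum from i=0, adding a spurious
    # leading term of 1 to the series
    return beta * (1.0 - beta**k) / (1.0 - beta)
```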

Differential Revision: D29889309

fbshipit-source-id: b086c0efed5c27f621061e726533c73658daffc6
2021-07-26 11:55:38 -07:00
dcb3eadc1f [quant][fix] Update quantization c++ tests to not run if CPU_STATIC_DISPATCH is specified (#62197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62197

For build configs with ATEN_CPU_STATIC_DISPATCH defined, quantization tests will fail since they
require QuantizedCPU dispatch to be enabled.
This will fix some internal test failures like https://www.internalfb.com/intern/test/844424941811803?ref_report_id=0 which are run under the `caffe2_aten_cpu_inference` project

Test Plan:
buck test mode/dev //caffe2/aten:quantized_test

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29912742

fbshipit-source-id: b117eb9f4afb51e0d0dd52fbe9d5c5be7dfafe85
2021-07-26 11:39:45 -07:00
0ca5dc7f03 irange-ify 5 (#62114)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62114

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879534

fbshipit-source-id: 0b1d6d2c9062a2fd7a55b00cb9f3d59ec941bad3
2021-07-26 11:07:54 -07:00
8e71f48f0a Handle simple NNAPI flatten NHWC case (#61796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61796

We can easily handle NNAPI conversion for NHWC inputs
that have 1 channel, or whose H and W are both 1.

Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_flatten

Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29827735

fbshipit-source-id: 65dee4b42fceef1b032bf5dd1c4cc6e020d01e14
2021-07-26 10:59:04 -07:00
b73d759708 [fix] polygamma n>=1 (#61641)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/55357

TODO:
* [x] Use proper casting to avoid confusing the compiler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61641

Reviewed By: albanD

Differential Revision: D29816592

Pulled By: mruberry

fbshipit-source-id: 2c020a6e4c325c1b5d15499a77fb39f9ba93dd79
2021-07-26 10:52:20 -07:00
ef7d572afa Ensure ShardedTensor handles list/tuple appropriately as size parameter. (#62109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62109

The `size` parameter only worked correctly for *args-like invocations
(e.g., 10, 20) and not for lists ([10, 20]) or tuples ((10, 20)). This PR ensures this
works similarly to `torch.empty`.
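
A hedged sketch of the now-equivalent spellings (module paths and spec construction reflect the API at the time and are assumptions):

```python
import torch.distributed._sharded_tensor as sharded_tensor
from torch.distributed._sharding_spec import ChunkShardingSpec

spec = ChunkShardingSpec(dim=0, placements=["rank:0/cuda:0"])
st1 = sharded_tensor.empty(spec, 10, 20)     # *args style
st2 = sharded_tensor.empty(spec, [10, 20])   # list
st3 = sharded_tensor.empty(spec, (10, 20))   # tuple
```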
ghstack-source-id: 134246166

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29884768

fbshipit-source-id: 7a4a3c5ed5d7c081344f6ead3170905b97fc652d
2021-07-26 10:31:32 -07:00
f9dce598a5 Add some missing cuda guards (#62100)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62100

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29880330

fbshipit-source-id: 7089000ccbcaa70a13f0ab4531b032bd5326e539
2021-07-26 10:26:22 -07:00
200b6ccdc0 Catch saved tensors default hooks race condition (#61957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61957

If the user runs code that registers default saved tensor hooks from
multiple threads, it will fail with a nice error message most of the
time. This commit handles the very rare case where a race condition
would have made it fail silently.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29848525

Pulled By: Varal7

fbshipit-source-id: eb9bdcfbeed857a988834651246390ea14eedd33
2021-07-26 09:48:47 -07:00
f2369f12f9 Add logging for dynamic rendezvous (#61822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61822

Added scuba logging to the following files:
- dynamic_rendezvous.py
- c10d_rendezvous_backend.py

NOTE: This diff introduces the use of python's inspect module to easily allow for obtaining the calling method name and filename when logging. This module can mess with python's garbage collector, so special care was taken to never store references to results from inspect.stack() longer than absolutely needed.
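
A minimal sketch of the careful pattern described (illustrative only):

```python
import inspect

def caller_info():
    # grab only the strings we need and drop the frame reference
    # immediately; holding on to frames from inspect can interfere
    # with Python's garbage collector
    frame = inspect.currentframe().f_back
    try:
        return frame.f_code.co_filename, frame.f_code.co_name
    finally:
        del frame
```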

Test Plan:
The following tests can be run.
```
buck run mode/dev-nosan //caffe2/test/distributed/elastic/rendezvous:c10d_rendezvous_backend_test
```
```
buck run mode/dev-nosan //caffe2/test/distributed/elastic/rendezvous:dynamic_rendezvous_test
```
```
buck run mode/dev-nosan //caffe2/test/distributed/elastic/events:lib_test
```

Reviewed By: aivanou

Differential Revision: D29643774

fbshipit-source-id: f10cd5ebf8f6860856267bc2483c0b85faacb0fd
2021-07-26 09:39:09 -07:00
6007ad3529 [Static Runtime] Refactor fb op tests to use testStaticRuntime (#62064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62064

`testStaticRuntime` was previously only available in `test_static_runtime.cc`. It has been moved to a common library `test_utils` to facilitate code re-use. This also lets us test dynamic shapes in `test_fb_operators`

Reviewed By: hlu1

Differential Revision: D29858928

fbshipit-source-id: 68a94760166ddb745972b0f1fc24bed594937d1c
2021-07-26 08:25:10 -07:00
be17d6eadf Add default Saved Variable hooks (#61834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61834

Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.

Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.

For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:

```
def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```
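
A hedged usage sketch wiring these hooks up via the API named above (`model` and `x` are placeholders):

```python
torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)
out = model(x)          # tensors saved for backward are packed to disk
out.sum().backward()    # unpack() reloads them when backward needs them
torch.autograd.graph.reset_saved_tensors_default_hooks()
```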

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29792193

Pulled By: Varal7

fbshipit-source-id: 33e931230ef59faa3ec8b5d11ef7c05539bce77c
2021-07-26 08:14:32 -07:00
89ca638c18 ENH Adds no batch dim support for AdativeMaxPool*D (#61847)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61847

Reviewed By: suo

Differential Revision: D29883887

Pulled By: jbschlosser

fbshipit-source-id: de3fcf1cc3878b138ab766d2a50cc59c52ec5a60
2021-07-26 07:35:36 -07:00
394dd391dd [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29904940

fbshipit-source-id: 16ce87cc328f2950ed95a12710b50c444e363c79
2021-07-26 03:41:55 -07:00
e6e8745bea [nnc] Add simplifierUnderContext for simplification that needs context info: currently added for-stmt index var bounds info as context (#60687)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60687

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29373315

Pulled By: huiguoo

fbshipit-source-id: 8729af60dd6d9735187b2118e3e83c75ef21789d
2021-07-25 23:30:13 -07:00
2299d6a013 Revert D29701447: [DDP] Implement a hook which performs FunctionalSGD step.
Test Plan: revert-hammer

Differential Revision:
D29701447 (bd95cf4473)

Original commit changeset: 183954593b82

fbshipit-source-id: 714e6a2b698147db9533a67783aed2a65d9d5bfe
2021-07-25 22:23:30 -07:00
457a3fb6d1 [bc-breaking][quant][graphmode][fx] Produce dequant - fp_op - quant pattern for copy nodes (#61763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61763

This PR changes the is_reference=True option for convert_fx to produce a dequant - fp_op - quant
pattern for copy nodes like maxpool op.

Before the PR:
```
def forward(self, x):
    maxpool2d_input_scale_0 = self.maxpool2d_input_scale_0
    maxpool2d_input_zero_point_0 = self.maxpool2d_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, maxpool2d_input_scale_0, maxpool2d_input_zero_point_0, torch.quint8);  x = maxpool2d_input_scale_0 = maxpool2d_input_zero_point_0 = None
    maxpool2d = self.maxpool2d(quantize_per_tensor);  quantize_per_tensor = None
    dequantize = maxpool2d.dequantize();  maxpool2d = None
    return dequantize
```

After (we expand the maxpool2d that works with quantized input to "dequant - maxpool2d - quant" pattern
```
def forward(self, x):
    maxpool2d_input_scale_0 = self.maxpool2d_input_scale_0
    maxpool2d_input_zero_point_0 = self.maxpool2d_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, maxpool2d_input_scale_0, maxpool2d_input_zero_point_0, torch.quint8);  x = maxpool2d_input_scale_0 = maxpool2d_input_zero_point_0 = None
    dequantize = quantize_per_tensor.dequantize();  quantize_per_tensor = None
    maxpool2d = self.maxpool2d(dequantize);  dequantize = None
    maxpool2d_output_scale_0 = self.maxpool2d_output_scale_0
    maxpool2d_output_zero_point_0 = self.maxpool2d_output_zero_point_0
    quantize_per_tensor_1 = torch.quantize_per_tensor(maxpool2d, maxpool2d_output_scale_0, maxpool2d_output_zero_point_0, torch.quint8);  maxpool2d = maxpool2d_output_scale_0 = maxpool2d_output_zero_point_0 = None
    dequantize_1 = quantize_per_tensor_1.dequantize();  quantize_per_tensor_1 = None
    return dequantize_1
```

note that the call to self.maxpool2d is expanded to
```
    dequantize = quantize_per_tensor.dequantize();  quantize_per_tensor = None
    maxpool2d = self.maxpool2d(dequantize);  dequantize = None
    maxpool2d_output_scale_0 = self.maxpool2d_output_scale_0
    maxpool2d_output_zero_point_0 = self.maxpool2d_output_zero_point_0
    quantize_per_tensor_1 = torch.quantize_per_tensor(maxpool2d, maxpool2d_output_scale_0, maxpool2d_output_zero_point_0, torch.quint8);  maxpool2d = maxpool2d_output_scale_0 = maxpool2d_output_zero_point_0 = None
```

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_copy_node_has_shared_actpp_instance
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29728900

fbshipit-source-id: cf2c7f1f6659e3ba97cbb7c6204dd13983da10bd
2021-07-25 19:49:13 -07:00
76d3cdf9df [quant] Add from_blob_quantized_per_channel API (#62049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62049

Adds a new function that accepts qint data blobs as input and creates a per-channel quantized tensor using the pre-allocated data and the provided scale and zero_point inputs
Addresses issue #61777

Test Plan:
./build/bin/quantized_test --gtest_filter='TestQTensor.FromBlobQuantizedPerChannel'

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D29854136

fbshipit-source-id: da6ecd3fb59a6f40ae88430fdd5d895f93d5411c
2021-07-25 14:09:38 -07:00
7195b78a59 [quant] Add from_blob_quantized_per_tensor API (#61986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61986

Adds a new function that accepts qint data blobs as input and creates a quantized tensor using the pre-allocated data and the provided scale and zero_point inputs
Addresses issue https://github.com/pytorch/pytorch/issues/61777

Test Plan:
./build/bin/quantized_test --gtest_filter='TestQTensor.FromBlobQuantizedPerTensor'

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D29831135

fbshipit-source-id: b08299bbe9e939fedff98a585e6b12c14d31f17e
2021-07-25 14:08:25 -07:00
bd95cf4473 [DDP] Implement a hook which performs FunctionalSGD step. (#61678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61678

This diff makes the following changes:
- Add a `step_param` method to the `_FunctionalSGD` class, written similarly to `step` but for a single param
- Implement a communication hook wrapper that runs a given comm. hook and then applies the functional SGD step
- Verify that this is equal to a regular allreduce + SGD optimizer

ghstack-source-id: 133567598
ghstack-source-id: 134263399

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29701447

fbshipit-source-id: 183954593b82a092414623292f9b10e675fef96e
2021-07-25 13:36:47 -07:00
8152433de2 [1/n] Update testing lib*.so path (#61960)
Summary:
### Issue

Build PyTorch wheel packages during the build stage for pull requests and install them during the test stage.

### Fix
Update all tests which call lib*.so (under the `./build` folder); change them to call lib*.so in `{ent}/pytorch/lib/python3.8/site-packages/torch`

### Diff
This diff starts by updating test_fx, test_backend and test_torchbind first, to check whether the current CI passes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61960

Test Plan: check that all CI workflows pass

Reviewed By: malfet, saketh-are

Differential Revision: D29823235

Pulled By: tktrungna

fbshipit-source-id: e7f652def698e303d4843fbaedf4859f5eca2fd9
2021-07-24 05:16:35 -07:00
956f1c981e fix a typo (#61061)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61061

Reviewed By: navahgar, Gamrix

Differential Revision: D29495806

Pulled By: Krovatkin

fbshipit-source-id: 510de724e3108c52af1b25b8ab53ae3c895b55f9
2021-07-24 00:35:58 -07:00
ee44d73e59 Modernize override (#61744)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61744

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717320

fbshipit-source-id: 6eea4295ee2e5572ab337620be412376fcc2f3cc
2021-07-23 23:04:46 -07:00
d2e03dc484 [fx2trt] Add support for explicit batch dimension (#62110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62110

Add an option to opt in to explicit batch dimension. Extend unit tests to cover both scenarios (implicit and explicit). Fixed some converters that didn't work with explicit batch dimension before.

Add broadcast support and a generic function for adding elementwise binary ops.

Follow-ups:
1. Add dynamic shape support in explicit batch dimension mode, at least to allow different batch dimensions.
2. Extend the layer_norm plugin `PluginV2Ext` to make it work in explicit batch dimension mode.

Test Plan: unit tests

Reviewed By: jackm321

Differential Revision: D29798239

fbshipit-source-id: 91d47c6155d2473ed4a6f8d2816715a32c61b869
2021-07-23 22:54:07 -07:00
cc263ef795 [bc-breaking][quant][graphmode][fx] Add observer/fake_quant for copy nodes (#61687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61687

Previously we did not insert an observer/fake_quant for the output of copy nodes (e.g. maxpool).
But to produce reference patterns we need to insert an observer/fake_quant for the output and later convert that to a quantize
node.

Model:
```
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool2d = torch.nn.MaxPool2d(kernel_size=3)

    def forward(self, x):
        x = self.maxpool2d(x)
        return x
```
result of prepare:

Before:
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
    return maxpool2d
```

After:
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
    maxpool2d_activation_post_process_0 = self.maxpool2d_activation_post_process_0(maxpool2d);  maxpool2d = None
    return maxpool2d_activation_post_process_0
```

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29715566

fbshipit-source-id: 817df9b2933a35cad5331d8d8ce1c5ba0752e9df
2021-07-23 21:29:37 -07:00
78f7d8ccfa [Static Runtime] Remove wrappers for aten::cat (#62067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62067

The wrapper for aten::cat is no longer needed after the variadic cat change in D29565344 (ae58a4c45d).
Also added a simple test for dynamic shapes, i.e., the input tensors in args2 are larger than in args1.

Reviewed By: navahgar, mikeiovine

Differential Revision: D29864600

fbshipit-source-id: 44a712c2e776815c09e0bf5631412149b81274b2
2021-07-23 20:33:41 -07:00
7c09de8384 [torch deploy] add support for Python C extension modules (#58117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58117

Previously it was not possible to load C extension modules with deploy because extension
modules need to link against the Python.h API functions. Since
each libtorchdeploy_interpreter.so has its own copy of these functions, there is no way
to tell dlopen to resolve symbols in a loaded SO from one of these libraries without exposing
its symbols globally.

This patch adds a custom ELF loader that attaches C extension libraries
to the Python API of the interpreter that loaded the shared library. Simple use of the numpy and regex modules appears to work.

This diff has some limitations:

* 64-bit Linux only. OSX and Windows use different formats for shared libraries. 32-bit ELF files are not supported.
* Debug info is not immediately available to debuggers. A script for lldb is provided which can be loaded
  so that lldb knows about the libraries as they are loaded.
* Shared libraries can directly use the Python API, but libraries they depend on
  (via DT_NEEDED entries in their dynamic segment) may not use Python. In the future, we can
  try to detect whether a sub-library uses the Python API and load it with our custom loader.
* TLS initialization and library initialization may occur in a different order than they would with dlopen,
  potentially leading to some issues running destructors in TLS segments. Use of these C++ features is relatively rare.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D28435305

Pulled By: zdevito

fbshipit-source-id: 10f046053dd1d250e3c73f2cce8eb945eeba31b6
2021-07-23 19:58:54 -07:00
e856a45283 [Model Averaging] Refactor averagers to accept parameters instead of a module (#62105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62105

This is in preparation for wrapping the averager as an optimizer, which can only accept parameters rather than a module.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134213572

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_average_parameters

Reviewed By: rohan-varma

Differential Revision: D29883693

fbshipit-source-id: 474ba924a0b05068b12f163fb74582bccf314964
2021-07-23 18:39:45 -07:00
41f7a9dac0 [profiler][refactor] Avoid using legacy event in profiler (#61721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61721

Remove dependency on LegacyEvent from the profiler

Test Plan:
python test/test_profiler.py -v

Imported from OSS

Reviewed By: kimishpatel, gdankel

Differential Revision: D29716769

fbshipit-source-id: 2c2b48f2ee096adcbde09821e0cc7c0fcb94d19f
2021-07-23 18:28:08 -07:00
06a3b23971 [android] Lite interpreter module to load from assets (#61609)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61609

Test Plan: Imported from OSS

Reviewed By: cccclai

Differential Revision: D29688641

Pulled By: IvanKobzarev

fbshipit-source-id: 7857bad51e91eae7c90a1218d463f3767f4fae15
2021-07-23 17:51:18 -07:00
643e58466e [nnc] Rename IRSimplifierBase to PolynomialBase (#60686)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60686

Test Plan: Imported from OSS

Reviewed By: navahgar, soulitzer

Differential Revision: D29373316

Pulled By: huiguoo

fbshipit-source-id: bd44bff60455076d1c5291273989e9939a428f9a
2021-07-23 17:18:41 -07:00
046272f3e5 [6/N] Nnapi Backend Delegate: Comprehensive OSS Tests (#61782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61782

This PR depends on https://github.com/pytorch/pytorch/pull/61787

### Summary:
Added more comprehensive tests for Android NNAPI delegate.
Previously, there was only one basic test for lowering a PReLU module with the NNAPI delegate. Now, more tests are inherited from `test_nnapi.py`, the file for testing NNAPI conversion and execution without the delegate.

**test_backend_nnapi.py**
Test file for Android NNAPI delegate.
- `TestNnapiBackend` class inherits tests from `test_nnapi.py` and overrides the model conversion to use the delegate API.
- Includes an extra test for passing input arguments as Tensors and Tensor Lists.
- Has extra setup for loading the NNAPI delegate library and changing the default dtype from float64 to float32 (dtype is typically float32 by default, but not in delegate backend unit tests)

**test_nnapi.py**
Test file for Android NNAPI without the delegate.
- Some code was refactored to allow override of only the NNAPI conversion call.
- An extra function was added to allow the NNAPI delegate unit test to turn off the model execution step. Once the NNAPI delegate's execution implementation is complete, this may no longer be necessary.

### Test Plan:
I ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` to run both test files.

Test Plan: Imported from OSS

Reviewed By: raziel, iseeyuan

Differential Revision: D29772005

fbshipit-source-id: 5d14067a4f6081835699b87a2ece5bd6bed00c6b
2021-07-23 17:04:07 -07:00
f03e7170f0 ENH Updates docs and tests for regression modules that already support no-batch-dims (#61461)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

This PR does not use `check_sum_reduction` because I wanted to test every reduction option.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61461

Reviewed By: suo

Differential Revision: D29883744

Pulled By: jbschlosser

fbshipit-source-id: cdad0effb41f0484938caad0d4c9d6d83e2aec07
2021-07-23 16:40:17 -07:00
1ec6205bd0 ENH Adds no_batch_dim support for maxpool and unpool for 2d and 3d (#61984)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

(Interesting how the maxpool tests are currently in `test/test_nn.py`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61984

Reviewed By: suo

Differential Revision: D29883846

Pulled By: jbschlosser

fbshipit-source-id: 1e0637c96f8fa442b4784a9865310c164cbf61c8
2021-07-23 16:14:10 -07:00
f4ffaf0cde Fix type promotion for cosine_similarity() (#62054)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61454

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62054

Reviewed By: suo

Differential Revision: D29881755

Pulled By: jbschlosser

fbshipit-source-id: 10499766ac07b0ae3c0d2f4c426ea818d1e77db6
2021-07-23 15:20:48 -07:00
e408af083f Improve MHA docs (#61977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60831
Also clarifies the relationship between `embed_dim` and `num_heads` (see https://github.com/pytorch/pytorch/issues/60853 and https://github.com/pytorch/pytorch/issues/60445).
Formatting was overhauled to remove some redundancy between the input docs and shape docs; suggestions / comments welcome!

Link to rendered docs here: https://14912919-65600975-gh.circle-artifacts.com/0/docs/generated/torch.nn.MultiheadAttention.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61977

Reviewed By: bhosmer

Differential Revision: D29876884

Pulled By: jbschlosser

fbshipit-source-id: a3e82083219cc4f8245c021d309ad9d92bf39196
2021-07-23 15:19:34 -07:00
cf3cc01f1d [Static Runtime] Add is_frozen to StaticModule ctor (#62020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62020

Add is_frozen to StaticModule ctor so we can skip freezing in StaticModule.

Reviewed By: ajyu, mikeiovine

Differential Revision: D29807431

fbshipit-source-id: 7742e9f5c5ae9f442a9e4007c870a14fd8b4af20
2021-07-23 15:12:35 -07:00
fa11103c6a [clang-tidy] Fix unknown GNU flag error (#62128)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62128

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29888297

Pulled By: 1ntEgr8

fbshipit-source-id: 0657d5baa72c014a83c9def4a39338c52f4ef8d1
2021-07-23 14:46:51 -07:00
9730d91abd MAINT Migrates multilabel_margin_loss from THC to ATen (CUDA) (#60708)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24603
Fixes https://github.com/pytorch/pytorch/issues/24602

<s>The implementation should be exactly the same, so it is strange that the benchmarks show such a significant improvement in this PR.</s>

The benchmarks are now the same.

<details>
 <summary>Benchmark script</summary>

```python
from itertools import product

import torch
import torch.nn as nn
import torch.nn.functional as F
import time

torch.manual_seed(0)
MS_PER_SECOND = 1000

def _time():
    torch.cuda.synchronize()
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
n_runs = 100
reductions = ["none", "sum", "mean"]
Ns = [1_000, 10_000, 100_000]

for reduction, N in product(reductions, Ns):
    total_fwd_time = 0
    total_back_time = 0
    grad_out = torch.randn(N, device=device)
    if reduction != "none":
        grad_out = grad_out[0]

    for _ in range(n_runs):
        input = torch.randn(N, C, device=device, requires_grad=True)
        target = torch.randint(0, C, size=input.size(), device=device)

        # forward
        start = _time()
        result = F.multilabel_margin_loss(input, target, reduction=reduction)
        total_fwd_time += _time() - start

    result = F.multilabel_margin_loss(input, target, reduction=reduction)
    for _ in range(n_runs):
        # backward
        start = _time()
        result.backward(grad_out, retain_graph=True)
        total_back_time += _time() - start

    fwd_avg = total_fwd_time / n_runs
    bwd_avg = total_back_time / n_runs
    print(
        f"input size({N}, {C}), reduction: {reduction}, fwd: {fwd_avg:.2f} (ms), back: {bwd_avg:.2f} (ms)"
    )
```

</details>

## master

```
input size(1000, 30), reduction: none, fwd: 0.14 (ms), back: 0.41 (ms)
input size(10000, 30), reduction: none, fwd: 1.26 (ms), back: 3.58 (ms)
input size(100000, 30), reduction: none, fwd: 13.15 (ms), back: 34.68 (ms)
input size(1000, 30), reduction: sum, fwd: 0.14 (ms), back: 0.38 (ms)
input size(10000, 30), reduction: sum, fwd: 1.16 (ms), back: 3.53 (ms)
input size(100000, 30), reduction: sum, fwd: 13.04 (ms), back: 34.53 (ms)
input size(1000, 30), reduction: mean, fwd: 0.14 (ms), back: 0.38 (ms)
input size(10000, 30), reduction: mean, fwd: 1.17 (ms), back: 3.52 (ms)
input size(100000, 30), reduction: mean, fwd: 13.12 (ms), back: 34.54 (ms)
```

## this PR

```
input size(1000, 30), reduction: none, fwd: 0.14 (ms), back: 0.35 (ms)
input size(10000, 30), reduction: none, fwd: 1.22 (ms), back: 2.98 (ms)
input size(100000, 30), reduction: none, fwd: 12.90 (ms), back: 29.32 (ms)
input size(1000, 30), reduction: sum, fwd: 0.14 (ms), back: 0.32 (ms)
input size(10000, 30), reduction: sum, fwd: 1.16 (ms), back: 2.97 (ms)
input size(100000, 30), reduction: sum, fwd: 13.00 (ms), back: 29.17 (ms)
input size(1000, 30), reduction: mean, fwd: 0.14 (ms), back: 0.32 (ms)
input size(10000, 30), reduction: mean, fwd: 1.17 (ms), back: 2.97 (ms)
input size(100000, 30), reduction: mean, fwd: 13.09 (ms), back: 28.91 (ms)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60708

Reviewed By: saketh-are

Differential Revision: D29856579

Pulled By: ngimel

fbshipit-source-id: b6bbf27a71e5a04f61779f6fef4ed1c98baa2607
2021-07-23 13:45:28 -07:00
a6c6fd923e [profiler] Nvtx support (#61634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61634

The legacy profiler supported Nvtx, and that was used by emit_nvtx. This PR
adds support for Nvtx in the new profiler, to prepare for the eventual
deprecation of the legacy profiler.

Test Plan:
Verified that the profiles produced with nvprof are the same:
```
import torch
import torchvision.models as models
from torch.autograd.profiler import emit_nvtx, load_nvprof

model = models.resnet18().cuda()
inputs = torch.randn(5, 3, 224, 224).cuda()

with emit_nvtx(record_shapes=True):
  model(inputs)
```
```
/usr/local/cuda/bin/nvprof -o test_trace2.prof -f -- python test_emit_nvtx.py
```
```
evt = load_nvprof("/home/iliacher/local/pytorch/test_trace.prof")
```

Imported from OSS

Reviewed By: kimishpatel, gdankel

Differential Revision: D29691316

fbshipit-source-id: 1e186cc072368f3e3987a2da0bfd90ed328817c5
2021-07-23 13:37:09 -07:00
812bc1dde6 Smart Decay for Adam - DPER3 (#62058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62058

This is the second diff in this stack.  This diff includes the changes to DPER3; the first diff includes the changes to Caffe2.

We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch. We fix this by keeping track of missed minibatches and making the decay catch up accordingly.

The exponential moving averages (EMAs) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. Strictly, for the parameters that are absent, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2, respectively.

To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen.

We hope this will significantly improve the inconsistent learning parameter issue we have seen with Adam.
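
A minimal sketch of the catch-up decay described above (class and method names here are hypothetical; the real change lives in the Caffe2 Adam operators, and bias correction is omitted for brevity):

```python
import torch

class SmartDecayAdam:
    """Sketch: Adam whose moment EMAs catch up for rows missed in earlier minibatches."""

    def __init__(self, num_rows, dim, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = torch.zeros(num_rows, dim)   # EMA of gradients (first moment)
        self.v = torch.zeros(num_rows, dim)   # EMA of squared gradients (second moment)
        self.last_seen = torch.zeros(num_rows, dtype=torch.long)
        self.step = 0

    def update(self, rows, grads, params):
        self.step += 1
        # k = minibatches since each row was last seen (k == 1 if seen every step)
        k = (self.step - self.last_seen[rows]).unsqueeze(1).float()
        # decaying by beta^k is equivalent to having added a zero gradient to the
        # EMA on every missed minibatch, without touching every row on every step
        self.m[rows] = self.beta1 ** k * self.m[rows] + (1 - self.beta1) * grads
        self.v[rows] = self.beta2 ** k * self.v[rows] + (1 - self.beta2) * grads ** 2
        self.last_seen[rows] = self.step
        params[rows] -= self.lr * self.m[rows] / (self.v[rows].sqrt() + self.eps)
```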

Differential Revision: D29638897

fbshipit-source-id: 18d8e227d72c2e23010ca81e0f6eeb78872c8d3c
2021-07-23 13:26:30 -07:00
5224490ae9 Implement NumPy-like frombuffer tensor constructor. (#59077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59077

Fixes #58549

`frombuffer` constructs a tensor object from an already-allocated buffer through CPython's buffer protocol. Besides the standard `dtype`, `count`, and `offset` parameters, this function also accepts:

- `device`: where the buffer lives
- `requires_grad`: should autograd record operations on the new tensor
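
A minimal usage sketch under those parameters (assuming the constructor lands as `torch.frombuffer`; `offset` is in bytes, `count` in elements):

```python
import array

import torch

buf = array.array('f', [1.0, 2.0, 3.0, 4.0])  # exposes CPython's buffer protocol

# share memory with buf: skip the first element (4 bytes), take 3 elements
t = torch.frombuffer(buf, dtype=torch.float32, count=3, offset=4)
print(t)      # tensor([2., 3., 4.])

buf[1] = 9.0  # the tensor is a view of the buffer, so it observes the write
print(t[0])   # tensor(9.)
```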

A new test file _test_buffer_protocol.py_ was created. Currently, only CPU tests are implemented. That's because neither PyTorch nor Numba implements CPython's buffer protocol, so there's no way to create a CUDA buffer with the existing dependencies (PyCUDA could be used for that, though).

At the moment, if `device` differs from the device the buffer actually lives, two things
may happen:

- `RuntimeError`, if `device='cuda'`
- Segmentation fault (not tested -- see above), if `device='cpu'`

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29870914

Pulled By: mruberry

fbshipit-source-id: 9fa8611aeffedfe39c9af74558178157a11326bb
2021-07-23 13:17:48 -07:00
ec4e6181e6 [Static Runtime] Fix broken test_static_runtime build (#62098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62098

The build was broken by D29821533 (1d2ea76afb). The `clamp` overloads used in `deep_wide.h`
are no longer available in the `at::native` namespace.

Use `at::cpu::clamp` and `at::cpu::clip_out` (which should be an alias for clamp) instead.

Reviewed By: hlu1

Differential Revision: D29880187

fbshipit-source-id: 210b6d2be8a8142e7af1a0ba07e55a95b1a77d25
2021-07-23 12:35:09 -07:00
b820493cf1 [skip ci] Refactor CIFlow init logic (#62102)
Summary:
This PR refactors the CIWorkflow post_init step to best account for how CIFlow interacts with everything.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62102

Test Plan: This PR did NOT result in any workflow changes. I ran mypy and flake8 on the changed file locally with no issues.

Reviewed By: jbschlosser

Differential Revision: D29883275

Pulled By: janeyx99

fbshipit-source-id: 6c5c1fc1878159e0de1bf8d9bd0cb32aa47af49a
2021-07-23 12:29:04 -07:00
71cfbc45b4 Remove redundant torch.cuda.set_device(self.rank) (#62097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62097

as title
ghstack-source-id: 134196740

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_profiling_autograd_profiler

Reviewed By: rohan-varma

Differential Revision: D29880040

fbshipit-source-id: 6a06fb2d87e9a7dfa1d7c81bf0c3fe115c1a1abb
2021-07-23 11:59:16 -07:00
5ef667a8b8 Remove duplicated movedim implementation (#61939)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61939

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29850798

Pulled By: zou3519

fbshipit-source-id: e803b235d8535a204515ff9f5d46b8c4d191b73c
2021-07-23 11:52:07 -07:00
10ccc5a81c remove randn? from torch.testing namespace (#61840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61840

Redo of #60859.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29871017

Pulled By: mruberry

fbshipit-source-id: 47afed1dc6aa0bb1e826af616ef5d5aaabb8e5bb
2021-07-23 11:51:03 -07:00
cb47d1f9c8 OpInfo Ref: fmod, remainder (#61527)
Summary:
See https://github.com/pytorch/pytorch/issues/54261 for OpInfo tracker.

This PR:

* [x] Adds references to both `fmod` and `remainder` for testing.
* [x] Updates `remainder` documentation to add a note on divergence with `std::remainder`. (something similar to NumPy's note: https://numpy.org/doc/1.20/reference/generated/numpy.remainder.html), see: https://github.com/pytorch/pytorch/pull/61527#discussion_r670238788 for further discussion.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61527

Reviewed By: albanD

Differential Revision: D29841266

Pulled By: mruberry

fbshipit-source-id: be99851a94f53ea2fc07b64fd7c947775129658c
2021-07-23 11:44:32 -07:00
c9b71549f2 don't allow alias dispatch keys to go in the DispatchKeySet (#61771)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61771

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D29736432

Pulled By: bdhirsh

fbshipit-source-id: 54bb716db1e41565b00f4f01ea0096f834087577
2021-07-23 11:29:46 -07:00
143ef016ee Throw RuntimeError when numpy() is called on a tensor with conjugate or negative bit set (#61925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61925

Resolves https://github.com/pytorch/pytorch/issues/59945 and https://github.com/pytorch/pytorch/issues/59946

BC-breaking note: unlike before, complex_tensor.conj().numpy(), complex_float_tensor.conj().view(torch.float64), and complex_float_tensor.conj().imag.view(torch.int32) no longer return views but instead error out.
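
A small illustration of the new behavior (assuming `Tensor.resolve_conj()` is available to materialize the conjugation):

```python
import torch

z = torch.tensor([1 + 2j, 3 - 4j])
zc = z.conj()                    # lazy: only sets the conjugate bit, no copy

# zc.numpy() now raises a RuntimeError instead of silently returning a view
arr = zc.resolve_conj().numpy()  # materialize the conjugation, then export
print(arr)                       # [1.-2.j 3.+4.j]
```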

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29819288

Pulled By: anjali411

fbshipit-source-id: 4bebec721eb535f44ef4b728bdc75fa444e05d16
2021-07-23 11:28:36 -07:00
943ca5f6f7 [special] alias for mvlgamma (#61633)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Have added `out` variant for consistency.
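
A quick usage sketch of the alias (assuming it lands as `torch.special.multigammaln`, per the docs link below):

```python
import torch

x = torch.tensor([2.0, 3.0])
print(torch.special.multigammaln(x, 2))  # same values as the original op
print(torch.mvlgamma(x, p=2))
```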

TODO:
* [x] Check docs https://docs-preview.pytorch.org/61633/special.html#torch.special.multigammaln

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61633

Reviewed By: albanD

Differential Revision: D29815514

Pulled By: mruberry

fbshipit-source-id: 003c7b6a5938ecc7a96727310e8a39da0b3d7aca
2021-07-23 11:24:27 -07:00
0c55f1bdec [torchelastic] Improve process termination logic (#61602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61602

The diff introduces signal handlers and a `SignalException` that is raised when the agent process receives SIGTERM or SIGINT.

When any of these signals is received, the termination handler raises a `SignalException`. The exception is then processed by the main agent loop: `shutdown(signum)` is invoked, which propagates the received signal to the child processes. A default 30-second timeout is introduced: if the child processes cannot terminate gracefully within this timeout, the agent process kills them via SIGKILL.
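
A minimal sketch of that handler pattern (`shutdown` and the loop below are hypothetical stand-ins for the real torchelastic entry points):

```python
import signal
import time

class SignalException(Exception):
    """Raised in the agent process when SIGTERM or SIGINT arrives."""
    def __init__(self, signum):
        super().__init__(f"received signal {signum}")
        self.signum = signum

def _termination_handler(signum, frame):
    raise SignalException(signum)

def shutdown(signum):
    # hypothetical stand-in: propagate signum to the child processes, wait up
    # to the 30 second grace period, then SIGKILL anything still alive
    print(f"propagating signal {signum} to child processes")

signal.signal(signal.SIGTERM, _termination_handler)
signal.signal(signal.SIGINT, _termination_handler)

try:
    while True:        # stand-in for the main agent loop
        time.sleep(1)
except SignalException as e:
    shutdown(e.signum)
```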

Test Plan: unittests, sandcastle

Reviewed By: cbalioglu

Differential Revision: D29671783

fbshipit-source-id: 3dbca2125676dc18d417cc3e3bb0301fdd42737a
2021-07-23 11:00:15 -07:00
e42360d56f Remove default arguments before calling to __torch_dispatch__ (#61123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61123

This applies the design pattern of removing explicit arguments when they
coincide with the default arguments.  This simplifies argument patterns
that dispatch kernels receive and make it easier for us to maintain BC
(as addition of a new default argument isn't immediately BC-breaking
for dispatch implementors).

There is an important extra API which I haven't implemented here yet,
which is to take an incomplete sequence of arguments and fill out their
defaults (in case the user did want normalization).  I plan on adding
that in a future PR.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29853616

Pulled By: ezyang

fbshipit-source-id: 71c672cb3a7d4d01f838a1c7fcdb75a8ce7d058e
2021-07-23 10:41:35 -07:00
32d0c3e8ee Support for reference convert_fx working on gpu
Summary:
This PR enables gpu only quantization, best used with is_reference since
there are not many gpu kernels for ops as of now.

This PR mainly changes how qconfigs and their observer constructors operate once they
are attached to a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original
qconfig and configures them so that, when invoked, the created observers will
be on whatever device the module occupies. (Once observers are created,
module.to(device) is already set up so that it moves any observers.) To do this,
a new method and a few small changes were added to the _PartialWrapper class that
our observers already use to create constructors (without changing the
existing functionality). These changes work in
concert with changes to the prepare flow, such that when the qconfigs are
propagated to the modules (in quantize.py and qconfig_utils.py) they are configured using add_module_to_qconfig_obs_ctr.

Ideally this would work on other models, but the is_reference support for
a lot of modules isn't there yet; those tests should be added in a
future PR.
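
A usage sketch of the flow this enables (API paths as of this era of FX graph mode quantization; the particular qconfig choice is an assumption):

```python
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import convert_fx, prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).cuda().eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

prepared = prepare_fx(model, qconfig_dict)   # observers land on the model's device
prepared(torch.randn(2, 4, device="cuda"))   # calibrate on the GPU
quantized = convert_fx(prepared, is_reference=True)  # reference quantized model
```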

Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic

python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert

python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert

python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence

Reviewed By: vkuzo

Differential Revision: D29684114

fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
2021-07-23 10:30:38 -07:00
0df1679e5c BatchNorm: fix mixed precision usage with affine=False (#61962)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61924

The fused backward kernel was using the weight dtype to detect mixed-precision usage, but the weights can be None while the `running_mean` and `running_var` can still be in mixed precision. So, I updated the check to look at those variables as well.
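
A repro-style sketch of the affected configuration (requires a CUDA device; with `affine=False` the weights are `None` while the running stats remain float32):

```python
import torch

bn = torch.nn.BatchNorm2d(8, affine=False).cuda()  # weight and bias are None
x = torch.randn(4, 8, 16, 16, device="cuda", dtype=torch.half, requires_grad=True)

out = bn(x)           # mixed precision: half input, float32 running stats, no weights
out.sum().backward()  # previously mis-detected because only the weight dtype was checked
```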

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61962

Reviewed By: albanD

Differential Revision: D29825516

Pulled By: ngimel

fbshipit-source-id: d087fbf3bed1762770cac46c0dcec30c03a86fda
2021-07-23 09:55:52 -07:00
e318058ffe Ignore LNK4099 for debug binary libtorch builds (#62060)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61979

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62060

Test Plan:
This CI shouldn't break
and https://github.com/pytorch/pytorch/pull/62061

Reviewed By: driazati

Differential Revision: D29877487

Pulled By: janeyx99

fbshipit-source-id: 497f84caab3f9ae609644fd397ad87a6dc8a2a77
2021-07-23 09:31:41 -07:00
04c95a0638 ns for fx: expose hook to define custom weight extraction functions (#62047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62047

Adds a hook for user to define a weight extraction function for a
custom type.

Example usage:
```
op_to_type_to_weight_extraction_fn = \
    get_op_to_type_to_weight_extraction_fn()
op_to_type_to_weight_extraction_fn['call_function'][_wrapped_linear] = \
    torch.quantization.ns.weight_utils.get_linear_fun_weight

results = extract_weights_impl(
    'a', m1, 'b', m2,
    op_to_type_to_weight_extraction_fn=op_to_type_to_weight_extraction_fn)
```

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853625

fbshipit-source-id: 183916ef54ba303bc818e0eba00b52e33c4633ad
2021-07-23 09:31:37 -07:00
07c6a12008 ns for fx: fix typing issue in weight extraction (#62041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62041

Before this PR, weights of conv and linear modules were extracted
as lists, in order to match the signature of LSTM weights.

After this PR, weight extraction preserves the type of the weights,
so extracted weights of conv and linear have a different type
from LSTM weights.  The comparison util functions are updated to
handle the LSTM weight type of `List[Tensor]`.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853626

fbshipit-source-id: 93da5b9b0b174679c61528d02b6b902cb064444e
2021-07-23 09:31:33 -07:00
eaba16d665 ns for fx: change weight extraction to direct mapping (#62038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62038

Updates the logic to extract weights from nodes to use a
direct mapping from type to weight extraction function.

This is needed for a future PR which will allow users to
specify custom weight extraction functions for user defined
types.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853627

fbshipit-source-id: 3ef90ef4bd7b28f6316c0af215a2bd3ff8a2aeca
2021-07-23 09:30:08 -07:00
8a2c525d3b Fix some sign comparisons (#61849)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61849

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29736180

fbshipit-source-id: 1391b11e73725ee985b9aa768566ca77f44d04ae
2021-07-23 09:03:33 -07:00
9d4056468e Migrate scheduled jobs debuggability to GHA (#62056)
Summary:
This removes the debuggable-ci workflow in Circle and enables the same idea in GHA, to allow contributors to run scheduled GHA workflows by:
1. assigning the PR to pytorchbot.
2. labeling the PR with ciflow/scheduled
3. unassigning the PR.

This PR also adds the trigger_action_only logic to windows_ci_template yaml, as it was present on the linux template and seemed to be left out by mistake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62056

Test Plan: Note that this periodic job https://github.com/pytorch/pytorch/pull/62056/checks?check_run_id=3138504471 ran later than other jobs (like [this one](https://github.com/pytorch/pytorch/pull/62056/checks?check_run_id=3138226668)), and its time is close to when unassigning happens.

Reviewed By: seemethere

Differential Revision: D29859079

Pulled By: janeyx99

fbshipit-source-id: cd5c6be415cfa8090e3cac90625f92b49fd453a8
2021-07-23 08:48:22 -07:00
b03b45afd9 [DDP Comm Hook] Use a single tensor instead of a tensor list as the comm hook result (#62074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62074

Since SPMD mode is retired, the comm hook result will always be a single tensor.

This can improve the comm hook developer experience, as there is no need to add an extra `[0]` to the precursor future result.

#Closes: https://github.com/pytorch/pytorch/issues/61914
ghstack-source-id: 134164593

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork

Reviewed By: rohan-varma

Differential Revision: D29864732

fbshipit-source-id: 59fe6dd78b66214b1788514ad4d236039d9bda31
2021-07-23 03:32:05 -07:00
1d2ea76afb clamp: port to structured kernel (#61361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61361

This PR ports the `clamp` kernel to the structured format. In addition, it introduces `OptionalScalarRef` as a replacement for `c10::optional<Scalar>&`. The latter, although it is a reference type, can still involve copying the contained `Scalar` (e.g. if the actual parameter is a `Scalar` or if a `c10::optional<Scalar>` is constructed just to call a kernel). `OptionalScalarRef` contains only a `const Scalar&`, and stores flag about whether the instance contains something inside the `Scalar` itself using a new tag.

For more information, see #55070.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29821533

Pulled By: SplitInfinity

fbshipit-source-id: 88d55df5a4b2c14b68a57e4905d90eea1b088d99
2021-07-23 02:02:07 -07:00
b106b958eb preserve residual in transformer norm_first (#61692)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61692

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29706830

Pulled By: bhosmer

fbshipit-source-id: d9c9e88fb589d46189955a96909c6ca76d587f72
2021-07-22 23:49:08 -07:00
53222c59f0 Reformat (#62073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62073

as title
ghstack-source-id: 134159445

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D29869185

fbshipit-source-id: 17a32d56860e9469bd26c4eb4ca2d483827d946e
2021-07-22 23:36:22 -07:00
3687bbb1ed [pruner] add Conv2d support (#61778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61778

Adding Conv2d as supported modules for the pruner. Previously the pruner only supported Linear layers. This addition includes:
- adding a Conv2d activation reconstruction forward hook to match Conv2d weight shapes
- in `prepare`, checking the type of the module and using the corresponding activation forward hook
ghstack-source-id: 134143557

Test Plan:
Added conv2d tests
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1LLf3

Reviewed By: jerryzh168

Differential Revision: D29719045

fbshipit-source-id: 6a9f91b96992c552fff32f0e5a6e22f16eb7077b
2021-07-22 23:00:31 -07:00
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
260198d42c Disable bazel in CircleCI (#62055)
Summary:
As it runs in GHA for a while

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62055

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29856620

Pulled By: malfet

fbshipit-source-id: 754e392442f68d4eee15811e2bd2cf147326c42a
2021-07-22 16:28:12 -07:00
a91be24e2d Modernize make pointers (#61741)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61741

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717385

fbshipit-source-id: 4452b77981e49175f744bdaab12cd225bf75b90e
2021-07-22 15:54:37 -07:00
f98fa5ea13 [skip ci] minor typo link fix (#62042)
Summary:
This is not a functional change but a typo fix where I forgot to update the link to windows_smoke_tests.csv in test_python_first_shard. The windows_smoke_tests.csv is currently the same in pytorch/test-infra and my fork, janeyx99/test-infra, but that will not be the case in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62042

Reviewed By: seemethere

Differential Revision: D29851984

Pulled By: janeyx99

fbshipit-source-id: 9bafdf0ba006b9128463e3cf132fdfcddd3d10f2
2021-07-22 15:34:41 -07:00
1a64a5c0ba .github: Only run workflows on pytorch/pytorch (#62044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62044

Downstream users have reported that they're seeing github workflows pop
up in their downstream forks which is not ideal. Let's make it so that
all of these generated workflows actually get skipped.

Also includes workflows related to automating pytorch/pytorch repository
maintenance

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D29852199

Pulled By: seemethere

fbshipit-source-id: bbc1684c06a50bb3597f3112cb65fe9c1a4d7c1f
2021-07-22 15:08:31 -07:00
414537ac99 DOC Fixes link in register_module_backward_hook (#61999)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61580

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61999

Reviewed By: saketh-are

Differential Revision: D29847397

Pulled By: albanD

fbshipit-source-id: 3d9e1a5abac82d658b4f1746ace73e2fecb41725
2021-07-22 14:29:40 -07:00
b522f3be4c Svd docfix (#62028)
Summary:
moving back the variable names to match the python variable and remove unicode exponents.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62028

Reviewed By: saketh-are, mruberry

Differential Revision: D29848591

Pulled By: albanD

fbshipit-source-id: f86b8666cb5f86e300e214a6d59638d069018c50
2021-07-22 14:11:52 -07:00
d6e776d961 Add build/.ninja_log to artifacts for Windows (#62035)
Summary:
Being able to download the .ninja_log allows for better debugging. There may be a follow-up PR to convert this to a better tracefile.

This PR only handles windows as it is already handled for linux here:
https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L248-L252

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62035

Test Plan: Check the artifacts for a windows job and see if we see .ninja_log

Reviewed By: malfet

Differential Revision: D29852228

Pulled By: janeyx99

fbshipit-source-id: a3a87b709cd0c84f5b3cdc274ac4a623771c2b5c
2021-07-22 13:04:29 -07:00
0309c5780d ENH Adds no batch dim support for AvgPool1d (#61860)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61860

Reviewed By: albanD

Differential Revision: D29826382

Pulled By: jbschlosser

fbshipit-source-id: 47e12073d866f0604310fc1ff270cde9907e516d
2021-07-22 12:46:48 -07:00
5a00152a3d Warn about poor performance creating Tensor from list of numpy.array's (#51680)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/13918

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51680
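
A short illustration of the pattern this warns about, and the fast alternative (assuming NumPy is available):

```python
import numpy as np
import torch

arrays = [np.ones(3) for _ in range(1000)]

slow = torch.tensor(arrays)            # element-by-element copy: now emits a warning
fast = torch.tensor(np.stack(arrays))  # build one ndarray first: single bulk copy
```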

Reviewed By: saketh-are

Differential Revision: D29847229

Pulled By: ezyang

fbshipit-source-id: 0519aad27f9ca1d8c06be5b9e6de382374d8b72b
2021-07-22 12:02:50 -07:00
2b0eddb0aa [Static Runtime] Implement prim::isinstance and prim::TypeCheck (#61783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61783

Implement two new prim operators for static runtime: `isinstance` and `TypeCheck`. `isinstance` is very straightforward, but there were a few wrinkles with implementing `TypeCheck`:

1. There is no way to directly generate `TypeCheck` nodes from TorchScript, they are generated by the JIT at runtime. This makes testing a little difficult. I had to make some modifications to `testStaticRuntime` to allow for the use of IR and TorchScript tests.
2. The behavior of `prim::TypeCheck` as implemented here does not match up 1:1 with the version implemented in the interpreter! This is because grad mode is disabled in static runtime. Here's an example.

IR is the same as the one included in this test, but with `requires_grad == 1`
```
graph(%a.1 : Tensor,
      %b.1 : Tensor):
  %t0 : Float(2, 2, strides=[2, 1], device=cpu, requires_grad=1), %t1 : Float(3, 3, strides=[3, 1]), %type_matched : bool = prim::TypeCheck[types=[Float(2, 2, strides=[2, 1], device=cpu, requires_grad=1), Float(3, 3, strides=[3, 1])]](%a.1, %b.1)
  return (%t0, %t1, %type_matched)
```

And in the test setup:
```
auto a = at::zeros({2, 2}, at::kFloat);
a.to(at::kCPU);
a.set_requires_grad(true);
auto b = at::ones({3, 3}, at::kFloat);

std::vector<IValue> args_correct = {a, b};

// prim::TypeCheck should be true with args_correct,
// but we get false when using static runtime!
```

Reviewed By: hlu1

Differential Revision: D29743862

fbshipit-source-id: db1788f0f5de42bab42602e8cc24eee04cbcc280
2021-07-22 10:23:35 -07:00
e6339ee336 optimize imports (#61908)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61908

Reviewed By: suo

Differential Revision: D29800269

Pulled By: ejguan

fbshipit-source-id: 74ce4414eb6d2a5608df9ec1efdc71e2112aef70
2021-07-22 09:58:44 -07:00
554e04090f Add 11.3 conda nightly binaries (#61873)
Summary:
Adds conda 11.3 cuda binaries to our nightly matrix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61873

Test Plan:
Tested by https://github.com/pytorch/pytorch/pull/61867-->testing complete, showing all passing binaries.

THIS CAN ONLY BE MERGED _AFTER_ pytorch/builder#806 and pytorch/builder#807 are merged, which they now are.

Reviewed By: saketh-are

Differential Revision: D29848267

Pulled By: janeyx99

fbshipit-source-id: db04899418bd0b4116315fbbe36b06f772020c2e
2021-07-22 09:50:13 -07:00
e858f6eed9 torch.nn.utils.clip_grad_norm_: remove device syncs (#61042)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60691

### Changes

Per the discussion in the above issue, this PR makes 2 changes:
1. When `error_if_nonfinite=False`, the NaN/Inf checks are truly skipped, and no device synchronization occurs.
    - Additionally, when performing the checks, the 2 results are combined with `torch.logical_or` to incur only a single sync (instead of 2 in the happy/finite path).
2. The `clip_coef` conditional is removed, in favor of a call to `clamp(..., max=1.0)` and an unconditional multiplication (see the sketch after this list).
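
A simplified sketch of change 2 (variable names assumed; not the exact body of `clip_grad_norm_`):

```python
import torch

def clip_step(grads, total_norm, max_norm):
    # total_norm is the 0-dim tensor computed from the gradients (stays on device)
    clip_coef = max_norm / (total_norm + 1e-6)
    # old: `if clip_coef < 1:` reads a Python bool, forcing a device-to-host sync
    # new: clamp on-device and always multiply, so no sync is needed
    clip_coef_clamped = torch.clamp(clip_coef, max=1.0)
    for g in grads:
        g.detach().mul_(clip_coef_clamped.to(g.device))
```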

### Testing

- The existing unit tests for `clip_grad_norm_` pass.
- I have manually profiled the example program from https://github.com/pytorch/pytorch/issues/60691, and verified that:
    - No synchronizations occur when using `error_if_nonfinite=False`.
    - A single synchronization occurs when using `error_if_nonfinite=True`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61042

Reviewed By: mrshenli

Differential Revision: D29764096

Pulled By: jbschlosser

fbshipit-source-id: db594b24608d16374b91bcbb9469046dfeeb152d
2021-07-22 08:53:40 -07:00
9e53c823b8 Add AVX512 support in ATen & remove AVX support (#61903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61903

### Remaining Tasks

- [ ] Collate results of benchmarks on two Intel Xeon machines (with & without CUDA, to check if CPU throttling causes issues with GPUs) - make graphs, including Roofline model plots (Intel Advisor can't make them with libgomp, though, but with Intel OpenMP).

### Summary

1. This draft PR produces binaries with 3 types of ATen kernels - default, AVX2, AVX512. Using the environment variable `ATEN_AVX512_256=TRUE` also results in 3 types of kernels, but the compiler can use 32 ymm registers for AVX2, instead of the default 16. ATen kernels for `CPU_CAPABILITY_AVX` have been removed.

2. `nansum` is not using the AVX512 kernel right now, as it has poorer accuracy for Float16 than AVX2 or DEFAULT, whose respective accuracies aren't very good either (#59415).
It was more convenient to disable AVX512 dispatch for all dtypes of `nansum` for now.

3. On Windows , ATen Quantized AVX512 kernels are not being used, as quantization tests are flaky. If `--continue-through-failure` is used, then `test_compare_model_outputs_functional_static` fails. But if this test is skipped, `test_compare_model_outputs_conv_static` fails. If both these tests are skipped, then a third one fails. These are hard to debug right now due to not having access to a Windows machine with AVX512 support, so it was more convenient to disable AVX512 dispatch of all ATen Quantized kernels on Windows for now.

4. One test is currently being skipped -
[`test_lstm` in `quantization.bc`](https://github.com/pytorch/pytorch/issues/59098) - It fails only on Cascade Lake machines, irrespective of the `ATEN_CPU_CAPABILITY` used, because FBGEMM uses `AVX512_VNNI` on machines that support it. The value of `reduce_range` should be used as `False` on such machines.

The list of the changes is at https://gist.github.com/imaginary-person/4b4fda660534f0493bf9573d511a878d.

Credits to ezyang for proposing `AVX512_256` - these use AVX2 intrinsics but benefit from 32 registers, instead of the 16 ymm registers that AVX2 uses.
Credits to limo1996 for the initial proposal, and for optimizing `hsub_pd` & `hadd_pd`, which didn't have direct AVX512 equivalents, and are being used in some kernels. He also refactored `vec/functional.h` to remove duplicated code.
Credits to quickwritereader for helping fix 4 failing complex multiplication & division tests.

### Testing
1. `vec_test_all_types` was modified to test basic AVX512 support, as tests already existed for AVX2.
Only one test had to be modified, as it was hardcoded for AVX2.
2.  `pytorch_linux_bionic_py3_8_gcc9_coverage_test1` & `pytorch_linux_bionic_py3_8_gcc9_coverage_test2` are now using `linux.2xlarge` instances, as they support AVX512. They were used for testing AVX512 kernels, as AVX512 kernels are being used by default in both of the CI checks. Windows CI checks had already been using machines with AVX512 support.

### Would the downclocking caused by AVX512 pose an issue?

I think it's important to note that AVX2 causes downclocking as well, and the additional downclocking caused by AVX512 may not hamper performance on some Skylake machines & beyond, because of the double vector-size. I think that [this post with verifiable references is a must-read](https://community.intel.com/t5/Software-Tuning-Performance/Unexpected-power-vs-cores-profile-for-MKL-kernels-on-modern-Xeon/m-p/1133869/highlight/true#M6450). Also, AVX512 would _probably not_ hurt performance on a high-end machine, [but measurements are recommended](https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/). In case it does, `ATEN_AVX512_256=TRUE` can be used for building PyTorch, as AVX2 can then use 32 ymm registers instead of the default 16. [FBGEMM uses `AVX512_256` only on Xeon D processors](https://github.com/pytorch/FBGEMM/pull/209), which are said to have poor AVX512 performance.

This [official data](https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf) is for the Intel Skylake family, and the first link helps understand its significance. Cascade Lake & Ice Lake SP Xeon processors are said to be even better when it comes to AVX512 performance.

Here is the corresponding data for [Cascade Lake](https://cdrdv2.intel.com/v1/dl/getContent/338848) -

![CASCADE LAKE AVX2](https://user-images.githubusercontent.com/76181208/120666172-ffec3f80-c451-11eb-8ea1-8933ccc12a1b.PNG)
![CASCADE LAKE AVX512](https://user-images.githubusercontent.com/76181208/120666190-04b0f380-c452-11eb-9faa-38d233c874c8.PNG)

The corresponding data isn't publicly available for Intel Xeon SP 3rd gen (Ice Lake SP), but [Intel mentioned that the 3rd gen has frequency improvements pertaining to AVX512](https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/3rd-Gen-Intel-Xeon-Scalable-Platform-Press-Presentation-281884.pdf). Ice Lake SP machines also have 48 KB L1D caches, so that's another reason for AVX512 performance to be better on them.

### Is PyTorch always faster with AVX512?

No, but then PyTorch is not always faster with AVX2 either. Please refer to #60202. The benefit from vectorization is apparent with small tensors that fit in caches or in kernels that are more compute heavy. For instance, AVX512 or AVX2 would yield no benefit for adding two 64 MB tensors, but adding two 1 MB tensors would do well with AVX2, and even more so with AVX512.

It seems that memory-bound computations, such as adding two 64 MB tensors can be slow with vectorization (depending upon the number of threads used), as the effects of downclocking can then be observed.
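
A rough micro-benchmark sketch of that claim (CPU timing only; sizes chosen so one tensor fits in cache and the other does not):

```python
import time

import torch

for n in (1 << 18, 1 << 24):  # ~1 MiB and ~64 MiB of float32
    a, b = torch.randn(n), torch.randn(n)
    a + b                     # warm-up
    t0 = time.perf_counter()
    for _ in range(100):
        a + b
    per_call = (time.perf_counter() - t0) / 100
    print(f"{n * 4 / 2**20:.0f} MiB: {per_call * 1e6:.1f} us per add")
```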

Original pull request: https://github.com/pytorch/pytorch/pull/56992

Reviewed By: soulitzer

Differential Revision: D29266289

Pulled By: ezyang

fbshipit-source-id: 2d5e8d1c2307252f22423bbc14f136c67c3e6184
2021-07-22 08:51:49 -07:00
cyy
59d6e07ada fix forward_idx check (#59911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59911

Reviewed By: dzhulgakov

Differential Revision: D29829020

Pulled By: albanD

fbshipit-source-id: f685063061dab499368a272d6b94a44e89f9a143
2021-07-22 08:37:33 -07:00
b60d1b713e Revert D26007050: add channels last support for thnn_conv2d (non-dilated)
Test Plan: revert-hammer

Differential Revision:
D26007050 (8b88c24670)

Original commit changeset: 1289e0687c24

fbshipit-source-id: 88b679efbcae572fe604d50e2199861cadbc3d4a
2021-07-22 08:31:15 -07:00
171598f0e3 [Refactoring] Fix imports order for torch/utils/data/dataset.py (#61328)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61328

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588897

Pulled By: VitalyFedyunin

fbshipit-source-id: 63df653fb471532819c83ebcee4f9dc951500ffb
2021-07-22 08:30:08 -07:00
1b02641bb1 add BFloat16 operators on CPU: arange, acosh, asinh, atanh, exp2, digamma, trigamma, polygamma (#60444)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60444

Reviewed By: ejguan

Differential Revision: D29800899

Pulled By: ezyang

fbshipit-source-id: 26d2c2ac3e7d3a2d49679508aad8c8bf0232cad5
2021-07-22 08:13:22 -07:00
f3f7e92be5 Manually call lazyInitCUDA in structured CUDA calls (#61882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61882

If you directly call the native implementation that bypasses the
initialization, which is bad!  This probably slows things down a little
though...

Fixes problem uncovered by #61642

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D29783856

Pulled By: ezyang

fbshipit-source-id: 16857569a049e09c6ebd96ef04b0025403b254af
2021-07-22 07:50:05 -07:00
196679d3aa [Refactoring] Reordering imports in torch/utils/data/datapipes/iter/__init__.py (#61325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61325

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588896

Pulled By: VitalyFedyunin

fbshipit-source-id: 8c0f3580f82083c43a590a18ecddb3e04ae93ca9
2021-07-22 07:46:08 -07:00
25be031c6e Add missing docker build to slow gradcheck label-triggered build (#61941)
Summary:
Currently, when adding the label, it fails like: https://app.circleci.com/pipelines/github/pytorch/pytorch/352569/workflows/d213cbad-edd6-4fe0-a79c-d46f8c0aae85/jobs/14856158

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61941

Reviewed By: suo

Differential Revision: D29827084

Pulled By: albanD

fbshipit-source-id: 134828d36e51324e6b6539dd4bc5f1eebfb89a03
2021-07-22 07:37:21 -07:00
5186fa2831 Fix c10d -> dist in test_ddp_hooks.py (#61864)
Summary:
**Overview:**
The existing `test_ddp_hooks.py` test file uses a prefix `c10d`, which is not defined in the file, meaning the test errors if left as is. This renames each `c10d` prefix to `dist`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61864

Test Plan:
All four tests pass when run:
```
gpurun python test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py
```

Reviewed By: ejguan

Differential Revision: D29783860

Pulled By: andwgu

fbshipit-source-id: 16bdd2dfcb76192964246148f14851a74f8907c8
2021-07-22 07:20:41 -07:00
109bd5e78a OpInfo: bitwise_and (#61349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61349

Also adds a type promotion test for bugs found by PR #60813

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29592840

Pulled By: ezyang

fbshipit-source-id: ee013b20e31baf6c6ebf2edb881ae6d8e215c7a6
2021-07-22 07:04:17 -07:00
2f3300f25f [docs] Correct torch.permute (#61833)
Summary:
Noted while reviewing https://github.com/pytorch/pytorch/issues/61830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61833

Reviewed By: albanD

Differential Revision: D29816661

Pulled By: mruberry

fbshipit-source-id: 895607d7ddcbd4319218ab7719a2f57cbde2283c
2021-07-22 00:27:23 -07:00
5801431c9b OpInfo Ref: addbmm (#61832)
Summary:
See https://github.com/pytorch/pytorch/issues/54261. This PR:

* Adds reference wrapper using NumPy for reference function of `addbmm`
* Refines sample inputs (makes it more readable and avoids redundancy)

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61832

Reviewed By: albanD

Differential Revision: D29816024

Pulled By: mruberry

fbshipit-source-id: e0fea6dc923504169a13bfaa258c61fbbc5fa9f4
2021-07-22 00:26:10 -07:00
31beef009d Fix IMethodTest.GetArgumentNames after D29648756 (#61985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61985

Fix IMethodTest.GetArgumentNames after D29648756 (641f6ef8a7).
ghstack-source-id: 134054637

Test Plan: buck test mode/dev caffe2/test/cpp/api:imethod -- IMethodTest.GetArgumentNames

Reviewed By: suo

Differential Revision: D29828807

fbshipit-source-id: b1411745b91e1b8c0ea0fd9e9666e22125dde333
2021-07-22 00:21:59 -07:00
07a91f1cfd fix graph deepcopy to propagate output type (#61747)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61747

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29737565

Pulled By: migeed-z

fbshipit-source-id: 8583f0c87f2db27695e062f59a15de77f3b00fd6
2021-07-21 23:53:03 -07:00
8a2063e58a Foreach Test Refactor: Pointwise, Min/Max-imum (#61327)
Summary:
- rewrite pointwise unittests using `ops` decorator
- rewrite minimum&maximum unittests using `ops` decorator
- enable minimum/maximum fastpath for BFloat16
- remove _test_data method

https://github.com/pytorch/pytorch/issues/58833

cc: ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61327

Reviewed By: albanD

Differential Revision: D29830209

Pulled By: ngimel

fbshipit-source-id: fa7805262b86c40fc32750b16629d80ad48ea4b5
2021-07-21 21:59:57 -07:00
d6899fe492 [Refactoring] Reordering imports in utils/data/__init__.py (#61324)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61324

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588895

Pulled By: VitalyFedyunin

fbshipit-source-id: 5e719c80f9cb5630c65187ac89773831777f368d
2021-07-21 21:38:28 -07:00
06efced177 .github: Specify directory to pull reports from (#61990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61990

This adds more specificity to where to pull test reports from since I
believe that actions/upload-artifact doesn't actually respect the
working-directory default

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD, zhouzhuojie

Differential Revision: D29831719

Pulled By: seemethere

fbshipit-source-id: cee5609f97338d44a484d85baa77f0167d81ce55
2021-07-21 20:57:07 -07:00
cc18654d66 [fx_acc] Refactoring acc_tracer (#61963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61963

Test Plan: CI

Reviewed By: jfix71

Differential Revision: D29772522

fbshipit-source-id: 4b117735147624f9428b933ea798495823423a0e
2021-07-21 20:09:15 -07:00
6284d2a82b wrap cudaStreamSynchronize calls (#61889)
Summary:
This is a first step towards creating a context manager that errors out on synchronizing calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61889

Reviewed By: albanD

Differential Revision: D29805280

Pulled By: ngimel

fbshipit-source-id: b66400fbe0941b7daa51e6b30abe27b9cccd4e8a
2021-07-21 19:30:52 -07:00
3d6aa3a2f6 Enable torch.isclose to suppport bool tensors (#61271)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60533

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61271
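
A small example of the newly supported case; for bool tensors the comparison effectively reduces to elementwise equality:

```python
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, True])

print(torch.isclose(a, b))  # -> tensor([True, False, True])
```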

Reviewed By: zhxchen17

Differential Revision: D29737618

Pulled By: SplitInfinity

fbshipit-source-id: 45314bc7e0b9a28c10700455b1e6267c0db3eefc
2021-07-21 18:50:14 -07:00
243c7079a1 add 3d input and output shapes to maxpool documentation (#61310)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61310

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29737516

Pulled By: migeed-z

fbshipit-source-id: eb6964f6808b8ae05d4d3852a5162dc66930cd64
2021-07-21 18:27:27 -07:00
d00bb45846 [typing] suppress errors in fbcode/caffe2 - batch 2
Test Plan: Sandcastle

Differential Revision: D29827809

fbshipit-source-id: 7ca7c2a33d691ac57392945b78a320d253c84ed4
2021-07-21 17:56:26 -07:00
a0e381641b Remove relative paths for clang-tidy annotations (#62004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62004

Some of the files checked by clang tidy are compiled from a sibling directory, so the files all start with something like `../torch`. This ends up messing with `translate_annotations.py` which runs from the repo root. This fixes it by chopping off any relative paths in the clang tidy output.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29835446

Pulled By: driazati

fbshipit-source-id: 2bd279370e41ed0a321e30f88fe38434105c75e8
2021-07-21 17:52:31 -07:00
e731a63e63 Silence clang-tidy linter for TorchpyTest.FxModule test (#62001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62001

This will fix [this linter error](https://github.com/pytorch/pytorch/runs/3120335141) introduced with D29690088 (810e19979d).

Test Plan: N/A (just looked at other examples and tidy doc https://clang.llvm.org/extra/clang-tidy/)

Reviewed By: suo

Differential Revision: D29832654

fbshipit-source-id: 8cf69cb5551f3b1bd384a2553dc5c827beb0a68f
2021-07-21 17:40:46 -07:00
b6ff0fa8dd Enable dynamically ciflow/slow so that we can run GHA slow tests on PR (#61987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61987

This PR enables us to run slow GHA tests on PR.

Steps to do (~may only take effect after this PR is merged~ works on this PR)
- Add label `ciflow/slow`
- Assign/unassign pytorchbot
- The job should be running .github/workflows/pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7.yml

The above steps are manual; once probot can handle the dispatch work, the CIFlow process will be automated.

Related meta RFC issue: #61888

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29832758

Pulled By: zhouzhuojie

fbshipit-source-id: 64d31ef572502e62b80e6b7ac480ffcfa9f4e38b
2021-07-21 16:56:54 -07:00
9d6cdf34a4 Annotate generated files in .gitattributes (#61995)
Summary:
Mark CI yaml files generated from templates as linguist-generated
Fixes https://github.com/pytorch/pytorch/issues/61994

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61995

Reviewed By: seemethere

Differential Revision: D29832199

Pulled By: malfet

fbshipit-source-id: 86ad3a16b4d3e4f94c35b8f766a8556a07632419
2021-07-21 16:49:07 -07:00
ae58a4c45d [Static Runtime] Added a variadic cat operator (#61302)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61302

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D29565344

Pulled By: navahgar

fbshipit-source-id: 96f5f4546ec0e61eb7f87e016e026e7b62576248
2021-07-21 15:58:20 -07:00
b145889192 Modernize use make_unique (#61739)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61739

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717133

fbshipit-source-id: 70e3d81a48f7ae90cca3ef3c9587174ca15d81f4
2021-07-21 15:28:26 -07:00
2c0ecfbb20 [PyTorch] Expose bias() and unpack() API of LinearPackedParamsBase to Python layer (#61855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61855

Exposing `bias()` and `unpack()` for `LinearPackedParamsBase`. This is useful for inspecting linear op attributes.

Test Plan:
See unit test passing:

```
[ (6c61a5eb4) | devvm1625 ~/fbsource/fbcode] buck test //caffe2/test:quantization -- test_linear_bias_unpack
Parsing buck files: finished in 2.8 sec
Building: finished in 9.9 sec (100%) 11973/55220 jobs, 0/55220 updated
  Total time: 12.8 sec
More details at https://www.internalfb.com/intern/buck/build/2d0ee210-c8f3-4994-ac2b-1dccf4c3ca6c
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: b7c6ea1b-8eef-430e-b83a-dad4033ecc87
Trace available for this run at /tmp/tpx-20210720-115423.031745/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5348024618459562
    ✓ ListingSuccess: caffe2/test:quantization - main (10.806)
    ✓ Pass: caffe2/test:quantization - test_linear_bias_unpack (quantization.core.test_quantized_op.TestQuantizedOps) (10.913)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5348024618459562
```

Reviewed By: kimishpatel

Differential Revision: D29767704

fbshipit-source-id: 716f43b61814b92094c0b08d4e63e1dddc352aa7
2021-07-21 15:13:40 -07:00
a02ccd6080 [ONNX] add supplement for standardOps low precision cast (#60731) (#61561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61561

Addresses Gary's reply and adds a supplement to https://github.com/pytorch/pytorch/pull/53813.

- Add more details for LowPrecisionCastNodeForStandardOps to make it more comprehensible.

- Remove the unused gemm test.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767991

Pulled By: SplitInfinity

fbshipit-source-id: d00032e13699f5b02fc619e64aa8fdd39f3a66b8

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-07-21 15:10:36 -07:00
6f08ddfc28 [ONNX] Enable aten:normal op and add tests for aten:uniform op. (#60441) (#61560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61560

1. Add a new symbolic function broadcast_tensors() to support exporting the torch.broadcast_tensors() function. This is required for exporting torch.distributions.Normal.
2. Add a new symbolic function normal() to support exporting sampling from torch.distributions.Normal (see the sketch below).
3. Add tests for the normal and uniform ops as well.
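
A hypothetical export sketch of what this enables (the module, shapes, and opset below are assumptions, not taken from the PR):

```python
import torch

class Sampler(torch.nn.Module):
    def forward(self, mean, std):
        # Tracing goes through the normal-sampling aten op,
        # which the new symbolic function can now handle.
        return torch.distributions.Normal(mean, std).sample()

torch.onnx.export(Sampler(), (torch.zeros(4), torch.ones(4)),
                  "normal.onnx", opset_version=11)
```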

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767995

Pulled By: SplitInfinity

fbshipit-source-id: acfe5e7801d00c0df8ca46966bbd6015fed0045e

Co-authored-by: Jay Zhang <jiz@microsoft.com>
2021-07-21 15:10:35 -07:00
f0054e1a6e [ONNX] Update expand_as for dynamic shape (#61084) (#61559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61559

Update expand_as for dynamic shape

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767990

Pulled By: SplitInfinity

fbshipit-source-id: 3f1e3f68fd17c5ffbd4a50fccff224fd9d6c84fb

Co-authored-by: Negin Raoof <neginmr@utexas.edu>
2021-07-21 15:10:33 -07:00
34075e2c8b [ONNX] Fix the issue of converting empty list to sequence. (#58651) (#61558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61558

When we construct an empty list via a Python list comprehension, we need to avoid converting the resulting input-less node to onnx::Concat in shape_type_inference.cpp and peephole.cpp, because that would create an invalid Concat node with no inputs.

In addition, update the code to avoid passing a Sequence input to an onnx::Cast node, which does not accept the Sequence data type as an input.

Add tests for the validation as well.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767989

Pulled By: SplitInfinity

fbshipit-source-id: f97f172ff20eebda4c3744c7a934df36716f12a2

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-07-21 15:10:31 -07:00
22e60d77e7 [ONNX] Support tensor list as module attribute (#59685) (#61557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61557

* Support tensor list as module attribute.
* Support exporting `torch.set_`.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767992

Pulled By: SplitInfinity

fbshipit-source-id: 5ac5a09600d4dbe86b2fe354d240e46f1d1084ef
2021-07-21 15:08:35 -07:00
a8f6b5a80a [1/N] Avoid skipping tests in sandcastle. (#61876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61876

In the sandcastle environment, avoid skipping tests and instead just
"pass" them, so that we don't create a large number of non-actionable tasks.
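
A minimal sketch of the idea, using the environment variables from the Test Plan (the decorator name is an assumption):

```python
import os
import unittest

IS_SANDCASTLE = (os.getenv("SANDCASTLE") == "1"
                 or os.getenv("TW_JOB_USER") == "sandcastle")

def sandcastle_skip_if(condition, reason):
    def decorator(fn):
        if condition and IS_SANDCASTLE:
            # Report a pass instead of a skip so that no
            # non-actionable follow-up task gets created.
            return lambda *args, **kwargs: None
        return unittest.skipIf(condition, reason)(fn)
    return decorator
```
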
ghstack-source-id: 133846232

Test Plan: Test with `SANDCASTLE=1 TW_JOB_USER=sandcastle`

Reviewed By: rohan-varma

Differential Revision: D29779699

fbshipit-source-id: add71008830dfa6f456ce2365a2d70436b7b7a31
2021-07-21 14:31:17 -07:00
adb73d3dcf Removed overhead from reshape() call if tensor doesn't need to be changed (#61466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61466

## Goal

Per #55126 the performance of `reshape` is worse than `alias` in cases where they are performing the same operation (i.e. where reshape is returning a view) because `reshape` delegates to `view` and duplicates some of the operations (specifically `infer_size_dv` and `computeStride`).

The goal of this pull-request is to reduce or remove the additional overhead that `reshape` has.

### Proposed Implementation

Instead of using `view`, we implement a private/internal operator (`_reshape_alias`) that `reshape` dispatches to, which skips the relevant checks. This is functionally equivalent to `as_strided`; however, it is a lot simpler because it's specialized to this use case, and importantly its `backward` implementation is a lot faster.

Note that we have to dispatch (`reshape` is a composite operator) because `reshape` can return either a view or a copy of the Tensor depending on the parameters, and this complicates implementing a derivative/backward for `reshape`.
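
A quick standalone illustration (not from the PR) of why `reshape` cannot always alias:

```python
import torch

a = torch.arange(6).reshape(2, 3)                 # contiguous
print(a.reshape(-1).data_ptr() == a.data_ptr())   # True: reshape returned a view

b = a.t()                                         # non-contiguous transpose
print(b.reshape(-1).data_ptr() == b.data_ptr())   # False: reshape had to copy
```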

### Why not `as_strided`?

Using `as_strided` directly slows down autograd. If we use a custom function equivalent to `_reshape_alias` but with a simpler backward function, then `view` has the same performance as `reshape`. If we delegate to `as_strided`, it is about 56% slower (and this also holds when compared against our custom function).

This is also the reason we make an internal operator named `_reshape_alias` instead of exposing a new operator since this should only be used in the `reshape` case and it is effectively a more limited version of `view`, `alias`, and `as_strided`.

## Benchmarks
In a micro-benchmark for `backward` running:

```cpp
// Setup
at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));

// Benchmark loop
// `reshape(-1)` replaced with a call to view(-1) for view baseline
x.pow(4).reshape(-1).mean().backward();
```

I also benchmarked simple operations without gradients using:

```cpp
// Setup
at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));

// Benchmark loop
x.reshape(-1) // replaced with a call to view(-1) for view baseline
```

Baselined to `view`:

* Original `reshape`: `+3.3%` (without gradients `+20.8%`)
* Using `as_strided`: `+55.1%` (without gradients `+1.0%`)
* Using custom `_reshape_alias`: `-1.0%` (without gradients `+6.2%`)

In absolute terms (note the percentages above were generated comparing between runs/tests rather than to a single baseline):

* Original `view`: `53.66 us` (without gradients `582.78 ns`)
* Original `reshape`: `55.46 us` (without gradients `704.24 ns`)
* Using `as_strided`: `83.24 us` (without gradients `576.49 ns`)
* Using custom `_reshape_alias`: `53.13 us` (without gradients `536.01 ns`)

Note that these benchmarks include a backward pass as well. When compared without any gradient computation, the performance differences are more pronounced, since the reshape overhead then accounts for a larger share of the time.

### Original performance

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e4d393160>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.66 us
  IQR:    2.70 us (52.54 to 55.24)
  884 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e2ebd4fa0>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 55.46 us
  IQR:    2.61 us (54.39 to 57.01)
  889 measurements, 100 runs per measurement, 1 thread]

2276116
2286256

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f0e5b2e3e20>
   2640  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   1920  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
   1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
   1040  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&)
    980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
    720  ???:__tls_get_addr
    520  ???:at::shouldRunRecordFunction(bool*)
    520  ???:__memcpy_avx_unaligned_erms
    200  ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    100  ???:c10::TensorImpl::strides() const
    100  ???:c10::TensorImpl::sizes() const
    100  ???:at::(anonymous namespace)::manager()
     77  /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_7815557938202456331/timer_src.cpp:main
     40  ???:c10::TensorImpl::numel() const
    -77  /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_8055217880649990171/timer_src.cpp:main
   -260  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 10140
```

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f850dd66c10>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 582.78 ns
  IQR:    33.80 ns (573.80 to 607.61)
  833 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f850de31e20>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 704.24 ns
  IQR:    24.42 ns (697.20 to 721.62)
  679 measurements, 10000 runs per measurement, 1 thread]

56896
67036

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f84e1930bb0>
   2640  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   1920  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
   1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
   1040  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&)
    980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
    720  ???:__tls_get_addr
    520  ???:at::shouldRunRecordFunction(bool*)
    520  ???:__memcpy_avx_unaligned_erms
    200  ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    100  ???:c10::TensorImpl::strides() const
    100  ???:c10::TensorImpl::sizes() const
    100  ???:at::(anonymous namespace)::manager()
     76  /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_547407365342278353/timer_src.cpp:main
     40  ???:c10::TensorImpl::numel() const
    -76  /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_3457873755756181226/timer_src.cpp:main
   -260  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 10140
```

</details>

### Using `as_strided`

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f8b13bb5b50>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.37 us
  IQR:    3.15 us (51.73 to 54.88)
  936 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f8af55f8490>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 83.24 us
  IQR:    4.05 us (81.20 to 85.25)
  609 measurements, 100 runs per measurement, 1 thread]

2267916
2525061

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f8af55f8e50>
   31930  ???:_int_free
   15940  ???:malloc
   11595  ???:_int_malloc
   10100  ???:torch::autograd::generated::details::as_strided_backward(at::Tensor, at::TensorGeometry, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    9360  ???:__tls_get_addr
    8280  ???:free
    8100  ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    4520  ???:c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_()
    4080  ???:operator new(unsigned long)
     ...
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1220  ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -2560  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)
   -4860  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)

Total: 257145
```

```

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f93176a0160>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 570.55 ns
  IQR:    32.69 ns (552.87 to 585.56)
  874 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f92f8f29490>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 576.49 ns
  IQR:    37.95 ns (559.51 to 597.46)
  861 measurements, 10000 runs per measurement, 1 thread]

56896
58556

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f932556ca60>
    2140  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1940  ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1880  ???:torch::ADInplaceOrView::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1720  ???:at::_ops::as_strided::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1400  ???:at::native::as_strided_tensorimpl(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1260  ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)'2
    1260  ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
     980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
     ...
    -620  ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -1740  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 1660

```

</details>

### Using custom function (`_reshape_alias`)

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f16861d6b50>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.50 us
  IQR:    2.64 us (52.32 to 54.96)
  906 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f1667b2ed60>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.13 us
  IQR:    3.40 us (51.72 to 55.13)
  914 measurements, 100 runs per measurement, 1 thread]

2269736
2273236

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f1693f8dc10>
    5060  ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    2000  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1780  ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1660  ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1600  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1220  ???:torch::autograd::generated::AliasToShapeBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
     ...
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1220  ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)
   -4860  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)

Total: 3500
```

```

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f5287adfb20>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 505.10 ns
  IQR:    20.04 ns (500.41 to 520.45)
  944 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f526951b430>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 536.01 ns
  IQR:    17.81 ns (531.34 to 549.16)
  916 measurements, 10000 runs per measurement, 1 thread]

56896
60376

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f5295896c10>
    2000  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1860  ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1780  ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1660  ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1600  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
     980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
     ...
    -620  ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -1740  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 3480

```

</details>

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29792126

Pulled By: laurencer

fbshipit-source-id: f0519b45b65f868aa3e8651679354558bd761dfd
2021-07-21 14:05:35 -07:00
a8d99a28d7 Modernize avoid a C array (#61740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61740

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717118

fbshipit-source-id: 70e73346b75deb4fe6b6399e06bd576f3b6e2b91
2021-07-21 13:52:54 -07:00
d7b31fe95d Add ciflow config and change jinja2 templates (#61886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61886

This PR is rolling out at the `1. Manual Phase`.

```
#       Rollout Strategy:
#       1. Manual Phase
#          step 1. Add 'ciflow/default' label to the PR
#          step 2. Once there's an [unassigned] event from PR, it should rerun
#          step 3. Remove 'ciflow/default' label
#          step 4. Trigger the [unassigned] event again, it should not rerun
#       2. Probot Phase 1 (manual on 1 workflow)
#          step 1. Probot automatically add labels based on the context
#          step 2. Manually let probot trigger [unassigned] event
#       3. Probot Phase 2 (auto on 1 workflow)
#          step 1. Modify the workflows so that they only listen on [unassigned] events
#          step 2. Probot automatically adds labels based on the context
#          step 3. Probot automatically triggers [unassigned] event
#       4. Probot Phase 3 (auto on many workflows)
#          step 1. Enable it for all workflows
```

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D29808366

Pulled By: zhouzhuojie

fbshipit-source-id: c7e5009d839239df58825dec093ff0f1fd281697
2021-07-21 13:32:09 -07:00
2dab368d26 Refactor generate_ci_workflows (#61879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61879

Refactor generate_ci_workflows to support the CI dispatcher. This is the first step toward refactoring each workflow into a dataclass with validation and an OOP structure.
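
A loose sketch of the dataclass direction described above (all field names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class CIWorkflow:
    build_environment: str
    docker_image: str
    on_pull_request: bool = False

    def validate(self) -> None:
        # Basic sanity checks; the real class would do more.
        assert self.build_environment and self.docker_image
```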

Verified that the output is the same:

```
.github/scripts/generate_ci_workflows.py
git status
```

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D29808365

Pulled By: zhouzhuojie

fbshipit-source-id: b8c5fd43f4bd6e17e06f3925a1a509084b790d95
2021-07-21 13:30:36 -07:00
e2acce373f Run Windows smoke tests with gflags in test dir (#61967)
Summary:
Previous testing yielded the torch.version ModuleNotFound error when I ran the smoke tests from the pytorch root directory.

This PR simply reorders the commands to run the smoke tests within the test directory, which passes in this series of runs:
https://github.com/seemethere/test-repo/actions/runs/1050734298 (the failures are due to missing credentials during uploading stats, which we don't need here)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61967

Reviewed By: samestep

Differential Revision: D29820985

Pulled By: janeyx99

fbshipit-source-id: 363ef321c32cfaf4446ceeb6117ea26abc311816
2021-07-21 12:06:34 -07:00
a03466cb07 Back out "Revert D29687143: [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test" (#61878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61878

CMakeLists.txt
The Android NNAPI delegate library was moved from test/cpp/jit/CMakeLists.txt to torch/CMakeLists.txt. This resolves the issue the original PR had, where the NNAPI delegate library was added to builds without Python even though it depends on Python.
Original PR: https://github.com/pytorch/pytorch/pull/61594

There's an error where the library cannot be built on MacOS. This problem existed in the original PR as well, but now an issue has been created: https://github.com/pytorch/pytorch/issues/61930

test_backend_nnapi.py
Also cleaned up the unit tests' skip conditions: the unit tests are now skipped if the NNAPI delegate library file is not found, whereas previously the skip was based on the platform (only allowing Linux).

Test Plan:
To run NNAPI delegate unit tests: `python test/test_jit.py TestNnapiBackend`

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29799895

fbshipit-source-id: b69a767b5cde3814b0853cfbc84d61ab4155f619
2021-07-21 11:58:45 -07:00
4532b3c4a9 Fix _C public bindings test (#61088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61088

The test was previously a no-op since it was comparing the bindings with themselves. This fixes that to use the hardcoded list and adds the items that changed in the meantime.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29510525

Pulled By: driazati

fbshipit-source-id: 3497023e5c8b3cd6fdd1d07d48b4f2650b203ded
2021-07-21 11:50:37 -07:00
8880f3d450 [fx] introduce __fx_create_arg__ dunder method for controlling custom classes are handled as node args (#61780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61780

These changes allow objects to control, from within their own source, how they are handled when they appear as an argument to a torch.fx call_module node. Previously, we had been using a custom Tracer with an overridden create_arg() method, branching on class name to handle unusual args (data classes, etc.).
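
A sketch of the hook, assuming the protocol is "define `__fx_create_arg__(self, tracer)` and return something usable as a node arg" (the class and fields are placeholders):

```python
import torch.fx

class SideConfig:
    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def __fx_create_arg__(self, tracer: torch.fx.Tracer):
        # Delegate each field back to the tracer so proxies/tensors
        # are converted recursively; the dict becomes the node arg.
        return {
            "scale": tracer.create_arg(self.scale),
            "shift": tracer.create_arg(self.shift),
        }
```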

Reviewed By: suo, houseroad

Differential Revision: D27976120

fbshipit-source-id: 0c5249c5f8398368ca0fbec0ad8a07ccf99b7da4
2021-07-21 11:27:09 -07:00
3c7bfa632a reland D29801875: .github: Clone pytorch to separate directory (#61972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61972

This reverts commit 716567504c8b4da8d764d9674595c2095b62080c.

Also includes change to add the TEST_CONFIG env variable so that test
reports get uploaded correctly.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29821858

Pulled By: seemethere

fbshipit-source-id: 23602706446e0a95db6bd7cedfa665e8c4145168
2021-07-21 11:15:52 -07:00
810e19979d Torch deploy for fx.graph_module with non-torch dependencies (#61680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61680

This diff enables torch deploy for fx.graph_module with non-torch dependencies. Here are the issues that currently prevent this, all fixed in this change:
- Pickle is used as an internal format to transmit objects between interpreters. It needs to serialize Python code, but to get the source code for imports from python_code.globals it needs access to the PackageImporter. Currently a regular `__reduce__` function is used, which has no notion of a custom importer.
- When deserializing pickled objects on an interpreter, empty globals are passed to exec, so it cannot resolve non-torch imports located in the package. We need to be able to point exec to our custom PackageImporter.
- Subclasses extending fx.graph_module should be able to optionally provide their own Tracer (extending fx.Tracer).

As a solution, a new reducer (`__reduce_deploy__`) is introduced for the torch deploy workflow. The reducer is registered in _deploy.py (the entry point for the C++ torch deploy API) when saving an object to transmit between interpreters. This lets us pass a proper PackageImporter to each interpreter for pickling/unpickling fx.graph_module, and it also defines an API for passing a custom fx.Tracer when needed.

Test Plan:
Added UT to cover changes.
```
buck test //caffe2/torch/csrc/deploy:test_deploy
```
```
buck test caffe2/test:fx
```

Reviewed By: suo

Differential Revision: D29690088

fbshipit-source-id: 3a8dbe02d5d7e085534aa61b7773c86f0f8c19b0
2021-07-21 10:29:48 -07:00
f41d3341b1 [pytorch] Support embedding_bag_4bit_rowwise_offsets in cuda (#61728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61728

Templatize the existing embedding_bag_byte_rowwise_offsets_kernel to support both 4 bits and 8 bits per dimension. Tested rigorously using fb-internal random testing against the CPU ops.

Reviewed By: hyuen

Differential Revision: D29706346

fbshipit-source-id: c9f4591a2cc6205e4b7e57a363ba0a6306fdddd5
2021-07-21 10:23:30 -07:00
716567504c Revert D29801875: .github: Clone pytorch to separate directory
Test Plan: revert-hammer

Differential Revision:
D29801875 (a152c12d7b)

Original commit changeset: 71a3c7c949e5

fbshipit-source-id: 85175a9933d1e33117b1461d5a760e1a79f60047
2021-07-21 10:19:28 -07:00
ea8abcf76e [quant] Remove calls to .item() for fake_quant_on (#61921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61921

For GPU training, the fake_quant_on tensors are present on the GPU and the .item() calls incur a GPU->CPU copy to access the tensor element.
These calls are expensive and hurt training performance: the `item()` and `_local_scalar_dense()` calls account for 11% of the total CPU execution time.
The solution here is to access the tensor on the GPU without a copy.
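
A toy illustration of the pattern change (a sketch, not the actual kernel code):

```python
import torch

dev = "cuda" if torch.cuda.is_available() else "cpu"
fake_quant_on = torch.ones(1, device=dev)
x = torch.randn(4, device=dev)

# Before: .item() materializes a Python scalar, forcing a device->host sync.
if fake_quant_on.item() == 1:
    y = x * 2.0

# After (sketch): keep the decision on-device; no host round-trip.
y = torch.where(fake_quant_on.bool(), x * 2.0, x)
```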

Individual op benchmarks show a 33% speedup just by removing the `.item()` calls

Profiler Before
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fused_moving_avg_obs_fake_quant         5.61%       1.538ms       100.00%      27.421ms     548.425us     978.208us         3.42%      28.575ms     571.501us            50
                  aten::_fused_moving_avg_obs_fq_helper        27.63%       7.576ms        94.39%      25.883ms     517.668us       6.536ms        22.87%      27.597ms     551.937us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        11.07%       3.037ms        21.54%       5.905ms     118.103us       9.549ms        33.42%       9.549ms     190.978us            50
                                         aten::_aminmax        19.39%       5.317ms        27.44%       7.524ms     150.484us       8.683ms        30.38%       8.683ms     173.651us            50
                                             aten::item         4.49%       1.232ms        11.12%       3.051ms      61.011us       1.058ms         3.70%       2.829ms      56.579us            50
                              aten::_local_scalar_dense         6.63%       1.818ms         6.63%       1.818ms      36.363us       1.771ms         6.20%       1.771ms      35.419us            50
                                            aten::empty         5.76%       1.579ms         5.76%       1.579ms      15.792us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::as_strided         2.29%     628.399us         2.29%     628.399us       6.284us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         7.56%       2.073ms        17.13%       4.696ms      31.310us       0.000us         0.00%       0.000us       0.000us           150
                                    aten::empty_strided         9.57%       2.623ms         9.57%       2.623ms      17.489us       0.000us         0.00%       0.000us       0.000us           150
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 27.421ms
Self CUDA time total: 28.575ms
```
After
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fused_moving_avg_obs_fake_quant         6.59%       1.240ms       100.00%      18.820ms     376.396us     490.272us         2.36%      20.745ms     414.901us            50
                  aten::_fused_moving_avg_obs_fq_helper        26.12%       4.916ms        93.41%      17.580ms     351.597us       2.033ms         9.80%      20.255ms     405.096us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        14.55%       2.738ms        31.09%       5.850ms     117.005us       9.968ms        48.05%       9.968ms     199.363us            50
                                         aten::_aminmax        25.28%       4.758ms        36.21%       6.814ms     136.278us       8.253ms        39.79%       8.253ms     165.069us            50
                                            aten::empty         7.94%       1.494ms         7.94%       1.494ms      14.944us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::as_strided         2.99%     561.785us         2.99%     561.785us       5.618us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         8.36%       1.573ms        16.53%       3.112ms      31.118us       0.000us         0.00%       0.000us       0.000us           100
                                    aten::empty_strided         8.17%       1.538ms         8.17%       1.538ms      15.384us       0.000us         0.00%       0.000us       0.000us           100
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.820ms
Self CUDA time total: 20.745ms
```

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: jingsh

Differential Revision: D29796533

fbshipit-source-id: 10abb93abd61c6ac25b8e8c114aa57b9db891918
2021-07-21 10:13:06 -07:00
b8386f5d72 [quant] Create FusedMovingAvgObsFakeQuantize for QAT (#61691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61691

Create a new module for QAT that performs a fused MovingAvgMinMaxObserver + FakeQuantize operation.
The module currently only supports per-tensor quantization (affine/symmetric). A follow-up PR will add per-channel support.
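
A minimal usage sketch, assuming the module is exposed under torch.quantization and follows the FakeQuantize constructor convention (both assumptions):

```python
import torch
from torch.quantization import (FusedMovingAvgObsFakeQuantize,
                                MovingAverageMinMaxObserver)

fq = FusedMovingAvgObsFakeQuantize(
    observer=MovingAverageMinMaxObserver,
    quant_min=0, quant_max=255, dtype=torch.quint8,
)
y = fq(torch.randn(8, 16))   # one fused observe + fake-quantize step
```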

Results on running QAT with MobileNetV2 (Obs enabled/fake_quant enabled)
Original FQ module
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "242.80261993408203"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "505.7964324951172"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "235.80145835876465"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "543.8144207000732"}

Fused FakeQuant module (~50% improvement in latency)
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "232.1624755859375"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "263.8866901397705"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "236.9832992553711"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "292.1590805053711"}

Individual module benchmark result (>5x improvement in latency)
===> Baseline FakeQuantize module
```
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
              aten::fake_quantize_per_tensor_affine         0.77%       1.210ms         4.92%       7.730ms     154.596us     718.528us         0.45%       9.543ms     190.862us            50
    aten::fake_quantize_per_tensor_affine_cachemask         2.41%       3.792ms         4.15%       6.520ms     130.402us       8.825ms         5.58%       8.825ms     176.492us            50
                                     aten::_aminmax         3.25%       5.105ms         4.43%       6.955ms     139.102us       8.193ms         5.18%       8.193ms     163.868us            50
                                   aten::zeros_like         1.87%       2.939ms         6.95%      10.922ms     109.218us       5.992ms         3.79%      10.844ms     108.442us           100
                                        aten::zeros         0.97%       1.527ms         3.11%       4.885ms      97.702us       2.383ms         1.51%       4.800ms      96.010us            50
                                         aten::rsub         1.34%       2.106ms         2.94%       4.614ms      92.277us       2.063ms         1.30%       4.559ms      91.173us            50
                                        aten::clamp         2.79%       4.381ms         5.42%       8.519ms      85.190us       5.385ms         3.41%       8.438ms      84.381us           100
                                           aten::eq        11.70%      18.384ms        21.31%      33.479ms      83.280us      22.465ms        14.21%      33.310ms      82.861us           402
                                         aten::ones         1.05%       1.656ms         2.57%       4.038ms      80.751us       2.494ms         1.58%       3.951ms      79.028us            50
                                           aten::le         2.52%       3.955ms         4.84%       7.607ms      76.071us       4.998ms         3.16%       7.702ms      77.016us           100
                                          aten::min         0.69%       1.087ms         2.32%       3.641ms      72.827us       1.017ms         0.64%       3.603ms      72.055us            50
                                          aten::max         1.40%       2.195ms         4.62%       7.260ms      72.597us       2.008ms         1.27%       7.140ms      71.404us           100
                                   aten::is_nonzero         2.68%       4.207ms        11.35%      17.829ms      71.033us       4.062ms         2.57%      17.225ms      68.625us           251
                                       aten::detach         1.17%       1.831ms         3.65%       5.736ms      57.360us       1.680ms         1.06%       5.634ms      56.340us           100
                                          aten::mul         3.36%       5.278ms         3.36%       5.278ms      53.862us       5.215ms         3.30%       5.215ms      53.216us            98
                                          aten::div         3.42%       5.376ms         3.42%       5.376ms      53.759us       5.320ms         3.36%       5.320ms      53.196us           100
                                          aten::sub         6.79%      10.672ms         6.79%      10.672ms      53.901us      10.504ms         6.64%      10.504ms      53.050us           198
                                         aten::item         4.06%       6.380ms        12.02%      18.883ms      53.798us       6.127ms         3.87%      18.322ms      52.198us           351
                                          aten::add         3.28%       5.147ms         3.28%       5.147ms      52.518us       5.113ms         3.23%       5.113ms      52.171us            98
                                      aten::minimum         1.63%       2.555ms         1.63%       2.555ms      51.092us       2.585ms         1.64%       2.585ms      51.708us            50
                                      aten::maximum         3.22%       5.065ms         3.22%       5.065ms      50.646us       5.133ms         3.25%       5.133ms      51.329us           100
                                        aten::round         1.61%       2.529ms         1.61%       2.529ms      50.578us       2.528ms         1.60%       2.528ms      50.552us            50
                                        aten::zero_         1.99%       3.125ms         4.72%       7.422ms      49.481us       2.835ms         1.79%       7.269ms      48.462us           150
                                        aten::copy_         6.62%      10.394ms         6.62%      10.394ms      41.576us      10.252ms         6.48%      10.252ms      41.010us           250
                                             detach         2.49%       3.905ms         2.49%       3.905ms      39.049us       3.954ms         2.50%       3.954ms      39.539us           100
                                       aten::select         2.01%       3.154ms         2.47%       3.876ms      38.759us       3.866ms         2.44%       3.866ms      38.658us           100
                          aten::_local_scalar_dense         7.96%      12.503ms         7.96%      12.503ms      35.621us      12.195ms         7.71%      12.195ms      34.743us           351
                                           aten::to         2.31%       3.625ms         4.16%       6.530ms      32.650us       4.320ms         2.73%       6.270ms      31.348us           200
                                        aten::fill_         3.70%       5.808ms         3.70%       5.808ms      29.039us       5.892ms         3.73%       5.892ms      29.459us           200
                                   aten::as_strided         0.79%       1.244ms         0.79%       1.244ms       6.221us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         3.55%       5.579ms         3.55%       5.579ms      11.137us       0.000us         0.00%       0.000us       0.000us           501
                                      aten::resize_         2.36%       3.712ms         2.36%       3.712ms      12.332us       0.000us         0.00%       0.000us       0.000us           301
                                   aten::empty_like         1.45%       2.284ms         3.68%       5.776ms      28.878us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         2.80%       4.398ms         2.80%       4.398ms      17.592us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 157.108ms
Self CUDA time total: 158.122ms
```

===> FusedFakeQuant
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                   fb::fused_fake_quant        23.42%       6.408ms       100.00%      27.361ms     547.215us       7.887ms        27.20%      28.996ms     579.925us            50
                  aten::fake_quantize_per_tensor_affine         4.25%       1.162ms        27.65%       7.565ms     151.298us     686.176us         2.37%      10.217ms     204.336us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        14.11%       3.860ms        23.40%       6.403ms     128.068us       9.531ms        32.87%       9.531ms     190.612us            50
                                         aten::_aminmax        20.57%       5.628ms        27.47%       7.515ms     150.305us       8.218ms        28.34%       8.218ms     164.367us            50
                                             aten::item         3.65%     999.522us        10.27%       2.810ms      56.202us     931.904us         3.21%       2.674ms      53.481us            50
                              aten::_local_scalar_dense         6.62%       1.811ms         6.62%       1.811ms      36.212us       1.742ms         6.01%       1.742ms      34.843us            50
                                            aten::empty        10.85%       2.969ms        10.85%       2.969ms      14.843us       0.000us         0.00%       0.000us       0.000us           200
                                       aten::as_strided         1.92%     524.365us         1.92%     524.365us       5.244us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         6.48%       1.774ms        14.62%       4.000ms      26.670us       0.000us         0.00%       0.000us       0.000us           150
                                    aten::empty_strided         8.14%       2.226ms         8.14%       2.226ms      14.842us       0.000us         0.00%       0.000us       0.000us           150
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 27.361ms
Self CUDA time total: 28.996ms
```

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuantModule

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29706889

fbshipit-source-id: ae3f9fb1fc559920459bf6e8663e8299bf7d21e1
2021-07-21 10:13:04 -07:00
afdca41bab [quant] Add a new fused MovingAvg Obs + FakeQuant operator (GPU) (#61589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61589

Custom GPU implementation that does the observer + calculate qparams calculation on GPU.
It calls the aten fake_quant_per_tensor/channel functions to perform the fake quant step.

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29682761

fbshipit-source-id: 373a50f88481b7e5b4d9e65d84a6c174bb277dd4
2021-07-21 10:13:02 -07:00
92d3391fb1 [quant] Add a new fused MovingAvg Obs + FakeQuant operator(CPU) (#61570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61570

Fused operator that computes the moving-average min/max values of the input tensor (updating them in-place) and then fake-quantizes the input.
It expects the qmin/qmax values to reflect the full range of the quantized tensor (instead of using reduce_range).

Motivation for adding this operator is for performance reasons, since moving the computation from python to C++/CUDA can increase the performance of QAT.

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29682762

fbshipit-source-id: 28e4c50e77236d6976fe4b326c9a12103ed95840
2021-07-21 10:11:41 -07:00
403f59701c Changes default DDP behavior to divide sparse grad by world size before allreduce, not after (#61814)
Summary:
I appreciate https://github.com/pytorch/pytorch/pull/61379, which restores the fusion of divide-by-world-size and copy-to-allreduce-buffer for dense gradients. But I noticed that in the wake of https://github.com/pytorch/pytorch/pull/61379 there's misaligned treatment of dense and sparse gradients. Specifically, dense gradients are divided by world size before the allreduce, while sparse gradients are divided by world size after the allreduce. On paper you wouldn't expect that to matter, but for cluster-scale DDP training with amp gradient scaling and allreduces of FP16 grads, we've noticed several cases where post-dividing grads by world size caused nonconvergence while pre-dividing worked. I'm not aware of any cases where the reverse was true.
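
A toy illustration (not from the PR) of the overflow risk with FP16 grads:

```python
import torch

world_size = 64
shards = [torch.full((1,), 1024.0, dtype=torch.float16)
          for _ in range(world_size)]

post = sum(shards) / world_size              # sum overflows fp16 -> inf
pre = sum(g / world_size for g in shards)    # pre-divide stays finite -> 1024.0
print(post, pre)
```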

This PR changes the treatment of sparse gradients to match that of dense gradients (both will be divided by world size before the allreduce).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61814

Reviewed By: mrshenli

Differential Revision: D29772444

Pulled By: rohan-varma

fbshipit-source-id: 033a17d5c019511889d908876282c6624fb26a2d
2021-07-21 09:54:53 -07:00
17d743ff04 ENH Adds test and docs for dropout for no batch dims (#61911)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

I think `Dropout` is already tested in `test_Dropout` for no batch dims.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61911

Reviewed By: albanD

Differential Revision: D29810928

Pulled By: jbschlosser

fbshipit-source-id: 7716a1a808e9e34aae43573f38706212552afbb4
2021-07-21 09:07:10 -07:00
06df33857b fix adaptive_avg_pool (#61851)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61851

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29812559

Pulled By: makslevental

fbshipit-source-id: ac54166aaec63992748ea3299c3144ee107b24f4
2021-07-21 08:42:26 -07:00
33db828e52 Revert D29647586: [jit] Renamed prim::Concat as prim::VarConcat
Test Plan: revert-hammer

Differential Revision:
D29647586 (db11619901)

Original commit changeset: cdd34ea5a3c9

fbshipit-source-id: bab5ac4ed67a00ac151fe39463aa3fb56897d7f4
2021-07-21 08:28:26 -07:00
48af9de92f ENH Enables No-batch for *Pad1d Modules (#61060)
Summary:
Toward https://github.com/pytorch/pytorch/issues/60585

This PR adds a `single_batch_reference_fn` that uses the single-batch implementation to check the no-batch behavior.
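
A sketch of the idea (assuming the no-batch path this PR enables for the Pad1d modules):

```python
import torch

def single_batch_reference(module, inp):
    # Run the unbatched input as a batch of one, then strip the batch dim;
    # the unbatched call should match this.
    return module(inp.unsqueeze(0)).squeeze(0)

m = torch.nn.ReflectionPad1d(2)
x = torch.randn(3, 5)   # (C, L), no batch dim
assert torch.allclose(m(x), single_batch_reference(m, x))
```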

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61060

Reviewed By: mrshenli

Differential Revision: D29739823

Pulled By: jbschlosser

fbshipit-source-id: d90d88a3671177a647171801cc6ec7aa3df35482
2021-07-21 07:12:41 -07:00
bdf439a958 Adds _LazyInstanceNorm and LazyInstanceNormXd (#60982)
Summary:
Signed-off-by: Calvin McCarter <calvin@lightmatter.co>

Fixes https://github.com/pytorch/pytorch/issues/60981

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60982

Reviewed By: albanD

Differential Revision: D29810547

Pulled By: jbschlosser

fbshipit-source-id: d933d4c7fe5cf7be9b09a5ab93f740b94cf08cc1
2021-07-21 06:45:45 -07:00
db11619901 [jit] Renamed prim::Concat as prim::VarConcat (#61498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61498

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29647586

Pulled By: navahgar

fbshipit-source-id: cdd34ea5a3c986350a813be17e7d428844ea4cbf
2021-07-20 19:30:00 -07:00
7fbdc86aec [jit] Removed a local function to check for dominators and used the one added to Node class (#60909)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60909

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29441864

Pulled By: navahgar

fbshipit-source-id: 362bd462fa70256dd1f8b05756a76da0cb3d4b76
2021-07-20 19:29:58 -07:00
429908e540 [jit] Updated the concat common inputs elimination pass to use the variadic cat op instead of aten::cat (#60908)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60908

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29441865

Pulled By: navahgar

fbshipit-source-id: 2ab08168102eff1f43667ca418bdd94bb2df562a
2021-07-20 19:29:57 -07:00
53668f8bf6 [jit] Added an API to remove list mutations and replace with variadic cat until fixed point (#60776)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60776

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29406099

Pulled By: navahgar

fbshipit-source-id: e2e69eb6ebff3bc6e25d80f46ce118e52f557fb6
2021-07-20 19:29:55 -07:00
0cfcf68aa5 [jit] Added special handling for prim::ListConstruct while checking for may alias inputs (#60775)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60775

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29406101

Pulled By: navahgar

fbshipit-source-id: 9b8a4050167750610400637e7de48ffa8727051a
2021-07-20 19:29:53 -07:00
4dd04a8bbe [jit] Handled cases when input list to cat is mutated after cat using AliasDb (#60774)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60774

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29406100

Pulled By: navahgar

fbshipit-source-id: af6afca65881c18c51b482eb63898a0f1c94d591
2021-07-20 19:28:42 -07:00
604f503d30 Revert D29794958 + compilation fix (#61937)
Summary:
This PR un-reverts https://github.com/pytorch/pytorch/issues/61475 and fixes compilation with MSVC, which does not recognize alternative operator spellings (e.g. `or` instead of `||`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61937

Reviewed By: albanD

Differential Revision: D29805941

Pulled By: malfet

fbshipit-source-id: 01e5963c6717c1b44b260300d87ba0bf57f26ce9
2021-07-20 18:14:45 -07:00
a152c12d7b .github: Clone pytorch to separate directory (#61932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61932

Clones pytorch to a separate directory for each run so that they do not
overlap with each other

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D29801875

Pulled By: seemethere

fbshipit-source-id: 71a3c7c949e5aeacf033ae1fc9aaef13b42833b6
2021-07-20 17:30:30 -07:00
7cbb7c6d2e [vulkan] Make vulkan ops selective (#58332)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58332

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D28454976

Pulled By: IvanKobzarev

fbshipit-source-id: 445c1f326be76e3530a4884aa5fe749d636e0ae5
2021-07-20 16:26:55 -07:00
73fbf43684 [vulkan] Fix asserts (#61495)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61495

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D29647357

Pulled By: IvanKobzarev

fbshipit-source-id: cb4ba15f28625ea6e667883c9a2d31eba48b6f37
2021-07-20 16:07:13 -07:00
22fff61f06 Revert D29794958: [pytorch][PR] changing trapz to trapezoid
Test Plan: revert-hammer

Differential Revision:
D29794958 (95cec8f4fa)

Original commit changeset: 60b9c07efd47

fbshipit-source-id: 2dcda2d62e01c2521a86ae5ed8246cfb686d3f64
2021-07-20 16:00:46 -07:00
e067960243 lint_setup should not require elevated privileges (#61798)
Summary:
- s/pip/pip3/ (unversioned pip can reference either pip2 or pip3, depending on setup)
- Always invoke `pip install` with the `--user` option (otherwise, unless one is using a conda environment, it will try to install into a system folder, which should not be writable by regular users)
- Do not install shellcheck in `/usr/bin`; rely on `~/.local/bin` instead and add it to the PATH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61798

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29747286

Pulled By: malfet

fbshipit-source-id: 30cb51fe60b5096b758f430d1c51465205532a19
2021-07-20 15:53:12 -07:00
994434ad16 Adding complex number support for all_to_all/scatter (#61299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61299

Modifies all_to_all and scatter to support complex numbers in addition to floating-point numbers.
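
A minimal sketch of the newly supported case (assumes an already-initialized process group; illustrative only):

```
import torch
import torch.distributed as dist

world_size = dist.get_world_size()
inp = list(torch.randn(world_size, dtype=torch.cfloat).chunk(world_size))
out = [torch.empty_like(t) for t in inp]
dist.all_to_all(out, inp)  # complex tensors are now accepted, like floats
```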

Test Plan: buck run //caffe2/test/distributed:distributed_gloo_fork -- test_name --print-passing-details --run-disabled

Reviewed By: wanchaol

Differential Revision: D29563938

fbshipit-source-id: 59e436b3fa1aee3d5195cbcffd39587e642c76b9
2021-07-20 15:45:34 -07:00
457a0b63bf use torch.bucketize in the to_sparse_csr implementation (+ additional tests) (#61340)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57381

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61340

Reviewed By: bhosmer

Differential Revision: D29601393

Pulled By: cpuhrsch

fbshipit-source-id: 4ca1f013d96e8716f0e658e0cd685d9aa0d98a5c
2021-07-20 15:44:25 -07:00
95cec8f4fa changing trapz to trapezoid (#61475)
Summary:
This PR resolves issue https://github.com/pytorch/pytorch/issues/52606 while also adding support for complex numbers.
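
A quick illustration of the renamed op (default unit spacing):

```
import torch

y = torch.tensor([1.0, 2.0, 3.0])
print(torch.trapezoid(y))  # tensor(4.) -- same result the old torch.trapz gave
```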

Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* https://github.com/pytorch/pytorch/issues/61615
* **https://github.com/pytorch/pytorch/issues/61475**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61475

Reviewed By: mruberry

Differential Revision: D29794958

Pulled By: NivekT

fbshipit-source-id: 60b9c07efd47fd85b9c8178768fc7828d7b57d29
2021-07-20 15:25:55 -07:00
86715623dd Adding super calls to JIT test case setUp and tearDown (#61922)
Summary:
This issue surfaced when https://github.com/pytorch/pytorch/issues/61655 did not manage to skip the appropriate test case.

I investigated and realized that the setUp code responsible for disabling tests was never called, because another setUp defined in a subclass overrode the parent class's setUp.

I am not sure whether that was intentional; if so, we would have to adapt the subclasses' code to call the check_if_enable function in common_utils.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61922

Reviewed By: ejguan

Differential Revision: D29798716

Pulled By: janeyx99

fbshipit-source-id: d31b664e48507d69de14574ff5e6ecf1d41ae24d
2021-07-20 15:08:44 -07:00
7acb8b71e1 Remove AVX detection code that duplicates FindAVX.cmake (#61748)
Summary:
This PR deletes some code in `MiscCheck.cmake` that duplicates the functionality of `FindAVX.cmake`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61748

Reviewed By: ejguan

Differential Revision: D29791282

Pulled By: malfet

fbshipit-source-id: 6595fd1b61c8ae12b821fad8c9a34892dd52d213
2021-07-20 14:34:36 -07:00
e8d2916b84 Add faulty tensorpipe implementation (#61421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61421

This PR adds the faulty tensorpipe agent implementation and replaces all faulty process group agent tests with it. The faulty tensorpipe agent code is very similar to that of faulty process group agent. It allows the user to fail or delay certain types of rpc messages, which is used in the faulty agent tests. These changes are needed to deprecate the process group rpc backend.

Summary of changes:
- Add faulty tensorpipe agent class
- Update the tensorpipe pipeWrite function to allow it to be overridden and to add delays
- Update test backend registry and faulty agent tests to use the FAULTY_TENSORPIPE_AGENT backend.

This affects all faulty agent tests; here are a few of them as sample commands:
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_verify_backend_options`
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_no_faulty_messages`
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_builtin_remote_message_dropped_timeout`

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29773739

Pulled By: H-Huang

fbshipit-source-id: 6b2bc366735d70b79943d4207f454bc9555bbf5f
2021-07-20 13:54:30 -07:00
d856914c57 Fix missing braces (#61745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61745

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717538

fbshipit-source-id: ed0ff4fb6a72b701bf6d36ebde343672356a916a
2021-07-20 13:32:38 -07:00
f78142b68d Modernize emplace (#61742)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61742

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717433

fbshipit-source-id: 93996388780862e90ab4e697508407091e8e763b
2021-07-20 13:31:19 -07:00
2c2a084012 approx 100x acceleration for parse_kineto_results (#60432)
Summary:
Fixes https://github.com/pytorch/kineto/issues/308; https://github.com/pytorch/pytorch/issues/58983 may be related.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60432

Reviewed By: ilia-cher

Differential Revision: D29715257

Pulled By: gdankel

fbshipit-source-id: 7c94d1bb00b609f502db7aa9d9a447ab09645e6a
2021-07-20 13:21:49 -07:00
4567a50b2a Enable clang-tidy on master (#61689)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61689

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29767984

Pulled By: 1ntEgr8

fbshipit-source-id: 658355da274ada41e01ed2772a03a701b90fbbab
2021-07-20 12:55:12 -07:00
8b88c24670 add channels last support for thnn_conv2d (non-dilated) (#49582)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49582

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26007050

Pulled By: VitalyFedyunin

fbshipit-source-id: 1289e0687c2459dd4eb8e4ba2efc8266397cfe5f
2021-07-20 12:50:24 -07:00
91bc285084 Fix clang-tidy error in pre-commit script (#61918)
Summary:
Fixes a clang-tidy error in the git-pre-commit script. See log below for the error it fixes.

```
Running pre-commit flake8
Running pre-commit clang-tidy
usage: clang_tidy [-h] [-e CLANG_TIDY_EXE] [-g GLOB] [-x REGEX] [-c COMPILE_COMMANDS_DIR] [--diff-file DIFF_FILE] [-p PATHS [PATHS ...]] [-n] [-v] [-q] [--config-file CONFIG_FILE] [--print-include-paths] [-I INCLUDE_DIR] [-s]
                  [--disable-progress-bar]
                  [extra_args [extra_args ...]]
clang_tidy: error: unrecognized arguments: -j
```

It gets rid of the redundant binary check because `tools.linter.clang_tidy` already does this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61918

Test Plan: Run `tools/git-pre-commit`. It should not show a clang-tidy error.

Reviewed By: driazati

Differential Revision: D29796383

Pulled By: 1ntEgr8

fbshipit-source-id: b804b0170747f04e84d21e03d1c4985748d78cf2
2021-07-20 12:40:56 -07:00
f6446802c7 Revert D29783943: [pytorch][PR] add BFloat16 operators on CPU: arange, acosh, asinh, atanh, exp2, digamma, trigamma, polygamma
Test Plan: revert-hammer

Differential Revision:
D29783943 (513c40cb1a)

Original commit changeset: 40cebe829720

fbshipit-source-id: 5276dea572f1286dad7b7caa69ecc2f369ec13ff
2021-07-20 12:33:52 -07:00
c2cc6a9396 Add generic join unit tests (#61786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61786

This adds unit tests for the generic join context manager.

```
gpurun python test/distributed/algorithms/test_join.py
```

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746646

Pulled By: andwgu

fbshipit-source-id: 2933d85783c2225574c4b77bfb90064690c6e668
2021-07-20 12:13:05 -07:00
1c80b5220b nll_loss_forward: port to structured kernel (#61443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61443

For more information, see #55070.

This PR also adds a new type, `OptionalTensorRef`, as a replacement for `c10::optional<Tensor>&` in order to avoid the reference-count manipulations that are inevitable with the latter. I have confirmed using Godbolt/Compiler Explorer that this class does indeed avoid manipulating the reference count of the `intrusive_ptr` inside the `Tensor` it refers to:

1. [P429709479](https://www.internalfb.com/phabricator/paste/view/P429709479) - Given a `const Tensor&` in scope, an `OptionalTensorRef` can be constructed without bumping refcount.
2. [P429709883](https://www.internalfb.com/phabricator/paste/view/P429709883) - Given an `OptionalTensorRef`, a `const Tensor&` can be produced without bumping refcount.
3. [P429710335](https://www.internalfb.com/phabricator/paste/view/P429710335) - When `OptionalTensorRef` is destructed, the refcount should not be decremented.
4. [P429769525](https://www.internalfb.com/phabricator/paste/view/P429769525) - `OptionalTensorRef` can be assigned without refcount manipulation.
5. [P429769882](https://www.internalfb.com/phabricator/paste/view/P429769882) - `OptionalTensorRef` can be move assigned without refcount manipulation.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29780666

Pulled By: SplitInfinity

fbshipit-source-id: 7af157215300e9254d635433cbd583f7329fe064
2021-07-20 11:45:44 -07:00
f0df0207ec [jit] Arithmetic simplification for integers. (#61444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61444

Add a mini pass to merge arithmetic nodes like (((x - 1) + 2) * 1) - 1.
Issue #60913
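
A minimal sketch of the pattern the pass targets (`_jit_pass_peephole` is an internal binding, used here only for illustration):

```
import torch

@torch.jit.script
def f(x: int):
    return (((x - 1) + 2) * 1) - 1  # chained int arithmetic; should fold to x

graph = f.graph
torch._C._jit_pass_peephole(graph)  # run the peephole pass, which now merges these nodes
print(graph)
```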

Test Plan:
python test/test_jit.py TestPeephole.test_peephole_arith

Imported from OSS

Reviewed By: eellison

Differential Revision: D29630614

fbshipit-source-id: 08ac64cee39070401f9ff9163d309f20ff53c5ac
2021-07-20 11:35:42 -07:00
d2abfc547b Add ShardedTensorMetadata for ShardedTensor. (#61683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61683

This PR adds a consolidated metadata field (ShardedTensorMetadata)
which has all the necessary global metadata for a ShardedTensor.
ghstack-source-id: 133847517

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D29703719

fbshipit-source-id: 567279e46c787a88ef3310e4dce6fd2ad7631c62
2021-07-20 11:28:13 -07:00
87334c40a7 Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61571

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629

Reviewed By: mrshenli

Differential Revision: D29774486

Pulled By: albanD

fbshipit-source-id: bfc9119c478f0244d5be681bcf4954a3eb97e542
2021-07-20 10:55:43 -07:00
513c40cb1a add BFloat16 operators on CPU: arange, acosh, asinh, atanh, exp2, digamma, trigamma, polygamma (#60444)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60444

Reviewed By: ejguan

Differential Revision: D29783943

Pulled By: ezyang

fbshipit-source-id: 40cebe8297207669d1ca430ed1d1e81dda5a0c45
2021-07-20 10:30:04 -07:00
45751e0b34 Support integral target for the backward of nn.SmoothL1Loss (#61112)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58816

- enhance the backward of `nn.SmoothL1Loss` to allow an integral `target` (see the sketch below)
- add test cases in `test_nn.py` to check that `input.grad` matches between an integral `target` and its floating-point counterpart.
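
A minimal sketch of the newly allowed case (behavior as described above; dtype promotion in the forward is assumed):

```
import torch
import torch.nn.functional as F

inp = torch.randn(4, requires_grad=True)
target = torch.randint(0, 5, (4,))        # integral target
F.smooth_l1_loss(inp, target).backward()  # backward no longer rejects it
assert inp.grad is not None
```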

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61112

Reviewed By: mrshenli

Differential Revision: D29775660

Pulled By: albanD

fbshipit-source-id: 544eabb6ce1ea13e1e79f8f18c70f148e92be508
2021-07-20 10:24:03 -07:00
59a5312ce6 Modernize fix deprecated header (#61736)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61736

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716965

fbshipit-source-id: 314c2b557c240ac16bbfab114ab764beb189e78a
2021-07-20 10:06:11 -07:00
5a04bd8723 Modernize some loops in torch (#61737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61737

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716813

fbshipit-source-id: 21f9716bead4e0e913406e681c55d1956327e6af
2021-07-20 10:04:54 -07:00
65616184bc [Docs] Bundle of errata and small corrections / improvements for torch.linalg docs (#61578)
Summary:
This PR bundles a number of errata detected in the linalg docs over the last few weeks.

- Simpler Cholesky deprecation rule
- Remove repeated consecutive words
- Correct cond with rcond in lstsq
- Correct examples of lstsq
- More concise examples
- Use the names of the inputs / outputs in the variables of the examples

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61578

Reviewed By: mrshenli

Differential Revision: D29757988

Pulled By: mruberry

fbshipit-source-id: a740a64826c065c1d7c1b8b498364d147008d76d
2021-07-20 09:58:09 -07:00
a0c9d70fba bitwise_and: Port to structured (#60813)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60813

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449374

Pulled By: ezyang

fbshipit-source-id: d7e236ad841dcb9d5914352d117a34b10894bb91
2021-07-20 09:01:41 -07:00
875d63ed04 bitwise_xor: Port to structured (#60812)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60812

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449372

Pulled By: ezyang

fbshipit-source-id: 016d2012f64486c2490ff319e753b0d054dccf2c
2021-07-20 09:01:40 -07:00
ce8aeefbf4 bitwise_or: Port to structured (#60811)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60811

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449370

Pulled By: ezyang

fbshipit-source-id: ac176985b0141a55807ba909d7342eb35b1dc28f
2021-07-20 09:00:20 -07:00
f59ac5abc8 Add thread local state guards in autograd engine hooks. (#60067)
Summary:
The thread-local state of the backward thread is not aligned with the GraphTask's `thread_local_` when calling the hooks in backward.

This is required for profiling the statistics of c10d operations in the `DistributedDataParallel` module.

Is there any concern about adding the thread-local state guard when calling the hooks in backward, ezyang?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60067

Reviewed By: ezyang

Differential Revision: D29654599

Pulled By: albanD

fbshipit-source-id: 656c4f91017184fd40f1a184de24757a13387e37
2021-07-20 07:41:49 -07:00
641f6ef8a7 Implement IMethod::getArgumentNames() (#61856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61856

This diff does the following:
1. It implements IMethod::getArgumentNames() for all IMethod subclasses.
2. It refactors PyTorchDeployPredictor to use IMethod for model execution.

Test Plan:
[... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor
[... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchPredictor

Reviewed By: wconstab

Differential Revision: D29648756

fbshipit-source-id: e047345f26ce495a5d74d8063f7f8edc32a1b13c
2021-07-19 23:16:48 -07:00
42d6543c7b [bc-breaking] Dispatch index_put with boolean mask argument to masked_fill (#61612)
Summary:
https://github.com/pytorch/pytorch/issues/57515

Based on ngimel's branch, with a few tweaks to determine when to copy value tensors to device memory, plus additional tests.
bc-breaking note: Previously, if in `x[index] = value` the `value` was a 0-d tensor with a device different from `x`'s device, it resulted in a RuntimeError. Now this case is handled by copying `value` to the correct device, as illustrated below.
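
A minimal sketch of the bc-breaking case (assumes a CUDA device is available):

```
import torch

x = torch.zeros(4, device="cuda")
mask = torch.tensor([True, False, True, False], device="cuda")
value = torch.tensor(1.0)  # 0-d CPU tensor; device differs from x's
x[mask] = value            # previously a RuntimeError; value is now copied to x's device
```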

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61612

Reviewed By: mrshenli

Differential Revision: D29753491

Pulled By: ngimel

fbshipit-source-id: 3fba14f4c2b9b136b50af020f9c1eda88f7373b0
2021-07-19 22:53:14 -07:00
018dc4193e Factor vector intrinsics out of SumKernel.cpp (#61483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61483

This will make it simpler to support AVX512 which is upcoming in #56992, see https://github.com/pytorch/pytorch/pull/56992#discussion_r667060280 for reference.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29753536

Pulled By: ngimel

fbshipit-source-id: 03ae66cdc01a3679c67214468e2bdf93b15c3bc2
2021-07-19 21:49:01 -07:00
c44d9d9f70 Use cascade-summation to improve nansum accuracy (#61082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61082

Fixes #59415

This implements nansum as a new `LoadPolicy` for the existing sum functions,
so it uses the more accurate cascade-sum algorithm.

I've also expanded `test_nansum` to cover the four special cases of the sum
algorithm (inner/outer reduction; vectorized or scalar).
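
The Python-level behavior of the op is unchanged; for reference:

```
import torch

a = torch.tensor([1.0, float("nan"), 2.0])
print(torch.nansum(a))  # tensor(3.) -- NaNs are treated as zero
```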

Nansum performance comparison
-----------------------------
For float sums, contiguous reductions are as much as 10x faster and discontiguous sums are ~1.8x faster (more for small shapes due to TensorIterator overheads).

|        Shape | Dim | Master Contiguous (us) | This PR Contiguous (us) | Master Discontiguous (us) | This PR Discontiguous (us) |
|-------------:|-----|:----------------------:|:-----------------------:|:-------------------------:|:--------------------------:|
|     10, 1000 | 0   |          74.9          |           2.02          |            75.6           |            6.41            |
|              | 1   |          8.24          |           1.8           |            8.28           |            5.24            |
|    100, 1000 | 0   |           134          |           7.55          |            130            |            43.2            |
|              | 1   |          70.5          |           7.01          |            71.5           |            40.6            |
|   1000, 1000 | 0   |           726          |           69.2          |            737            |             403            |
|              | 1   |           702          |           51.0          |            709            |             404            |
|  10000, 1000 | 0   |         15,300         |          2,470          |           18,200          |           10,400           |
|              | 1   |          7,200         |          1,160          |           7,470           |            4,440           |
| 100000, 1000 | 0   |         163,000        |          28,000         |          199,000          |           131,000          |
|              | 1   |         70,700         |          13,500         |           75,700          |           44,200           |

Sum performance comparison
--------------------------

For float sums, performance is unchanged to within measurement precision:
|        Shape | Dim | Master Contiguous (us) | This PR Contiguous (us) | Master Discontiguous (us) | This PR Discontiguous (us) |
|-------------:|-----|:----------------------:|:-----------------------:|:-------------------------:|:--------------------------:|
|     10, 1000 | 0   |          1.92          |           2.01          |            4.2            |            4.49            |
|              | 1   |          1.68          |           1.68          |            2.79           |            2.75            |
|    100, 1000 | 0   |          6.52          |           7.07          |            26.9           |            27.3            |
|              | 1   |          5.91          |           5.66          |            16.8           |            16.9            |
|   1000, 1000 | 0   |          55.6          |           58.6          |            256            |             254            |
|              | 1   |          41.0          |           41.2          |            150            |             147            |
|  10000, 1000 | 0   |          1,370         |          1,650          |           8,070           |            8,020           |
|              | 1   |           908          |           845           |           3,100           |            2,980           |
| 100000, 1000 | 0   |         24,700         |          24,700         |           90,900          |           91,000           |
|              | 1   |         12,500         |          12,100         |           31,500          |           31,800           |

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29753523

Pulled By: ngimel

fbshipit-source-id: 28095ac39e4a07ff878775c98f7a7815d9a4e457
2021-07-19 21:47:43 -07:00
bf1c9aaa79 logit_backward: Port to structured (#60817)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60817

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449376

Pulled By: ezyang

fbshipit-source-id: e6f793300488370f50a97db58f0400c557ee64e5
2021-07-19 21:23:05 -07:00
b8686b42d8 tanh_backward: Port to structured (#60816)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60816

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449375

Pulled By: ezyang

fbshipit-source-id: 93b70341fc6a2a42056fef74d6e5d81ec34e9da2
2021-07-19 21:23:03 -07:00
8c42d7ad07 sigmoid_backward: Port to structured (#60815)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60815

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449371

Pulled By: ezyang

fbshipit-source-id: e68c05cc90446e86d50b67d8346f145bf13ed207
2021-07-19 21:23:01 -07:00
11cc179366 xlogy: Port to structured (#60814)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60814

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449373

Pulled By: ezyang

fbshipit-source-id: a37499cd4fabff80f848627def7dd500364b8a22
2021-07-19 21:21:54 -07:00
9fb6b40f3e Makes a streaming backward test try gradient stealing more directly (#60065)
Summary:
Closes https://github.com/pytorch/pytorch/issues/59846.

https://github.com/pytorch/pytorch/issues/59846 is likely paranoia, and some of the test_streaming_backward_* tests in test_cuda.py already use gradient stealing (i.e., they start with `.grad`s as None before backward). Regardless, this PR augments one of the tests to stress gradient stealing a bit more directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60065

Reviewed By: mrshenli

Differential Revision: D29779518

Pulled By: ngimel

fbshipit-source-id: ccbf278543c3adebe5f4ba0365b1dace9a14da9b
2021-07-19 20:39:55 -07:00
873cc7a46d Support 3 argument variant of the getattr() call where the third arg is the default return value (#61599)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/56909

Note that the emitted code for such a call will be either (a) a getattr() call with the first two args, if the
attribute name (which must be a string literal) is determined to be valid based on the hasAttr() result,
or (b) just the AST node for the default value (the third arg) alone, with no getattr call at all.

Test code:

```
import torch
import numpy as np

class Shape:
    def __init__(self):
        self.center = 1.0

def f(x):
    s = Shape()
    return getattr(s, "missing", [])

y = torch.jit.script(f)
print(y.graph)
```
Output:
```
graph(%x : Tensor):
  %s.1 : __torch__.Shape = prim::CreateObject()
  %2 : NoneType = prim::CallMethod[name="__init__"](%s.1) # ts.py:10:8
  %4 : Tensor[] = prim::ListConstruct()
  return (%4)
```

Another example:
```
import torch

class Shape:
    def __init__(self):
        self.center = 1.0

def f(x):
    s = Shape()
    y = getattr(s, "center")
    w : list[float] = [1.0]
    z = getattr(s, "missing", w)
    z.append(y)
    return z

y = torch.jit.script(f)
print(y.graph)
 --- output ---

graph(%x : Tensor):
  %5 : float = prim::Constant[value=1.]() # ts.py:12:23
  %s.1 : __torch__.Shape = prim::CreateObject()
  %2 : NoneType = prim::CallMethod[name="__init__"](%s.1) # ts.py:10:8
  %center : float = prim::GetAttr[name="center"](%s.1)
  %w.1 : float[] = prim::ListConstruct(%5)
  %11 : float[] = aten::append(%w.1, %center) # ts.py:14:4
  return (%w.1)
```
Fixes #56969

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61599

Reviewed By: ZolotukhinM

Differential Revision: D29776058

Pulled By: jerryzhenleicai

fbshipit-source-id: 76333bd54002e08a064677c1f287115a80cc7c8e
2021-07-19 20:04:21 -07:00
ffd2e602f4 [CUDA graphs] Make sure graph mempool cudaMalloc_count decrement pairs with cudaFree for all allocations (#61567)
Summary:
Graph mempools aren't deleted until all their allocations are cudaFreed. `PrivatePool::cudaMalloc_count` tracks the number of outstanding (not-yet-cudaFreed) allocations.

https://github.com/pytorch/pytorch/pull/44742 moves cudaFree to [release_block](https://github.com/pytorch/pytorch/pull/44742/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R1160), while the `cudaMalloc_count` decrement (if needed) remains in a caller ([release_blocks](https://github.com/pytorch/pytorch/pull/44742/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R1177)). But I noticed there's also a path ([release_available_cached_blocks](https://github.com/pytorch/pytorch/pull/44742/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R1094)) that calls `release_block` without calling `release_blocks`, in other words, it calls cudaFree but dodges any potential `cudaMalloc_count` decrement.

In practice, the way the code is currently organized, I don't _think_ this second path can cause the pool to become a zombie whose `cudaMalloc_count` will never reach zero (I think this could only happen if you call `release_available_cached_blocks` on a private pool, and the only way it would be called on a private pool is if capture is underway, and if capture is underway, the cudaFree call will hard error). Regardless, I feel much more comfortable keeping the cudaMalloc_count decrement right next to the cudaFree.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61567

Reviewed By: mrshenli

Differential Revision: D29765198

Pulled By: ezyang

fbshipit-source-id: bcbeed656c3e0d101112aa470d8a098c73a011b1
2021-07-19 19:22:18 -07:00
208d06ca8c Port other comparison ops: ne, lt, gt, le, ge to structured kernels. (#60942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60942

Tracking Issue: #55070

This PR applies the same transformation used for `eq` to the other comparison ops: `ne`, `lt`,
`gt`, `le`, and `ge`. Macros for creating the meta and impl functions are used (since the
checks they perform are the same).

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29509868

Pulled By: ezyang

fbshipit-source-id: 6a1ed1d93d08884c9e09d3f419037533a235d68c
2021-07-19 19:14:12 -07:00
97327137ba Port eq kernel to structured kernels. (#60177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60177

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29509871

Pulled By: ezyang

fbshipit-source-id: ad81bb49c46edc81c705d12108b98c5ffaaddf92
2021-07-19 19:13:09 -07:00
64ac428889 [vulkan] Adaptive local work group size (#61170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61170

Instead of using a fixed local work group size of {4,4,4}, adjust the size based on the global size in order to minimize the number of inactive invocations.
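
A minimal sketch of the idea (a hypothetical heuristic for illustration only, not the actual C++ implementation):

```
def adaptive_local_size(global_size, max_invocations=64):
    # greedily assign powers of two to each axis, capped by both the
    # remaining invocation budget and that axis's global extent
    local, remaining = [], max_invocations
    for g in global_size:
        s = 1
        while s * 2 <= remaining and s * 2 <= g:
            s *= 2
        local.append(s)
        remaining //= s
    return local

print(adaptive_local_size([16, 16, 4]))  # [16, 4, 1] -- no inactive invocations
```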

## Perf improvements from this change
On aloha portal devices, in conjunction with the below diff that tweaks the conv2d_pw shader to calculate a 4x4 output, benchmark latency of the xirp14b model was reduced from ~8.7 ms to ~6.6 ms.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28724591

fbshipit-source-id: ede896300b2be1a9578e492cb870121012886aa7
2021-07-19 18:52:19 -07:00
f324421d34 [vulkan] Calculate a 4x4 output tile for each invocation in conv2d_pw (#60760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60760

A simple optimization to the `conv2d_pw` shader that makes each invocation calculate a 4x4 output tile instead of a single output texel. This results in better memory reuse and subsequently a pretty significant performance win for models similar to the MobileNets.

## Perf improvements from this change
On aloha portal devices, in conjunction with the above diff that introduces adaptive work group sizes, benchmark latency of the xirp14b model was reduced from ~8.7 ms to ~6.6 ms.

Test Plan:
Test vulkan ops:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28724590

fbshipit-source-id: e742286b01bf566dc6378677be55409b7faa8cfb
2021-07-19 18:52:18 -07:00
a1b5025ecd [vulkan] Convolution op cleanup (#60759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60759

Remove unused convolution implementations and refactor convolution op code to make this file easier to maintain.

Test Plan:
Test vulkan ops:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28724592

fbshipit-source-id: cb509fa1cd68089f78188bfb3c866aabc9b0cbdb
2021-07-19 18:52:16 -07:00
cacab7e9d6 [vulkan] Reduce submission rate to save CPU cycles (#60758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60758

Further tweaks the submission rate of ops. Previously, in D28293756 (bc0965ac85), the submission rate was set as high as possible in order to prioritize performance. However, in practice (i.e., when running the model in an app) the high rate of submission increases CPU usage and GPU contention, which may regress fps.

In the future it would be beneficial to devise a scheme to adaptively set the GPU submission rate.

## Perf Improvements
This change doesn't really affect benchmark latency. However, through systraces it can be observed that CPU usage is reduced without too much impact on FPS/model latency.

Test Plan:
Test vulkan ops:

```

cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D29062836

fbshipit-source-id: 1a0f42b49fecb80baee08cb3f1048bb35a1b5d5c
2021-07-19 18:51:04 -07:00
554038c2a2 [package] merge test_torchscript into test_package_script (#61807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61807

These shouldn't be separate files; they test the same thing.

Differential Revision: D29748967

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: 177f40fa460d00d064dfd1f33a0b6656b214a296
2021-07-19 18:23:45 -07:00
f02cfcc802 ban PyTorchStreamWriter from writing the same file twice (#61805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61805

Similar in spirit to https://github.com/pytorch/pytorch/pull/61371.
While writing two files with the same name is allowed by the ZIP format,
most tools (including our own) handle this poorly. Previously I banned
this within `PackageExporter`, but that doesn't cover other uses of the
zip format like TorchScript.

Given that there are no valid use cases, and debugging issues caused by
multiple file writes is fiendishly difficult, this behavior is now banned entirely.
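
A minimal sketch of the now-banned pattern (using the internal `torch._C.PyTorchFileWriter` binding purely for illustration):

```
import torch

writer = torch._C.PyTorchFileWriter("/tmp/archive.zip")
writer.write_record("data.bin", b"first", 5)
writer.write_record("data.bin", b"second", 6)  # now raises instead of silently writing a duplicate entry
```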

Differential Revision: D29748968

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: 0afee1506c59c0f283ef41e4be562f9c22f21023
2021-07-19 18:23:43 -07:00
04043d681e [package] fix storage serialization collision (#61806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61806

Currently, if you do `save_pickle` on a ScriptModule, then `save_pickle`
on a tensor, this would result in a `0.storage` tensor being written
*twice* to the zip archive. This would cause weird bugs on the
serializing side (this presented as an ASAN-detected heap buffer overflow,
because we tried to read more memory from a tensor than we actually
had).

Turns out this was because when we did:
```
self.storage_context = self.script_module_serializer.storage_context()
```
it returned a new copy of the storage context, so we weren't actually
assigning unique names to tensors!!

This PR fixes the issue by making `(De)SerializationStorageContext`
non-copyable and fixing up the parts of the bindings that returned by
copy.

Differential Revision: D29748969

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: c2f89ab270e07e7a111fb35c545b5e07b804dc3c
2021-07-19 18:22:36 -07:00
c30048fccf add BFloat16 support for topk on CPU (#59547)
Summary:
Added BFloat16 support for topk on CPU, and collected benchmark data for topk with the BFloat16 and Float32 data types using the operator_benchmark tool of PyTorch on an Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz

Input: 512x512, 512x1024, 1024x512, 1024x1024
K: 5
Number of cores: 1 core, 28 cores (1 socket)

For 1 core:

 ----------------------------------------
 PyTorch/Caffe2 Operator Micro-benchmarks
 ----------------------------------------
 Tag : all

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 911.401

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 911.700

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 1506.927

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 1492.036

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 1825.634

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 1819.872

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 3001.459

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 2970.718

For 28 cores (1 socket):

 ----------------------------------------
 PyTorch/Caffe2 Operator Micro-benchmarks
 ----------------------------------------
 Tag : all

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 146.995

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 123.423

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 105.967

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 101.498

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 128.023

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 125.172

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 129.855

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 124.556
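
A quick illustration of the newly supported dtype:

```
import torch

x = torch.randn(512, 512).bfloat16()
values, indices = torch.topk(x, k=5)  # BFloat16 topk on CPU now works
```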

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59547

Reviewed By: mrshenli

Differential Revision: D29763916

Pulled By: ezyang

fbshipit-source-id: 706c7d4349ac9ebd5d63f4844fca70febcb67023
2021-07-19 16:06:24 -07:00
15210f3b82 ignore and clear not ready errors (#61554)
Summary:
Follow-up to https://github.com/pytorch/pytorch/issues/18584. This PR covers the remaining places where an event or stream query might result in "not ready" errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61554

Reviewed By: mrshenli

Differential Revision: D29763973

Pulled By: ezyang

fbshipit-source-id: 41d988d1826b2309cc6b01a81144094b353abdf9
2021-07-19 16:03:04 -07:00
e68c016871 Regenerate libtorch workflow files that got lost in merge conflict (#61872)
Summary:
Forward-fixes a merge conflict on master: https://github.com/pytorch/pytorch/runs/3106027618

for PR https://github.com/pytorch/pytorch/issues/61774

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61872

Reviewed By: dzhulgakov

Differential Revision: D29775595

Pulled By: janeyx99

fbshipit-source-id: 8194dd123f166fd5f3fd1e77417e865c188f40c8
2021-07-19 15:30:13 -07:00
0a6d88244b Fix grammatical errors on the PyTorch Contribution Guide (#61818)
Summary:
## What does the PR do?
- Fix grammatical errors on the PyTorch Contribution Guide page.

## Changes [Screenshots]
> Note:
> 1. The changes are highlighted in each screenshot.
> 2. Could not load CSS while testing locally; hopefully that is not an issue, as all the changes are content-only.

1.
![Change1](https://user-images.githubusercontent.com/20442648/126077764-39fd8b78-524f-407d-bc39-c93167bd10a7.PNG)

2.
![Change2](https://user-images.githubusercontent.com/20442648/126077766-9dd7dc61-ef06-41d0-a7e5-cfd179ece0cd.PNG)

3.
![Change3](https://user-images.githubusercontent.com/20442648/126077767-2c2e05e4-09fc-403a-a18e-9b108651a5f8.PNG)

4.
![Change4](https://user-images.githubusercontent.com/20442648/126077769-ad755db6-3afa-457b-b95c-9f6c6281f828.PNG)

5.
![Change5](https://user-images.githubusercontent.com/20442648/126077770-a7759dee-7f90-4b9e-a07c-4dec4ca934d0.PNG)

6.
![Change6](https://user-images.githubusercontent.com/20442648/126077772-0474e58d-c0c8-4156-b56f-808d225c38e7.PNG)

7.
![Change7](https://user-images.githubusercontent.com/20442648/126077774-d48382a7-5379-49a4-a8d2-b478fabf0bf0.PNG)

8.
![Change8](https://user-images.githubusercontent.com/20442648/126077777-fd743825-8dd7-4cb9-a22c-233e5fa085a6.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61818

Reviewed By: dzhulgakov

Differential Revision: D29775606

Pulled By: mrshenli

fbshipit-source-id: 3f3bfdeede341f784b72dfe55da9ba8bdce1192a
2021-07-19 15:06:22 -07:00
43c5dc40c5 Port signbit to structured kernel (#57936)
Summary:
Port signbit to structured kernel
Related https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57936

Reviewed By: mrshenli

Differential Revision: D29764904

Pulled By: ezyang

fbshipit-source-id: 758f5f085d0cc84af612726f667cde15d615053b
2021-07-19 15:03:10 -07:00
44d3267103 Remove whitespace introduced by #61438 (#61863)
Summary:
Since it's a one-character change, it feels faster to fix than to revert.

Verified with `(! git --no-pager grep -In '[[:blank:]]$' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' || (echo "The above lines have trailing spaces; please remove them"; false))` from the lint check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61863

Reviewed By: ZolotukhinM

Differential Revision: D29772353

Pulled By: dzhulgakov

fbshipit-source-id: 33cb887f25e344b420f645a8e4dc8d0d7462e9ef
2021-07-19 14:57:10 -07:00
26d17ddc9f Exclude wrapper tensors from functorch in the native::resize_output fastpath (#61846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61846

Related to #61485.

native::resize_output has a fast path that avoids dispatching.
Unfortunately, we have a number of CompositeImplicitAutograd operations
that directly call out= variants of operators. These
CompositeImplicitAutograd operators (e.g. torch.linalg.norm) end up
calling native::resize_output. That function, combined with how
functorch uses a mode-dispatch key to wrap tensors, causes silently
incorrect behavior in functorch (more details are available in #61485).

The very easy short-term fix is to have `native::resize_output` always
dispatch on a Tensor (and skip the fast-path) if a Tensor is a functorch
wrapped Tensor. More long-term fixes are proposed in the issue.

Test Plan:
- I checked that this change fixes torch.linalg.norm and other operators
with this problem in functorch.
- We're not testing functorch in pytorch/pytorch CI but we probably will
in the near future.
- wait for PyTorch tests.

Reviewed By: ezyang

Differential Revision: D29764293

Pulled By: zou3519

fbshipit-source-id: c7afcb0bd3bc77d2ba716d5b11f62830d8bdf0a9
2021-07-19 13:50:37 -07:00
f912889726 Remove unnecessary Ubuntu version checks (#61738)
Summary:
PR https://github.com/pytorch/pytorch/issues/5401 missed another Ubuntu version check in `cmake/MiscCheck.cmake`.

The checks for available functions added by https://github.com/pytorch/pytorch/issues/5401 are already present below the code snippet that this PR deletes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61738

Reviewed By: mrshenli

Differential Revision: D29757525

Pulled By: ezyang

fbshipit-source-id: 7f5f9312284973481a8b8a2b9c51cc09774722e9
2021-07-19 13:04:24 -07:00
1b0a7f3887 Always use fast gradcheck for LayerNorm 3d_no_affine_large_feature (#61848)
Summary:
Due to the introduction of a test from https://github.com/pytorch/pytorch/pull/59987/files, slow gradcheck has been failing intermittently (timing out/getting killed).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61848

Reviewed By: mrshenli

Differential Revision: D29765773

Pulled By: soulitzer

fbshipit-source-id: d78bee758cab76f26ba9f54925c42d4825db9449
2021-07-19 12:33:55 -07:00
094abf5fd0 [BE] Include a unit test for Save Operator with db_options
Summary: A test case that triggers db_options with the save operator is missing.

Test Plan: buck test

Differential Revision: D29642719

fbshipit-source-id: 72b7374d40430398abac26dfe91538550525384d
2021-07-19 12:22:59 -07:00
e389650f10 Upgrade CPUFallback for loops (#61722)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61722

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29715862

fbshipit-source-id: 21e12c71e28e542abc649890f72938801d9d7d7a
2021-07-19 11:27:26 -07:00
04bd9d7577 [DDP] Add API to get model parameters in hook (#61637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61637

To support running an optimizer as a communication hook, add an API to
retrieve the model parameters.

The API returns a `dict[idx -> tensor]` where `idx` is the intra bucket index of gradient tensor and thus the same index of `perParameterTensors`. The API can be used as follows to retrieve the model parameters:

```
per_param_grad_tensors = bucket.get_per_parameter_tensors()
idx_to_model_params = bucket.get_grad_index_to_variable_mapping()
for grad_tensor_idx, model_param in idx_to_model_params.items():
    self.assertEqual(model_param.grad, per_param_grad_tensors[grad_tensor_idx])
```

This provides a way for comm hook developers to retrieve model parameters within a hook. In the next diffs, we will use this to run an optimizer as a DDP comm hook.
ghstack-source-id: 133768666

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29691418

fbshipit-source-id: 4bfa824768a5850f73ee330017e2bcc29ceb7edc
2021-07-19 11:24:54 -07:00
66c8d21d7b Update progress and error reporting in clang-tidy (#61672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61672

This PR adds a progress bar to clang-tidy, and updates how it threads error codes (when run in parallel). The progress bar is disabled on GHA because backspace escape codes are not supported.

It also adds a `--quiet` flag to the script.

Screenshot of progress bar:
<img width="955" alt="Screen Shot 2021-07-14 at 3 17 11 PM" src="https://user-images.githubusercontent.com/40111357/125686114-a8a7c154-3e65-43a8-aa8f-c1fb14d51d27.png">

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29763848

Pulled By: 1ntEgr8

fbshipit-source-id: cbd352593b279f279911bc3bb8d5ed54abd5f1d5
2021-07-19 11:19:06 -07:00
24a6eb3fda ENH Adds tests and docs for 2d & 3d modules that already support no batch (#61262)
Summary:
Toward https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61262

Reviewed By: mrshenli

Differential Revision: D29660554

Pulled By: jbschlosser

fbshipit-source-id: d5e3dc7096fcf8621bce4a1063d521b84092e0ca
2021-07-19 11:12:28 -07:00
4f46943e3d enable check trace when tracing a mkldnn model (#61241)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43039. When tracing an MKLDNN model with **check_trace=True**, there is an error: **RuntimeError: unsupported memory format option Preserve**. This PR solves that problem.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61241

Reviewed By: anjali411

Differential Revision: D29737365

Pulled By: suo

fbshipit-source-id: e8f7f124bc6256f10b9d29969e0c65d332514625
2021-07-19 11:03:53 -07:00
75b68def63 fmin has been ported to the structured kernel, removing the old implementation (#60810)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60810

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449377

Pulled By: ezyang

fbshipit-source-id: 0b43562d0dfe81dfa401268f1d12e0d2c3c9f420
2021-07-19 10:20:06 -07:00
b526080d89 fmod: Port to structured (#60809)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60809

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449378

Pulled By: ezyang

fbshipit-source-id: 70f6fa95988f753eec4aefa60a60dddb7f3d744e
2021-07-19 10:18:57 -07:00
b65ddef000 for shared-memory handles, use an atomic counter, instead of potentially colliding random numbers (#60978)
Summary:
These handles, used for shared-memory tensors, can collide.

E.g. see https://github.com/pytorch/pytorch/issues/60626#issuecomment-869919018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60978

Reviewed By: mruberry

Differential Revision: D29479291

Pulled By: ezyang

fbshipit-source-id: 408ef1817768f007ad4795b286482809ea43467c
2021-07-19 09:56:43 -07:00
ac5a40e068 Fix benchmark's import module and remove its usage of tools.stats.scribe (#61808)
Summary:
There are a few convoluted pieces of logic here to fix the `benchmarks` import module for pytest.

- On one hand, if we want to use `tools.stats.scribe` from `benchmarks`, we will need to add `benchmarks/__init__.py`
- On the other hand, if we add `benchmarks/__init__.py`, it breaks how `pytest` searches for the system-built `torch` instead of the local source module `../torch`
  - That's why we are seeing errors like

```
ImportError while loading conftest '/var/lib/jenkins/workspace/benchmarks/fastrnns/conftest.py'.
benchmarks/fastrnns/__init__.py:1: in <module>
    from .cells import *  # noqa: F403
benchmarks/fastrnns/cells.py:1: in <module>
    import torch
torch/__init__.py:29: in <module>
    from .torch_version import __version__ as __version__
torch/torch_version.py:9: in <module>
    from .version import __version__ as internal_version
E   ModuleNotFoundError: No module named 'torch.version'
```

Instead, this PR changes the usage of `upload_scribe.py` back to its original form using an HTTP request, and for now only CircleCI will continue down this path using `python benchmarks/upload_scribe.py`, which is gated by `if [[ -z "${GITHUB_ACTIONS}" ]];`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61808

Reviewed By: seemethere

Differential Revision: D29750188

Pulled By: zhouzhuojie

fbshipit-source-id: 3b842b21978f2159001e9c6c1cdc96c5a0515f2e
2021-07-19 09:45:05 -07:00
9c3346c8aa reduce max_num_threads for complex double ops in reduce_kernel (#61438)
Summary:
reduce_kernel currently has an all-purpose MAX_NUM_THREADS of 512, which causes register spilling in various kernel instantiations for the various ops that use it as a template (ReduceLogicKernel, ReduceMinMaxKernel, ReduceMomentKernel, ReduceNormKernel, and ReduceSumProdKernel). This is a coarse first attempt at mitigating spillage by reducing max_num_threads to 256 for all complex double ops, which are by far the most common and egregious offenders, while keeping it at 512 for the other normal ops, the large majority of which are fine. Besides complex double ops, the remaining kernels which exhibit lmem usage are ReduceMinMax double, long, and BFloat16; ReduceMomentKernel BFloat16, Half, float, and double; and ReduceNorm double.

The proposed fix manages to eliminate lmem usage and massively improve runtime (by 3-5x) for complex double ops. All other ops are unaffected and have the same runtime; if they used lmem before, they still do now. We would still strongly recommend further testing of input shapes and ops, as well as looking into whether there's a cleaner approach to doing this.
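
A minimal sketch of the dispatch heuristic (a hypothetical Python rendering of the C++ change):

```
import torch

def max_num_threads(dtype):
    # complex double instantiations spill registers at 512 threads, so cap them at 256
    return 256 if dtype == torch.complex128 else 512
```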

We tested the following ops for both complex double instantiations, as well as torch.max and torch.argmax with doubles, to make sure they didn't break. We didn't include the double instantiations in the timing data, since they remain unchanged post-fix vs. pre-fix. Timing data for the complex double ops is below (all done on an Nvidia Titan-V GPU):

torch.mean:
![MeanTimingData](https://user-images.githubusercontent.com/22803332/125005623-0f424800-e011-11eb-864e-8419485a9c76.PNG)

torch.linalg.norm:
![NormTimingData](https://user-images.githubusercontent.com/22803332/125005649-179a8300-e011-11eb-96e1-54e18c85a336.PNG)

torch.sum:
![SumTimingData](https://user-images.githubusercontent.com/22803332/125005655-1b2e0a00-e011-11eb-928e-ee5941608fb2.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61438

Reviewed By: mrshenli

Differential Revision: D29756863

Pulled By: ngimel

fbshipit-source-id: 4c4635df58af9313966ff1df1095f7e15a39bb07
2021-07-19 09:38:22 -07:00
d565b3e9ea Migrate libtorch to GHA (#61774)
Summary:
Makes progress on https://github.com/pytorch/pytorch/issues/57686

Tested in https://github.com/pytorch/pytorch/pull/61775:

periodic 11.3 libtorch: https://github.com/pytorch/pytorch/pull/61775/checks?check_run_id=3088529584?check_suite_focus=True
10.2: https://github.com/pytorch/pytorch/pull/61775/checks?check_run_id=3089965441
11.1: https://github.com/pytorch/pytorch/pull/61775/checks?check_run_id=3089965697

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61774

Reviewed By: samestep

Differential Revision: D29745793

Pulled By: janeyx99

fbshipit-source-id: a17f561051b1e5eccf4918137a4b5df19308a716
2021-07-19 09:21:52 -07:00
3e3acf8a9a Minor documentation fixes (#61785)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61785

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746648

Pulled By: andwgu

fbshipit-source-id: 435bbd8894f2ae5c814b9acd562673affea1daf6
2021-07-19 09:01:29 -07:00
813b887dad Fix indent (#61784)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61784

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746647

Pulled By: andwgu

fbshipit-source-id: f42d3a0864a8291941d695a0cf575a5737cbb35c
2021-07-19 09:00:25 -07:00
a26a9f8b75 zero initialize some members and other fixes (#59915)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59915

Reviewed By: soulitzer

Differential Revision: D29106684

Pulled By: ezyang

fbshipit-source-id: 713cbdf10866017ee715ee89ec82acb592c769b6
2021-07-19 07:36:26 -07:00
0263865bfe [Docs] Fix docs for torch.chunk (#61097)
Summary:
torch.chunk may silently return fewer than the requested number of chunks if some undocumented division constraints are not met. The functionality that users expect is provided by another function: torch.tensor_split

This has led to confusion countless times, and who knows how many systems out there are fragile because of this.
My changes describe the discrepancy, show an example, and direct users to the usually preferred function.

Issues mentioning this problem:
https://github.com/pytorch/pytorch/issues/9382
https://github.com/torch/torch7/issues/617

I considered documenting the constraint for when an unexpected number of chunks may be returned (it is `chunks*chunks > input.size(dim)`), so that users could quickly tell if their code may be affected. Please let me know if you think this should be in the docs or not.
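
The discrepancy in one example:

```
import torch

x = torch.arange(6)
print(len(torch.chunk(x, 4)))         # 3 -- silently fewer chunks than requested
print(len(torch.tensor_split(x, 4)))  # 4 -- always the requested number
```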

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61097

Reviewed By: heitorschueroff

Differential Revision: D29660280

Pulled By: ezyang

fbshipit-source-id: 675086bc8a8882c1685a50a2c083ae8dd1854384
2021-07-19 06:13:04 -07:00
552eab7935 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29758833

fbshipit-source-id: e07673bb19f15865bf5810910224f3f37a759db7
2021-07-19 04:12:20 -07:00
593e8f41ca [jit] Fixed a bug in the pass that replaces cat with the variadic op (#61795)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61795

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29748785

Pulled By: navahgar

fbshipit-source-id: df5b84c35f007718c92a21a0b44a231e6d346918
2021-07-18 21:38:30 -07:00
ff82394fc0 Apply saved tensor hooks (#60975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60975

Fixes #58512

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29466227

fbshipit-source-id: c1498d52173aceb29638b5c4f521ac05356a5958
2021-07-18 08:42:51 -07:00
eefbff773b ns for fx: add utils for l2 error and cosine similarity (#61380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61380

Adds convenience wrappers for l2 error and cosine similarity
to NS utils.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extend_logger_results_with_comparison
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600354

fbshipit-source-id: 670c44a44df7f345884cacf26ed3c885edbe9977
2021-07-17 20:53:43 -07:00
2a2bc1fc8a ns for fx: add fqn to results, when present (#61377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61377

Both the quantization tracer and the NS tracer record
`_node_name_to_scope`, which contains the mapping from
node name to FQN.

This PR adds the FQN information to the NS results, so that it is
more convenient for users to attribute a NS result to the corresponding
module in their model.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_match_activations_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_activations_fqn
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29600349

fbshipit-source-id: df489e03daff97dd380f59c83ffdc2b0012a0a53
2021-07-17 20:53:41 -07:00
7449f49a4c ns for fx: return results in execution order (#61360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61360

By default, NS graph matching matches from the end of the graph
to the start.  This PR reverses the returned results so that
the outputs of the NS APIs are in the order of execution, making
it easier to analyze.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_results_order
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600348

fbshipit-source-id: c9fa4a3748db27c1788eebf803f35221e6fc8701
2021-07-17 20:53:39 -07:00
2b2928c5ca ns for fx: improve error messages for graph matching (#61359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61359

Makes the error messages produced during graph matching easier
for users to read.

Test Plan:
```
// inspect the exceptions in the following two tests and verify
// that they are easier to read than before
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_count
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_type
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600353

fbshipit-source-id: ec6640fe6cab7b62a697e4ee385be182f2918fd4
2021-07-17 20:53:38 -07:00
ddf6d6cc14 ns for fx: clean up override_qengines and copy TODO in tests (#61358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61358

1. changes override_qengines to require fbgemm instead; these tests do not
exercise any qengine-specific logic, so it is better to run them once
2. removes a TODO about copy.deepcopy, which we do not plan to address

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600352

fbshipit-source-id: 4db08f0080233ff46d7679928c83e41c5ba21ec8
2021-07-17 20:53:36 -07:00
cf6f5efb39 ns for fx: test case for comparing fp32 vs fp32_prepared shadow (#61357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61357

Adds a test case for comparing fp32 vs fp32_prepared in a shadow model.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600350

fbshipit-source-id: ff7518ce8a789ab7469cb22044f1d7c697e2cd04
2021-07-17 20:53:34 -07:00
4acd14da02 ns for fx: preserve observers and fake_quants through passes (#61323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61323

Before this PR, all observers and fake quants were silently removed
when adding loggers with NS. This is problematic for QAT models because
we need the fake quants to run in order to properly capture intermediate
outputs.

This PR fixes the issue by preserving the observers throughout
the passes which add loggers.  In detail:
* for each quantization module or fusion, add additional patterns with that fusion and an observer/fake_quant at the end
* remove the places in the logger model creation code which removed observers
* add unit testing that QAT numerics do not change after adding loggers

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_loggers_preserve_qat_numerics
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_loggers_preserve_qat_numerics
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600351

fbshipit-source-id: 5f25118b79eb47860c49bca882de6a8eae7a4456
2021-07-17 20:53:33 -07:00
a70505cdbd ns for fx: support comparing fp32 vs fp32_prepared, except shadowed (#61129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61129

Adds support for comparing an fp32 model (without quantization) to an
fp32 model prepared with quantization. The main missing feature was
handling conv-bn fusion, since this fusion for PTQ happens outside
of the quantization patterns.

Adds testing for this case, for both comparing weights and comparing
activations.

Adds a TODO for also handling this for shadow activations; we need to
first stop removing observers in graph passes before we can add
this support, which will be in a future PR.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2_qat
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_activations_conv
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29520009

fbshipit-source-id: f63484a998f1424bd9cacf5d823b82b2edfea1ae
2021-07-17 20:52:23 -07:00
e117d94e21 Wrapped create_type_hint in try/except block so that NormalizeArgs doesn't fail if create_type_hint fails (#61524)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61524

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D29746106

Pulled By: Chillee

fbshipit-source-id: d08c0030f40b504e8f7a61fc0ee432f1515a0e6d
2021-07-17 16:13:17 -07:00
59ca89dca8 Fix scribe logs again (#61768)
Summary:
Reverts the revert of 3624d75, with an additional fix in https://github.com/pytorch/pytorch/pull/61764.

Got the correct logs sent to the lambda:

```
...
,"21721":"OK","21722":"OK","21723":"OK","21724":"OK","21725":"OK","21726":"OK","21727":"OK","21728":"OK","21729":"OK","21730":"OK","21731":"OK","21732":"OK","21733":"OK","21734":"OK","21735":"OK","21736":"OK","21737":"OK","21738":"OK","21739":"OK","21740":"OK","21741":"OK","21742":"OK","21743":"OK","21744":"OK","21745":"OK","21746":"OK","21747":"OK","21748":"OK","21749":"OK","21750":"OK","21751":"OK","21752":"OK","21753":"OK","21754":"OK","21755":"OK","21756":"OK","21757":"OK","21758":"OK","21759":"OK","21760":"OK","21761":"OK","21762":"OK","21763":"OK","21764":"OK","21765":"OK","21766":"OK","21767":"OK","21768":"OK","21769":"OK","21770":"OK","21771":"OK","21772":"OK","21773":"OK","21774":"OK","21775":"OK","21776":"OK","21777":"OK","21778":"OK","21779":"OK","21780":"OK","21781":"OK","21782":"OK","21783":"OK","21784":"OK","21785":"OK","21786":"OK","21787":"OK","21788":"OK","21789":"OK","21790":"OK","21791":"OK","21792":"OK","21793":"OK","21794":"OK","21795":"OK","21796":"OK","21797":"OK","21798":"OK","21799":"OK","21800":"OK","21801":"OK","21802":"OK","21803":"OK","21804":"OK","21805":"OK","21806":"OK","21807":"OK","21808":"OK","21809":"OK","21810":"OK","21811":"OK","21812":"OK","21813":"OK","21814":"OK","21815":"OK","21816":"OK","21817":"OK","21818":"OK","21819":"OK","21820":"OK","21821":"OK","21822":"OK","21823":"OK","21824":"OK","21825":"OK","21826":"OK"}}

class StartProcessesTest:
    tests: 14 failed: 0 skipped: 0 errored: 0
    run_time: 4.86 seconds
    avg_time: 0.35 seconds
    median_time: 0.01 seconds
    3 longest tests:
        test_function_large_ret_val time: 1.55 seconds
        test_pcontext_wait time: 1.11 seconds
        test_void_function time: 1.03 seconds

...
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61768

Reviewed By: janeyx99

Differential Revision: D29735781

Pulled By: zhouzhuojie

fbshipit-source-id: 6882e334f5108d20773ad66d5300cd37eb509ded
2021-07-16 17:56:16 -07:00
311f1f275a Update clang-tidy-linux64 (#61797)
Summary:
Update the clang-tidy linux hash to match the one built for 7ae60a49ac by https://github.com/pytorch/test-infra/runs/3090057893

Fixes `The downloaded binary is not what was expected!`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61797

Reviewed By: zhouzhuojie

Differential Revision: D29746840

Pulled By: malfet

fbshipit-source-id: a7388952b04ba12f250003c32629d57b8d5ffed8
2021-07-16 17:23:21 -07:00
4337650c91 Fixing a bug in .to for qtensors so scale/zp move too (#61576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61576

This also fixes an issue in the
empty_quantized_per_channel_affine function where specifying a device
different from the device of scale/zp would result in a
mismatched qtensor.

Test Plan:
python test/test_quantization.py
testquantizedtensor.test_per_channel_to_device

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29675461

fbshipit-source-id: 0e2ff20f0f581dae94ee01d3ceead2a620cd26b9
2021-07-16 17:16:24 -07:00
cb6841b263 Fix ConnectionError in download_mnist (#61789)
Summary:
Fixes issues like the following error (a sketch follows the log below). Note that `ConnectionResetError` is a subclass of `ConnectionError`.

```
+ python tools/download_mnist.py --quiet -d test/cpp/api/mnist
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz ...
Traceback (most recent call last):
  File "tools/download_mnist.py", line 93, in <module>
    main()
  File "tools/download_mnist.py", line 86, in main
    download(path, resource, options.quiet)
  File "tools/download_mnist.py", line 42, in download
    urlretrieve(url, destination_path, reporthook=hook)
  File "/opt/conda/lib/python3.6/urllib/request.py", line 277, in urlretrieve
    block = fp.read(bs)
  File "/opt/conda/lib/python3.6/http/client.py", line 463, in read
    n = self.readinto(b)
  File "/opt/conda/lib/python3.6/http/client.py", line 507, in readinto
    n = self.fp.readinto(b)
  File "/opt/conda/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
```
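
A hypothetical retry wrapper illustrating the relationship (names are my assumptions, not the PR's code); catching `ConnectionError` also covers `ConnectionResetError`, since the latter is a subclass:

```python
from urllib.request import urlretrieve

def download_with_retry(url, destination, attempts=3):
    # Retry transient connection failures; ConnectionError catches
    # ConnectionResetError as well.
    for attempt in range(attempts):
        try:
            return urlretrieve(url, destination)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
```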

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61789

Reviewed By: dreiss

Differential Revision: D29745459

Pulled By: zhouzhuojie

fbshipit-source-id: 2deb668bd74478f32bd01704d4362e8a4d95087b
2021-07-16 17:02:13 -07:00
4e2fe9718d flatten operation (resnet50) (#61265)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61265

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29626383

Pulled By: migeed-z

fbshipit-source-id: 107769fc14f1fad295a93a10e84235f25ae17357
2021-07-16 16:06:10 -07:00
4479aa8838 Remove all the code that constructs metadata.pkl file (#61760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61760

Removes all code related to metadata.pkl creation, including creating metadata.pkl and converting data from extra/mobile_info.json and extra/producer_info.json to the metadata.pkl file.

Test Plan:
## Run buck commands:
  - `cd` into `fbcode` then `buck build //caffe2/caffe2/fb/init:init`
  - `cd` into `fbcode` then `buck build //caffe2/torch/fb/init:init`
  - `buck build //xplat/caffe2:torch_mobile_core`

## Export a PyTorch lite/mobile model
- Run: `flow-cli canary users.xcheng16.pytorch_trainer.TestWorkflow --run-as-secure-group ai_mobile_platform --buck-target //fblearner/flow/projects/users/xcheng16:workflow` under `fbcode` on devserver.
- Resulting model: metadata.pkl no longer exists
{F632063134}

Reviewed By: guangy10

Differential Revision: D29702943

fbshipit-source-id: ec7964f4aa3a8e09ccc20b1a7e2232f85931dd81
2021-07-16 15:39:07 -07:00
7ac8054d5a Use better defaults in the clang-tidy wrapper script (#61651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61651

This PR sets some quality-of-life defaults for the clang-tidy wrapper script and refactors how defaults are set.

- Runs in parallel
- Custom executable (prints an error message to users asking them to install our custom build)
- `generate_build_files` can now be run as a script

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D29743661

Pulled By: 1ntEgr8

fbshipit-source-id: 256617d006a03e4ab96091593f5bb80c9b31a2d1
2021-07-16 14:58:19 -07:00
dc0d1612e1 ENH Updates docs and tests for activation modules for no-batch dims (#61300)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

This PR updates docs and tests for activation modules that already support no-batch dims.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61300

Reviewed By: heitorschueroff

Differential Revision: D29660543

Pulled By: jbschlosser

fbshipit-source-id: 5edad45f7e9995aca6c3403469668e6e1cbb94b6
2021-07-16 14:42:18 -07:00
6a085648d8 add aten symbols for amin and amax (#61550)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61550

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D29668123

Pulled By: bdhirsh

fbshipit-source-id: b111e1c6c6d2beddb220cad70d95954756a3ee9d
2021-07-16 14:06:00 -07:00
4e94e84f65 Type annotate torch.nn.Module ctor (#61334)
Summary:
Annotate generic types
Fix some type violations
Override `_modules` and `_parameters` in `Sequential`, `ModuleList`, `ModuleDict`, etc

Fixes https://github.com/pytorch/pytorch/issues/45497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61334

Reviewed By: albanD

Differential Revision: D29579533

Pulled By: malfet

fbshipit-source-id: 5cd8ca918b260ca35cfdd873dee8851d39d17de2
2021-07-16 13:59:06 -07:00
ee2f2ec9a5 Revert D29687143: [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test
Test Plan: revert-hammer

Differential Revision:
D29687143 (5798a00aa4)

Original commit changeset: 9ba9e57f7f85

fbshipit-source-id: 6a672c76a04366b35c492698ae5b39fd4dd1785f
2021-07-16 13:32:51 -07:00
a07d3dc34c Pin macos mkl conda version to fix the cmake build (#61773)
Summary:
Fixes the macOS build error in master; mkl recently had an upgrade.

CircleCI error:
https://app.circleci.com/pipelines/github/pytorch/pytorch/351645/workflows/d22421c1-bb8f-48fd-9efd-7c0d77f0b083/jobs/14815607

```
Jul 16 11:43:05 CMake Error at /Users/distiller/workspace/miniconda3/lib/cmake/mkl/MKLConfig.cmake:456 (list):
Jul 16 11:43:05   list does not recognize sub-command PREPEND
Jul 16 11:43:05 Call Stack (most recent call first):
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Caffe2/public/mkl.cmake:1 (find_package)
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:109 (include)
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
Jul 16 11:43:05   CMakeLists.txt:5 (find_package)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61773

Reviewed By: soulitzer

Differential Revision: D29736742

Pulled By: zhouzhuojie

fbshipit-source-id: 68c5244196f7f7562a6c202157c4ccdcfcb64337
2021-07-16 13:15:04 -07:00
8ad584823f add shortcircuit in isclose for zero tolerances (#61529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61412.

Large integers gave false positives because the comparison always takes place in floating-point dtypes. This happens because their integer precision is lower than the range of an integer dtype with the same number of bits.

For non-extremal values, `isclose` is defined by the following equation:

```python
abs(a - b) <= atol + rtol * abs(b)
```

For `rtol == 0 and atol == 0`, this is equivalent to `a == b`. This PR goes for the low-hanging fruit and adds a shortcut for this case that falls back to an actual equality check.
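
A sketch of the false positive being fixed, using int64 values around 2**53, where float64 can no longer represent every integer exactly:

```python
import torch

a = torch.tensor(2**53 + 1)
b = torch.tensor(2**53)
# Both values round to the same float64, so the old floating-point
# comparison reported True; the shortcut falls back to an exact check.
print(torch.isclose(a, b, rtol=0, atol=0))  # tensor(False) with this PR
```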

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61529

Reviewed By: gchanan

Differential Revision: D29707534

Pulled By: mruberry

fbshipit-source-id: 71b8c4901e9cd4f366442437e52032b0d3002b4a
2021-07-16 12:48:16 -07:00
612632556d Fix torch.median crash on empty tensor (#61698)
Summary:
`torch.tensor([]).median()` returns `nan`, which mimics the behavior of `np.median`.
Adds a test to `TestReductions.test_median_corner_cases`.
Fixes https://github.com/pytorch/pytorch/issues/61656

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61698

Reviewed By: heitorschueroff

Differential Revision: D29706912

Pulled By: malfet

fbshipit-source-id: ea5f58327fbff371f3fb8786b269430c7a10d05f
2021-07-16 12:36:18 -07:00
3fd9dcf934 Move non-libtorch scheduled linux CI to GHA (#61732)
Summary:
Move non-libtorch Linux 11.3 scheduled CI job to GHA.
Libtorch builds will be migrated here: https://github.com/pytorch/pytorch/pull/61774

Successful run: https://github.com/pytorch/pytorch/actions/runs/1035592487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61732

Reviewed By: seemethere

Differential Revision: D29735637

Pulled By: janeyx99

fbshipit-source-id: dce13370b218ae7833483fdaa00137db95e27c98
2021-07-16 12:16:58 -07:00
287603f51c Revert D29698486: [pytorch][PR] Remove torch._bmm and remove torch.bmm deterministic arg documentation
Test Plan: revert-hammer

Differential Revision:
D29698486 (328606699f)

Original commit changeset: 5af2d3803ab1

fbshipit-source-id: ce954c13196b1fb8277d61a686ac351d3bf13903
2021-07-16 11:02:09 -07:00
5798a00aa4 [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test (#61594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61594

### Summary:
Added a unit test for the Nnapi delegate's preprocess() function. The
function was previously tested locally, but now a basic test is
added for OSS.

See https://github.com/pytorch/pytorch/pull/61499 for preprocess
implementation. See D29647123 for local testing.

**TODO:**
Add more comprehensive tests.
Add tests for model execution, after the Nnapi delegate's initialization
and execution is implemented T91991928.

**CMakeLists.txt:**
Added a library for the Nnapi delegate
- Explicit linking of torch_python is necessary for the Nnapi delegate's use of pybind

**test_backends.py:**
Added a test for lowering to Nnapi
- Based off https://github.com/pytorch/pytorch/blob/master/test/test_nnapi.py
- Only differences are the loading of the nnapi backend library and the need to change dtype from float64 to float32

### Test Plan:
Running `python test/test_jit.py TestBackendsWithCompiler -v` succeeds. Also saved and examined the model file locally.

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29687143

fbshipit-source-id: 9ba9e57f7f856e5ac15e13527f6178d613b32802
2021-07-16 11:00:38 -07:00
349f2f767c Modernize to default constructor and nullptr in torch (#61735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61735

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716659

fbshipit-source-id: ec2a0a0b7e55d2e50b1d35f0b651bd40675ae7e8
2021-07-16 10:51:13 -07:00
736bb26746 use rand over empty in flaky test (#61710)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/61694#issuecomment-880641635. cc krshrimali.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61710

Reviewed By: anjali411

Differential Revision: D29719660

Pulled By: mruberry

fbshipit-source-id: 589574a039ad431acc7d095d452f0b3e52260208
2021-07-16 10:50:05 -07:00
efeacc0779 [Static Runtime] Fixed visibility of ProcessedNode class and a newly added function (#61729)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61729

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D29719644

Pulled By: navahgar

fbshipit-source-id: 27a77b2a281d1a8a48e2a9df1c254f62c0e2e7ef
2021-07-16 10:42:02 -07:00
6fa80f7f9f Refactor embedded_interpreter registration to be friendly to python case (#59991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59991

adds a registration mechanism whereby, on loading the embedded interpreter library, a registration function is called that links up the symbols it provides with torch::deploy.

Test Plan: local and CI deploy tests pass

Reviewed By: suo

Differential Revision: D28764436

fbshipit-source-id: 88416bd098be306f887cc9fd2d65d29199439bc4
2021-07-16 10:33:58 -07:00
6349bde572 [4/N] Nnapi backend delegation preprocess: List Tensors & Comment Updates (#61752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61752

Updated Android NNAPI preprocess, so that it can accept both single Tensor inputs and Tensor List inputs.
- The inputs are not real data, but input parameters for shape, dtype, quantization, and dimorder that are bundled as a Tensor. Comments were updated to make this clearer.
- In the future, preprocess will also accept a dedicated NnapiArg object.

Compile_spec should have the following format:
```
{"forward": {"inputs": at::Tensor}} OR {"forward": {"inputs": c10::List<at::Tensor>}}
```
Example input Tensor:
```
torch.tensor([[1.0, -1.0, 2.0, -2.0]]).unsqueeze(-1).unsqueeze(-1)
```

### Testing
OSS testing is blocked by https://github.com/pytorch/pytorch/pull/61594. Testing was done locally in D29726948
TODO: Add OSS tests for single Tensor and Tensor List inputs.
ghstack-source-id: 133683735

Test Plan:
OSS testing is blocked by https://github.com/pytorch/pytorch/pull/61594. Testing was done locally in D29726948.
TODO: Add OSS tests for single Tensor and Tensor List inputs.

Reviewed By: iseeyuan

Differential Revision: D29726432

fbshipit-source-id: 08de70578f37681bda365f9776a1c96030257e7a
2021-07-16 10:17:56 -07:00
328606699f Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61571

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629

Reviewed By: zou3519

Differential Revision: D29698486

Pulled By: albanD

fbshipit-source-id: 5af2d3803ab1eb093616bcfc7e074d8b57ef6958
2021-07-16 09:18:34 -07:00
28150fd0c8 [static_runtime] Implement aten::linear (#61595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61595

Add out variant wrapper for `aten::linear` in the static runtime

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29684236

fbshipit-source-id: 94df6d7267b3f269b2cadf065f207648777147df
2021-07-16 08:55:43 -07:00
3624d75864 Revert D29703523: [pytorch][PR] Fix scribe logs
Test Plan: revert-hammer

Differential Revision:
D29703523 (eb5a56fb74)

Original commit changeset: 829ad3630d35

fbshipit-source-id: 2b2196d58791b995a008b6d810b3248ed27e7d94
2021-07-16 08:50:13 -07:00
b963607d50 [nnc] Insert alloc/free at global scope (#61725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61725

Alloc/free inside a loop isn't really an optimization, and furthermore
it breaks some attempted optimization in the llvm backend: we use alloca for
small allocations, which is efficient since alloca is on the stack, but there's
no corresponding free, so we leak tons of stack.  I hit this while building an
rfactor buffer inside a very deeply nested loop.
ghstack-source-id: 133627310

Test Plan:
Unit test which simulates use of a temp buffer in a deeply nested
loop.

Reviewed By: navahgar

Differential Revision: D29533364

fbshipit-source-id: c321f4cb05304cfb9146afe32edc4567b623412e
2021-07-16 08:42:24 -07:00
4c3d9cfe03 [BE] Fix flaky test_ddp_model_diff_across_ranks test (#61546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61546

Closes https://github.com/pytorch/pytorch/issues/60661

Fixes this flaky test by using blocking wait instead of async error handling, and performs a gloo-based barrier with higher timeout at the end of test which avoids issues with Barrier.sync. This also allows us to remove this test from the `skip_return_code_checks` list.
ghstack-source-id: 133657107

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29663884

fbshipit-source-id: 9f0df085b1968f6a7e2c7ae2f06b6dcd4838a87e
2021-07-16 08:37:02 -07:00
f1114364ad [DDP] Enhance comm hook docs (#61677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61677

1) Specify return type more clearly, 2) Misc fixes
ghstack-source-id: 133657895

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29701384

fbshipit-source-id: 7f77b99065bd2977153f397745e07b75bbdd7a94
2021-07-16 08:35:49 -07:00
39ce29efe0 Refactor metadata_map with flattened key/value pair (#61731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61731

In the previous diff, metadata_map contains mobile_info.json and producer_info.json, and we need to parse the JSON each time we log the required information. This diff flattens the content of those files into key/value pairs, which allows the logger to loop directly through metadata_map and log the information.
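
A hypothetical sketch of the flattening (the helper name and signature are my assumptions, not the diff's code):

```python
import json

def flatten_metadata(extra_files):
    # Parse each extra file once and merge its top-level fields into a
    # flat key/value map the logger can iterate directly.
    metadata_map = {}
    for fname in ("mobile_info.json", "producer_info.json"):
        for key, value in json.loads(extra_files[fname]).items():
            metadata_map[key] = str(value)
    return metadata_map
```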

Test Plan:
Since 3D Photo is disabled for current FB app, testings are only performed on CC scanner.

# Test On CC Scanner
**Test content with LOG(WARNING)**
{P429123273}

**Scuba Logger Output**

1. MOBILE_MODULE_LOAD_STATS

{F631884673}

2.  MOBILE_MODULE_STATS

{F631884787}

Reviewed By: xcheng16

Differential Revision: D29690702

fbshipit-source-id: 1db5a1f5c25e98e5b2f1cc254fd880dfdfa025e2
2021-07-16 00:37:17 -07:00
00a7f55b6e Apply for MOBILE_MODULE_STATS Logging (#61600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61600

This diff changes the module.h constructor and removes metadata_. It refactors all constructor call sites and creates a getter & setter for metadata_. MOBILE_MODULE_STATS reads the metadata from mobile::Module and passes it into the logger.

Test Plan:
Since 3D Photo is disabled for current FB app, testings are only performed on CC scanner.

# Test On CC Scanner
**Test content with LOG(WARNING)**
{P428930572}

**Scuba Logger Output**

{F631761194}

Reviewed By: xcheng16

Differential Revision: D29673184

fbshipit-source-id: 962e0d7b06a07caaa0c695a4ac58b885fd1505ea
2021-07-16 00:37:15 -07:00
fc710eecc0 Apply for MOBILE_MODULE_LOAD_STATS Logging (#61480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61480

Append mobile_info.json and producer_info.json into extra_files and parse the jsons from “model_info.json” in onExitLoadModel.
ghstack-source-id: 133327912

Test Plan:
# Test On CC Scanner
**Test content with LOG(WARNING)**
{P428339274}

**Scuba Logger Output**
{F631024095}

# Test On 3D Photo
**Test content with LOG(WARNING)**
{P428340927}

**Scuba Logger Output**

{F631026739}

Reviewed By: xcheng16, guangy10

Differential Revision: D29608014

fbshipit-source-id: abc39c44b947632fd4349de8a432649e84284a87
2021-07-16 00:36:09 -07:00
56d562e790 [DDP] fix test_ddp_inference (#61666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61666

Closes https://github.com/pytorch/pytorch/issues/61481. Fixes this
test by removing the section that uses only torch.no_grad() without calling
model.eval(). For SyncBN, model.eval() must be called; otherwise SyncBN
assumes it is in training mode, which issues collective calls in the forward
pass and does not work for inference.
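
A minimal sketch of the correct inference usage, assuming a CUDA device is available (in eval mode SyncBN uses its running stats and issues no collectives):

```python
import torch

model = torch.nn.SyncBatchNorm(8).cuda()
model.eval()                    # required: no_grad() alone is not enough
with torch.no_grad():
    out = model(torch.randn(4, 8, device="cuda"))
```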
ghstack-source-id: 133657549

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29699444

fbshipit-source-id: 03ccb296dd9cb56729cd23e91c7f50b72fcf3adf
2021-07-16 00:25:02 -07:00
7e1f01d4c0 Alias for polygamma (#59691)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59691

Reviewed By: gchanan

Differential Revision: D29707514

Pulled By: mruberry

fbshipit-source-id: 40c15e1fda3d9f7013977b0f36a77b228dda6aa5
2021-07-16 00:06:27 -07:00
f008e8d32d Remove test_out, test_variant_consistency_eager skips for addmv; fixed before (#61579)
Summary:
This PR:

1. Removes `test_out` skip: it's not needed anymore after it was fixed in https://github.com/pytorch/pytorch/pull/55746. This should also close https://github.com/pytorch/pytorch/issues/55589.
2. Removes `test_variant_consistency_eager` skip, it was added by mistake in https://github.com/pytorch/pytorch/issues/55771.
3. Refines `sample_inputs_addmv` function, the updated function should now be cleaner and easy to read.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61579

Reviewed By: gchanan

Differential Revision: D29709674

Pulled By: mruberry

fbshipit-source-id: 9b975c024777efdd33c6b9444b0b36e0eab85c03
2021-07-15 22:35:03 -07:00
843c42ffd8 [nnc] Refactored test macros and updated compress buffer tests to use them (#61716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61716

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29715754

Pulled By: navahgar

fbshipit-source-id: c400a58b7f393c0f93e5a25f118403124f8834b0
2021-07-15 21:17:14 -07:00
d01837081d [nnc] Cleaned up compress buffer tests to use BufHandle instead of Buf (#61715)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61715

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29715755

Pulled By: navahgar

fbshipit-source-id: 453adac8f5b13263c39d96b6b4086425a01bae54
2021-07-15 21:15:23 -07:00
eb5a56fb74 Fix scribe logs (#61675)
Summary:
Related to https://github.com/pytorch/pytorch/issues/61632

This PR adds
- refactoring of scribe related code to scribe.py
- changed the `render_test_results` job to always use the `linux.2xlarge` runner
- if SCRIBE_GRAPHQL_ACCESS_TOKEN is empty, try boto3 instead

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61675

Reviewed By: seemethere

Differential Revision: D29703523

Pulled By: zhouzhuojie

fbshipit-source-id: 829ad3630d3500a498b41aa458ce6539aaeae938
2021-07-15 19:27:58 -07:00
127562a0ed Fix some sign comparisons (#61618)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61618

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29688193

fbshipit-source-id: ea7a6b6be8b25d4a0668e744688f96bbbb144dc7
2021-07-15 18:28:41 -07:00
e6860ba508 Fix some sign comparisons and a loop (#61663)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61663

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29696766

fbshipit-source-id: eb5a77bd0cfafeb6209d274f121f10dca20d461a
2021-07-15 18:27:42 -07:00
9d955abcdb Fix test_reductions when no SciPy is installed (#61699)
Summary:
Also, use skipIfNoSciPy decorator instead of implicit `unittest.skipIf`

This fixes regression introduced by https://github.com/pytorch/pytorch/pull/52565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61699

Reviewed By: seemethere

Differential Revision: D29706938

Pulled By: malfet

fbshipit-source-id: 0b63c3ddadfa7f68bed994b71cadf68976d3b396
2021-07-15 15:57:11 -07:00
968a01a94a [special] migrate xlogy (#60641)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60641

Reviewed By: gchanan

Differential Revision: D29709306

Pulled By: mruberry

fbshipit-source-id: e8a5f64009a895a25618637de40b55cf36b8f794
2021-07-15 15:32:09 -07:00
1ce3281a6d Revert D29361872: [pytorch][PR] det_backward: more robust and with complex support
Test Plan: revert-hammer

Differential Revision:
D29361872 (fce85480b9)

Original commit changeset: b1f0fec7e3ac

fbshipit-source-id: feffa74ad65b0b294e0a9b0ee72d245393421f70
2021-07-15 15:26:00 -07:00
3a0801f960 [skip ci] Fix "arugment" typos (#61459)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61455.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61459

Reviewed By: soulitzer

Differential Revision: D29636559

Pulled By: samestep

fbshipit-source-id: 9ad65265c0491d9e81bb303abe3a07c6843bfa4a
2021-07-15 15:20:18 -07:00
e5fcc903d6 torch: Make __version__ better with comparisons (#61556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61556

Prior to 1.10.0, `torch.__version__` was stored as a str, and many users did
comparisons against `torch.__version__` as if it were a str. In order not to
break them, we have TorchVersion, which masquerades as a str while also
having the ability to compare against both packaging.version.Version and
tuples of values, e.g. (1, 2, 1). (A simplified sketch of the idea follows
the examples below.)

Examples:
  Comparing a TorchVersion object to a Version object
```
TorchVersion('1.10.0a') > Version('1.10.0a')
```
  Comparing a TorchVersion object to a Tuple object
```
TorchVersion('1.10.0a') > (1, 2)    # 1.2
TorchVersion('1.10.0a') > (1, 2, 1) # 1.2.1
```

  Comparing a TorchVersion object against a string
```
TorchVersion('1.10.0a') > '1.2'
TorchVersion('1.10.0a') > '1.2.1'
```
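
A simplified sketch of the masquerading idea, assuming the `packaging` library is installed (not the actual implementation): subclass str and delegate comparisons to packaging.version.Version, converting tuples to dotted strings first.

```python
from packaging.version import Version

class TorchVersion(str):
    def _to_version(self, other):
        if isinstance(other, tuple):
            other = ".".join(str(v) for v in other)
        return Version(str(other))

    def __gt__(self, other):
        # Comparison happens on parsed versions, not raw strings.
        return Version(self) > self._to_version(other)

assert TorchVersion("1.10.0a0") > (1, 2)
assert TorchVersion("1.10.0a0") > "1.2.1"
```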

Resolves https://github.com/pytorch/pytorch/issues/61540

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29671234

Pulled By: seemethere

fbshipit-source-id: 6044805918723b4aca60bbec4b5aafc1189eaad7
2021-07-15 15:12:09 -07:00
0ea29a6ccb Analysing time taken by gradgrad checks for Spectral Functions (#60435)
Summary:
**Description:** `SpectralFuncInfo` defines decorator mentioning: "gradgrad is quite slow". This PR re-analyses that statement since things have changed with gradient tests.

**Test times:** https://github.com/pytorch/pytorch/pull/60435#issuecomment-865658177

**Follow-up** of https://github.com/pytorch/pytorch/pull/57802

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60435

Reviewed By: gchanan

Differential Revision: D29707444

Pulled By: mruberry

fbshipit-source-id: 444b4863bac8556c7e8fcc8ff58d81a91bd96a21
2021-07-15 14:02:03 -07:00
4ff121f58d Add complex64 dtype for OpInfo Reference testing (#61627)
Summary:
This PR adds `complex64` dtype testing, following conversation from: pytorch/xla#3019 ([comment](https://github.com/pytorch/xla/pull/3019#discussion_r666754943)). Original PR that added OpInfo reference testing: https://github.com/pytorch/pytorch/pull/59369.

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61627

Reviewed By: gchanan

Differential Revision: D29710560

Pulled By: mruberry

fbshipit-source-id: 55b2e5ff47f031069335a0c75a45d4f4885ef9ac
2021-07-15 13:40:37 -07:00
e2c3049e2a Delete stable-sort-only-works-on-cpu warning (#61685)
Summary:
stable GPU sorting is implemented by https://github.com/pytorch/pytorch/pull/56821
Fixes https://github.com/pytorch/pytorch/issues/61682

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61685

Reviewed By: gchanan

Differential Revision: D29704864

Pulled By: malfet

fbshipit-source-id: 3a5aa24bf6507be63844fe6016fb9e3c682f4d84
2021-07-15 13:34:41 -07:00
e098e9000b Compare DDP static graph (C++ core) with legacy DDP forward and backward delay. (#61507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61507

Benchmark Python-only DDP vs production C++ based DistributedDataParallel.
- Implemented a pure Python DDP, PythonDDP, with support for SYNC and ASYNC reduction
- Added compare_ddp to measure the difference in the forward and backward steps

Kudos to Shen and Yi for the great idea.

Test Plan:
Test on DevGPUS with 2 CUDA devices.

$python compare_ddp.py

Python-only DDP has slightly better (-1%) forward performance and slightly slower (2%-20%) backward performance.
This suggests that we need to keep the C++ core, since the maximum latency increase can be 20%. See README.md for details.
Imported from OSS

Differential Revision:
D29685364
D29685364

Reviewed By: mrshenli

Pulled By: bowangbj

fbshipit-source-id: 429e4473fac0ec4c70d6db12d946d2636dd6477a
2021-07-15 12:52:22 -07:00
7a3b05ea6d Fix hardswish inplace op for strided tensor with skipped elements (#61622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61622

The hardswish inplace op would return incorrect results for strided tensor inputs that skip elements, such as a slice. The fix creates a contiguous tensor, runs the op, and copies the elements back to return the correct answer.
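
A sketch of the workaround as a hypothetical Python-level wrapper (the actual fix is in the ATen kernel):

```python
import torch
import torch.nn.functional as F

def hardswish_inplace(x):
    if x.is_contiguous():
        return F.hardswish(x, inplace=True)
    # For inputs that skip elements, compute on a contiguous copy and
    # copy the result back into the original (strided) tensor.
    x.copy_(F.hardswish(x.contiguous()))
    return x

strided = torch.randn(4, 4)[:, ::2]   # a slice that skips elements
hardswish_inplace(strided)
```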

Test Plan: Internal CI tests

Reviewed By: kimishpatel

Differential Revision: D29689745

fbshipit-source-id: 11618a8d865f550f6b70637345f9ebc3e5676f11
2021-07-15 11:50:27 -07:00
fce85480b9 det_backward: more robust and with complex support (#58195)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58195

Reviewed By: albanD

Differential Revision: D29361872

Pulled By: anjali411

fbshipit-source-id: b1f0fec7e3ac52acd1481bcc878cc0c1d07c1852
2021-07-15 11:04:42 -07:00
bd360ebe6f [nnc] Added a new API to distribute loop and all its parents (#61293)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61293

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29560008

Pulled By: navahgar

fbshipit-source-id: e4e459184f20b1872bc242ba8626d0a6df29e810
2021-07-15 10:28:20 -07:00
76f097466e [nnc] Added a new API to compress all buffers in a given statement (#61087)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61087

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29506677

Pulled By: navahgar

fbshipit-source-id: 63583fd5a0e42c0096ddf08d5b96bc680ea8a44e
2021-07-15 10:28:18 -07:00
2908d3eb45 [nnc] Modified the semantics of reorder in using permutation (#61085)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61085

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29506679

Pulled By: navahgar

fbshipit-source-id: f674aedff8175b9947404fd2164a0b4f57a71e93
2021-07-15 10:28:16 -07:00
7177509380 Revert [DDP] Support not all outputs used in loss calculation (#61497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61497

Reverts [DDP] Support not all outputs used in loss calculation
ghstack-source-id: 133589153

Test Plan: CI, ping authors to run their workflow on this diff

Reviewed By: zhaojuanmao

Differential Revision: D29642892

fbshipit-source-id: 81a15b9ab3329602f34d3758bb0799005a053d4f
2021-07-15 10:28:14 -07:00
25f9c35dd7 Revert [DDP] Support for multiple backwards (#61401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61401

Reverts https://github.com/pytorch/pytorch/pull/59359, which is causing a few internal issues in DDP training. We will evaluate the internal use cases and reland it after reconsidering the design.

Also moves `prepare_for_backward` back into forward pass instead of DDP Sink for `find_unused_parameters`. This ensures that hooks will always fire in the backwards pass, which is behavior that internal training workloads rely on. Calling `prepare_for_backward` in DDPSink autograd function is not the best solution since other autograd threads may have been executing which can cause races.

ghstack-source-id: 133589152

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D29608948

fbshipit-source-id: f060f41cd103573ddff8da50cdbb6c56768dab46
2021-07-15 10:28:13 -07:00
38ac9e69aa Back out "[DDP] Disable reducer hooks from running outside of DDP backwards." (#61399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61399

Reverts https://github.com/pytorch/pytorch/pull/60921
Original commit changeset: fef76a0dd295
ghstack-source-id: 133581300

Test Plan: CI

Differential Revision: D29594262

fbshipit-source-id: a308d3f10dbbb2169d9a7f60f2f28f139185ed1f
2021-07-15 10:27:02 -07:00
a50a389ca6 Revert D29701479: [pytorch][PR] Remove _broadcast_object() from ZeroRedundancyOptimizer
Test Plan: revert-hammer

Differential Revision:
D29701479 (9b5d9b4049)

Original commit changeset: c8d5f9057b32

fbshipit-source-id: 35ab1f399513fb9d1c4e73b1fa906e559d2a6994
2021-07-15 10:03:08 -07:00
aa01a7a61c Fix for get_buffer(): check buffers by name instead of value (#61429)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61242

The previous code wrongly checked whether a tensor is a buffer in a module by comparing values; the fix compares names instead.
The docs need some updating as well; the current plan is to bump that to a separate PR, but I'm happy to do it here if preferred.
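
A simplified sketch of the name-based check (not the actual implementation):

```python
import torch

def get_buffer(module: torch.nn.Module, target: str) -> torch.Tensor:
    mod_path, _, buffer_name = target.rpartition(".")
    mod = module.get_submodule(mod_path) if mod_path else module
    # Check membership by name in _buffers rather than comparing tensor
    # values, which could falsely match equal-valued non-buffer tensors.
    if buffer_name not in mod._buffers:
        raise AttributeError(f"`{buffer_name}` is not a buffer")
    return getattr(mod, buffer_name)
```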

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61429

Reviewed By: gchanan

Differential Revision: D29712341

Pulled By: jbschlosser

fbshipit-source-id: 41f29ab746505e60f13de42a9053a6770a3aac22
2021-07-15 09:55:09 -07:00
5407108533 CopyBackward: Remove redundant src_device and unnecessary copy=True (#60025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60025

`to` already copies unconditionally if `src.device() != options.device()` so
specifying the copy argument is unnecessary.

`src.device()` is also completely equivalent to `src.options().device()` so
storing both is redundant.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29698627

Pulled By: albanD

fbshipit-source-id: eb091d39b71db688e6bcbb33a227c01b94b432bb
2021-07-15 09:48:03 -07:00
da667e2d5f Add .github for CODEOWNERS (#61598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61598

I'd like to be notified on changes to the github actions workflows, add
this so I can be notified.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99, samestep

Differential Revision: D29685783

Pulled By: seemethere

fbshipit-source-id: 865a1360a24633ef5074e43b8277838a0eef94f6
2021-07-15 09:39:12 -07:00
8afb65b6c5 changed launch bounds for upsample_linear1d fwd, bwd from 1024 to 512 (#61307)
Summary:
Changed launch bounds for upsample_linear1d_out_frame and upsample_linear1d_backward_out_frame from 1024 to 512, which shows a performance improvement (see below). This does not completely eliminate lmem usage (it goes from 40-48 bytes to 8-16 bytes); not sure why.

Timing data (using Nvidia Titan-V GPU):
![UpsampleLinear1dTimingData](https://user-images.githubusercontent.com/22803332/124677708-e20d6280-de75-11eb-8187-fb50ec89dc50.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61307

Reviewed By: heitorschueroff

Differential Revision: D29662137

Pulled By: ngimel

fbshipit-source-id: 9653672ee17f25b75a02f295f388a78327091431
2021-07-15 09:19:16 -07:00
ee5a97de11 Register Saved Tensors hooks (#60663)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60663

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29466223

fbshipit-source-id: 65dc3a935c18a0e6b93a37e24543c696e6ae0321
2021-07-15 08:09:55 -07:00
94965212e5 [static runtime] Use at::allclose to test NNC sigmoid (#61566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61566

This change uses `at::allclose` to compare results from the sigmoid functions (CPU/NNC) instead of `Tensor::equals`, due to small numerical differences between them.

Test Plan:
I confirmed that the flakiness of `StaticRuntime.Sigmoid` is gone with this change:

```
[djang@devvm1999.ftw0 ~/fbsource/fbcode] buck-out/gen/caffe2/benchmarks/static_runtime/static_runtime_cpptest -v 3 --gtest_filter=StaticRuntime.Sigmoid --gtest_repeat=100 &> output.txt
[djang@devvm1999.ftw0 ~/fbsource/fbcode] grep PASSED output.txt  | wc
    100     500    2100
```

Reviewed By: bertmaher

Differential Revision: D29671203

fbshipit-source-id: 99a7b16d18ea047c9aad444f36d8368f9d0b088d
2021-07-14 19:48:00 -07:00
9b5d9b4049 Remove _broadcast_object() from ZeroRedundancyOptimizer (#61539)
Summary:
Revised version of https://github.com/pytorch/pytorch/issues/60573.

**Overview:**
This makes two changes:
- It introduces a `map_location` argument to `broadcast_object_list()`. The argument specifies the device to load tensors contained in objects received from the broadcast. This change requires modifying the implementation of `_object_to_tensor()` and `_tensor_to_object()` to use `torch.save()` and `torch.load()`, respectively.
- It removes all calls to `_broadcast_object()` in `ZeroRedundancyOptimizer` and the corresponding test file in favor of `broadcast_object_list()`.

The default value of `map_location` is `None`, in which case `_object_to_tensor()` and hence `broadcast_object_list()` preserve their original behavior. Namely, contained tensors are loaded to their original device.

In `consolidate_state_dict()`, I specify `map_location=torch.device("cpu")` instead of `self._default_device`. This slightly changes the behavior from before when using `_broadcast_object()`. The reason I do so is that it saves one GPU to CPU data transfer since the action immediately after receiving the broadcasted `local_state_dict` is to copy it to CPU.

Explicitly, if `map_location=self._default_device`, then the data transfer path assuming NCCL backend is as follows:
`source GPU --[before serialize]--> source CPU --[before broadcast]--> source GPU --[broadcast]--> destination GPU --[before deserialize]--> destination CPU --[deserialize]--> destination GPU --[copy]--> destination CPU`
Hence, by setting `map_location=torch.device("cpu")` instead, the suffix becomes:
`destination CPU --[deserialize]--> destination CPU --[copy]--> destination CPU`
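
A minimal sketch of the (de)serialization change, with simplified helper signatures (assumptions, not the PR's exact code):

```python
import io
import torch

def _object_to_tensor(obj):
    buffer = io.BytesIO()
    torch.save(obj, buffer)                 # serializes contained tensors
    data = list(buffer.getvalue())
    return torch.tensor(data, dtype=torch.uint8), len(data)

def _tensor_to_object(tensor, size, map_location=None):
    buffer = io.BytesIO(bytes(tensor[:size].tolist()))
    # map_location controls the device onto which contained tensors are
    # loaded; None preserves their original devices.
    return torch.load(buffer, map_location=map_location)
```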

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61539

Test Plan:
I added a test `test_broadcast_object_list_map_location()` that checks, with `map_location` as both CPU and GPU, that (1) tensors contained in broadcasted objects are appropriately loaded onto the specified device and (2) the contents of the tensors are correct.

The existing `ZeroRedundancyOptimizer` tests pass.
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```

The existing `broadcast_object_list()` test passes:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_broadcast_object_list
```

Reviewed By: zou3519

Differential Revision: D29701479

Pulled By: andwgu

fbshipit-source-id: c8d5f9057b32e5e9f40e8edc5b2cc25fb21414a9
2021-07-14 17:36:30 -07:00
e3d5619ff0 [pytorch][profiler] Fix division by 0 in computeFlops (#61676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61676

Reviewed By: ilia-cher

Differential Revision: D29646067

fbshipit-source-id: d872221bbde5384a9e397e68c1e5b0664d913b42
2021-07-14 16:38:19 -07:00
70e94bb1dd Avoid redefining __BYTE_ORDER (#60346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60346

Introduction:
In order to support the Intel SGX platform, we have to avoid redefining __BYTE_ORDER.
Solution:
Check if the platform is SGX and avoid the redefinition.

Test Plan: Run the PyTorch tests.

Reviewed By: h397wang, malfet

Differential Revision: D29022626

fbshipit-source-id: 801c3a75c202d192a3808eb5d54b875094499996
2021-07-14 14:55:04 -07:00
a9c3580080 Grammatical update of tech docs (#61547)
Summary:
Added some minor grammatical updates to the 'Complex Numbers' docs.

![Screenshot (180)](https://user-images.githubusercontent.com/75036632/125342884-0b952500-e373-11eb-9e63-410ff31e6c21.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61547

Reviewed By: zou3519

Differential Revision: D29677361

Pulled By: H-Huang

fbshipit-source-id: 78222310a755911192905a8f52aa0ae325900006
2021-07-14 14:01:59 -07:00
5a5c7f563d add trainer hook functions (#60785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60785

This PR adds hook functions for the trainers.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29697299

Pulled By: gcramer23

fbshipit-source-id: cc3b991aad0d32503fbfc5acd4fca8b404e74c0f
2021-07-14 13:19:17 -07:00
304c02ee44 refactor ps benchmark (#60784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60784

This PR refactors the ps benchmark for modular trainers.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29697291

Pulled By: gcramer23

fbshipit-source-id: 64579a1f5326d3cd9f32936dcf53bc243d54b71d
2021-07-14 13:19:13 -07:00
7d2ea9a8f7 Release GIL as much as possible in dist_autograd pybind. (#61593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61593

Following the pattern in https://github.com/pytorch/pytorch/pull/61588
to avoid deadlocks as much as possible.
ghstack-source-id: 133497897

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D29683451

fbshipit-source-id: 1951622eb964f57a551a9c0d46ad0ab24b66c458
2021-07-14 13:19:10 -07:00
5ebc7c9f97 Avoid holding GIL while calling retrieveContext. (#61588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61588

As part of debugging https://github.com/pytorch/pytorch/issues/60290,
we discovered the following deadlock:

```
Thread 79 (Thread 0x7f52ff7fe700 (LWP 205437)):
#0  pthread_cond_timedwait@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x0000564880199152 in PyCOND_TIMEDWAIT (cond=0x564880346080 <gil_cond>, mut=0x564880346100 <gil_mutex>, us=5000) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/condvar.h:103
#2  take_gil (tstate=0x7f5254005ef0) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval_gil.h:224
#3  0x0000564880217b62 in PyEval_AcquireThread (tstate=0x7f5254005ef0) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:278
#4  0x00007f557d54aabd in pybind11::gil_scoped_acquire::gil_scoped_acquire() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#5  0x00007f557da7792f in (anonymous namespace)::concrete_decref_fn(c10::impl::PyInterpreter const*, _object*) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#6  0x00007f5560dadba6 in c10::TensorImpl::release_resources() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so
#7  0x00007f5574c885bc in std::_Sp_counted_ptr_inplace<torch::distributed::autograd::DistAutogradContext, std::allocator<torch::distributed::autograd::DistAutogradContext>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#8  0x00007f5574c815e9 in std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<long const, std::shared_ptr<torch::distributed::autograd::DistAutogradContext> >, false> > >::_M_deallocate_node(std::__detail::_Hash_node<std::pair<long const, std::shared_ptr<torch::distributed::autograd::DistAutogradContext> >, false>*) [clone .isra.325] () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#9  0x00007f5574c81bf1 in torch::distributed::autograd::DistAutogradContainer::eraseContextIdAndReset(torch::distributed::autograd::DistAutogradContainer::ContextsShard&, long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007f5574c86e83 in torch::distributed::autograd::DistAutogradContainer::releaseContextIfPresent(long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#11 0x00007f5574cc6395 in torch::distributed::rpc::RequestCallbackNoPython::processCleanupAutogradContextReq(torch::distributed::rpc::RpcCommandBase&) const () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#12 0x00007f5574cccf15 in torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so

Thread 72 (Thread 0x7f53077fe700 (LWP 205412)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f55bc62adbd in __GI___pthread_mutex_lock (mutex=0x564884396440) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f5574c82a2f in torch::distributed::autograd::DistAutogradContainer::retrieveContext(long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#3  0x00007f557de9bb2f in pybind11::cpp_function::initialize<torch::distributed::autograd::(anonymous namespace)::dist_autograd_init(_object*, _object*)::{lambda(long)#11}, pybind11::dict, long, pybind11::name, pybind11::scope, pybind11::sibling, char [931], pybind11::arg>(torch::distributed::autograd::(anonymous namespace)::dist_autograd_init(_object*, _object*)::{lambda(long)#11}&&, pybind11::dict (*)(long), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [931], pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so

```

Basically, Thread 72 holds the GIL and tries to acquire the lock for
DistAutogradContainer to perform a lookup on a map. On the other hand,
Thread 79 holds the lock on DistAutogradContainer to remove a Tensor, and as
part of the TensorImpl destructor, concrete_decref_fn is called, which waits
for the GIL. As a result, we have a deadlock.

To fix this issue, I've ensured we release the GIL when we call `retrieveContext`
and re-acquire it later when needed.
ghstack-source-id: 133493659

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D29682624

fbshipit-source-id: f68a1fb39040ca0447a26e456a97bce64af6b79c
2021-07-14 13:17:16 -07:00
f2adbff36e [Metal] Do not use read/write textures in concat shaders (#61074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61074

`read_write` textures are not available on some devices, such as iPhone 7. This prevents the concat op from functioning on those devices.

This diff rewrites the concat shaders such that they do not depend on `read_write` textures.

Test Plan:
Test on device: run squeezenet and/or the operator tests
```
arc focus2 pp-ios
```

Test on Mac
```
buck test pp-macos
```

Test specifically on iPhone7, either device or simulator.

Reviewed By: xta0

Differential Revision: D29501656

fbshipit-source-id: de4a059953ab4b0abf38b6ecb3f665323dcdbea1
2021-07-14 13:03:48 -07:00
80bdfd64c5 Skip Bfloat16 support when building for VSX (#61630)
Summary:
Copy-paste ifdef guard from vec256/vec256.h
Probably fixes https://github.com/pytorch/pytorch/issues/61575

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61630

Reviewed By: janeyx99

Differential Revision: D29690676

Pulled By: malfet

fbshipit-source-id: f6d91eadab74bcbcb1dc9854ae1b98a0dccacd14
2021-07-14 13:02:29 -07:00
43a2f7c26a [TensorExpr] Do not fuse float16 values. (#61569)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61569

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29672564

Pulled By: ZolotukhinM

fbshipit-source-id: fe64ec38209d43f8246bcb6c397b64a28cbd86fa
2021-07-14 12:53:59 -07:00
ab27399566 Make broadcast_object_list accept a device parameter. (#61305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61305

Part I (this PR): Add dist_device argument to broadcast_object_list API
Part II: andwgu@ will deprecate _broadcast_object with the newly introduced API
	 Also include the changes to _object_to_tensor()/_tensor_to_object() with PR 60573

Context: https://github.com/pytorch/pytorch/issues/60062

Test Plan:
Run the following on DevGpus with two cuda devices

$python setup.py develop    --- run this build on DevGPU
$BACKEND='nccl' WORLD_SIZE=2 with-proxy  python test/distributed/test_distributed_fork.py  TestDistBackendWithFork.test_broadcast_object_list --v
$BACKEND='gloo' WORLD_SIZE=2 with-proxy  python test/distributed/test_distributed_fork.py  TestDistBackendWithFork.test_broadcast_object_list --v

Build with distributed on: USE_DISTRIBUTE=1 python setup.py develop
Test on CPU devvm:

$ with-proxy python test/distributed/optim/test_zero_redundancy_optimizer.py

Imported from OSS

Differential Revision:
D29566538
D29566538

Reviewed By: iramazanli, mrshenli

Pulled By: bowangbj

fbshipit-source-id: 0bea52442551c5194acba85eadda16ba2ec4b6ef
2021-07-14 11:43:17 -07:00
9b3cbeaf7d [pruner] fix activation handles logic (#61592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61592

Add activation handles for each layer (stored in a list), so they can each be removed.

We don't remove them in the `convert` in eager mode because we aren't modifying output/input layer dimensions. We will need this in Fx mode though.
ghstack-source-id: 133497376

Test Plan:
Added some tests to make sure `model(x)` runs without error.

`buck test mode/dev-nosan //caffe2/test:ao --
TestBasePruner`

https://pxl.cl/1LBf4

Reviewed By: z-a-f

Differential Revision: D29682789

fbshipit-source-id: 9185702736e5f7f4320754ffef441610738ac154
2021-07-14 11:07:23 -07:00
343cb276b0 [pytorch] Add broadcasting support to add_relu kernel (#61584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61584

add_relu does not work with broadcasting. This registers a scalar version of add_relu in native_functions that casts the scalar to a tensor before calling the regular function. TensorIterator handles broadcasting analogously to the existing add.
ghstack-source-id: 133480068

Test Plan: python3 test/test_nn.py TestAddRelu

Reviewed By: kimishpatel

Differential Revision: D29641768

fbshipit-source-id: 1b0ecfdb7eaf44afed83c9e9e74160493c048cbc
2021-07-14 10:32:20 -07:00
c23db9327a Smart Decay for Adam - Caffe2 (#61548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61548

We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch. We fix this by keeping track of missed minibatches and making the decay catch up accordingly.

The exponential moving averages (EMAs) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. Properly, for absent parameters, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2 respectively.

To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen
* we calculate the amount of momentum that would have been discharged over the missed minibatches and update the weight accordingly.
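
A minimal sketch of the lazy catch-up described above (hypothetical helper name; the step that discharges the accumulated momentum into the weight is omitted):
```
def lazy_adam_moments(pid, grad, step, state, beta1=0.9, beta2=0.999):
    # state maps a parameter id to (m, v, last_seen_step)
    m, v, last_seen = state.get(pid, (0.0, 0.0, step - 1))
    k = step - last_seen                    # minibatches since the last update
    m = beta1 ** k * m + (1 - beta1) * grad           # k-1 zero-grad decays plus
    v = beta2 ** k * v + (1 - beta2) * grad * grad    # this step, in one multiply
    state[pid] = (m, v, step)
    return m, v
```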

Differential Revision: D29654246

fbshipit-source-id: 7a6cd7966eb1f31116d99dfce79a78b2d3ee9e3e
2021-07-14 10:22:38 -07:00
58adaaba60 Enable C2 load rate limiter [2/n] (#61551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61551

We aim to enable the rate limiter in C2 load, with a fixed bandwidth limit.
This diff updates LoadOp to pass down the manifold db options.

Test Plan:
```
buck test mode/opt caffe2/caffe2/python/operator_test:load_save_test
```

Differential Revision: D29639102

fbshipit-source-id: cf69549adadf4c7f12a8a2b7f3ca39092cab4b99
2021-07-14 08:27:05 -07:00
57feb35474 Refactor non-joined process computation (#61555)
Summary:
**Overview:**
This refactors the computation on non-joined processes relating to the join context manager. The concept was inspired by a comment from pritamdamania.

**Changes:**
This introduces a `_Joinable` abstract base class, which requires a `_join_hook()` method and `_join_device()` and `_join_process_group()` property methods. Any class that we want to be compatible with the generic join context manager should inherit from `_Joinable` and implement `_join_hook()`, `_join_device()`, and `_join_process_group()`. (The `device` and `process_group` information has been moved from `_JoinHook` to `_Joinable`.)

The generic join context manager now takes in a `List[_Joinable]` instead of `List[_JoinHook]`. The motivation for this is that previously, by passing the `_JoinHook`s into the context manager, the class providing a `_JoinHook` can modify the context manager's behavior, but the context manager cannot modify the class's behavior. This is solved by giving the context manager a reference to the class's instance.

This implementation reserves the field `_join_config` in every `_Joinable` to store a `_JoinConfig` instance, which holds all dynamic fields needed from the `_Joinable` for the join context manager: `enable`, `throw_on_early_termination`, and `is_first_joinable`. ("dynamic" here means that for a given `_Joinable` instance, the values for those fields may change across different join context usages.) In particular, these fields are needed to implement a method `notify_join_context()`, which encapsulates the computation performed on non-joined processes relating to the join context manager --- (1) the all-reduce to indicate that the process has not yet joined and (2) the all-reduce to check whether to throw an exception if `throw_on_early_termination=True`. The idea is that every `_Joinable` class only needs to make a call to `notify_join_context()` before its per-iteration collective communications; it is a simple one-line addition.

Only the first `_Joinable` instance passed into the context manager actually performs the collective communications in `notify_join_context()`. In that case, the method returns an async work handle for the initial all-reduce indicating that the process not yet joined. Otherwise, the method returns `None`. This conditional logic is handled internally without additional input from the user.

**New API:**
Now, the example usage would look like:
```
ddp_model = DistributedDataParallel(...)
zero_optim = ZeroRedundancyOptimizer(ddp_model.parameters(), ...)
with _Join([ddp_model, zero_optim]):
    ...
```
Any arguments meant for a join hook (e.g. `divide_by_initial_world_size`) must be specified as keyword arguments. For example:
```
with _Join([ddp_model, zero_optim], divide_by_initial_world_size=False):
    ...
```
They will be forwarded to every `_join_hook()` function via `**kwargs`. This creates a clear separation between the variables needed by the context manager (`enable` and `throw_on_early_termination`) and those needed by the `_Joinable` class (e.g. `divide_by_initial_world_size`).

**Recap:**
After this change, the relevant information to use the generic join context manager looks like the following (omitting prefix `_` from names):
- Suppose we have a class `C` (e.g. `DistributedDataParallel`) that we want to be able to use the `Join` context.
- We make `C` inherit from `Joinable` and implement `join_hook() -> JoinHook`, `join_device()`, and `join_process_group()`.
- To implement `join_hook()`, we define a `CJoinHook` class inheriting from `JoinHook` and implement `main_hook()` and `post_hook()` as needed.
- We locate a place before `C`'s per-iteration collective communications and add a call to `Join.notify_join_context()`.
- We call `Joinable.__init__(self)` in `C`'s constructor.
- The `C.join_config` field will be used internally by the context manager. This does not affect `C`'s serializability.
- Run time arguments for `C`'s join hook can be passed in as keyword arguments to the context manager: `with Join([C()], arg1=..., arg2=...):`.
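
A skeletal sketch of those steps (hook bodies are stand-ins, not the real DDP/ZeRO logic; the import path follows the later public API, whereas this PR's names carry a leading underscore):
```
from torch.distributed.algorithms.join import Join, Joinable, JoinHook

class CJoinHook(JoinHook):
    def main_hook(self):                        # shadows one training iteration
        ...

    def post_hook(self, is_last_joiner: bool):  # runs once all ranks have joined
        ...

class C(Joinable):
    def __init__(self, process_group, device):
        super().__init__()                      # reserves self._join_config
        self._pg, self._device = process_group, device

    def join_hook(self, **kwargs) -> JoinHook:
        return CJoinHook()

    @property
    def join_device(self):
        return self._device

    @property
    def join_process_group(self):
        return self._pg

    def training_step(self, batch):             # hypothetical per-iteration method
        Join.notify_join_context(self)          # the one-line addition
        # ... per-iteration collective communications ...
```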

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61555

Test Plan:
I ran the existing DDP join tests:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_ddp_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_inputs_stop_iteration_sync_bn TestDistBackendWithFork.test_ddp_grad_div_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_input_join_disable TestDistBackendWithFork.test_ddp_uneven_input_exception
```
I ran the ZeRO join tests:
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py TestZeroRedundancyOptimizerDistributed.test_zero_join_gpu TestZeroRedundancyOptimizerDistributed.test_zero_join_cpu
```

Reviewed By: zou3519

Differential Revision: D29690359

Pulled By: andwgu

fbshipit-source-id: 2950f78de755eb5fb13b95b803dd7c705879a9c7
2021-07-14 08:20:40 -07:00
03a79f43e3 adding support for index_select on quantized tensors (#61406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61406

Only a few select functions really needed fixes so that they could work
for quantized tensors. Primarily, creation and resizing of tensors
required a branch for quantized tensors. This doesn't work for
per_channel tensors.
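
A small usage sketch (per-tensor quantization only, per the note above):
```
import torch

x = torch.randn(4, 3)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
idx = torch.tensor([0, 2])
out = torch.index_select(qx, 0, idx)  # result is still quantized
print(out.is_quantized, out.q_scale(), out.q_zero_point())
```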

Test Plan:
```python test/test_quantization.py TestQuantizedTensor.test_qtensor_index_select_cuda```

```python test/test_quantization.py TestQuantizedTensor.test_qtensor_index_select_cpu```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29654446

fbshipit-source-id: 8fde9b2dd2c3e380cc330bbad71d6c4d2aeec0ab
2021-07-14 05:38:00 -07:00
a07b08136f [Static Runtime] Check unsupported ops when enabling static runtime (#61613)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61613

Reviewed By: ajyu, movefast1990

Differential Revision: D29663466

fbshipit-source-id: d819903b7227f534c0a4fffa5eeea2b5c0c04750
2021-07-14 02:13:51 -07:00
ac64a41e8a [FX][docs] Add note about python set pitfall (#61597)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61597

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D29685735

Pulled By: jamesr66a

fbshipit-source-id: b5c5b53ff94fac1022f69b7c0ad4e4055b116029
2021-07-13 20:09:13 -07:00
9ade039593 fix test file not found issue (#61610)
Summary:
it should not error out if the file is not found.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61610

Reviewed By: samestep

Differential Revision: D29687958

Pulled By: walterddr

fbshipit-source-id: 17cacba8daa131df9bfb37fd58d6e4870ff75198
2021-07-13 17:50:50 -07:00
2ab8126e36 Add NewLib support (#60345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60345

Add support for NewLib, an embedded libc variant, by reusing the existing Android library stubs plus a few NewLib-specific guards.

Problem:
Newlib is a C standard library intended for embedded use, similar to how Android uses bionic. This causes some incompatibility with the math functions that are present in glibc but not in Newlib (and some versions of bionic), and makes porting PyTorch to environments such as SGX hard.

Solution:
Subscribe Newlib to the same fixes present for older versions of Android and add fixes specific to Newlib.

Test Plan: Run the PyTorch tests.

Reviewed By: malfet

Differential Revision: D29022623

fbshipit-source-id: 028dd7ff9b3ee394371c275642c90c9ef108e639
2021-07-13 17:26:45 -07:00
8e6d8991b2 [torch/elastic] Fix the agent store key prefix used by workers (#61590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61590

This PR fixes the bug where the state of the first run of a failed training job leaks into secondary runs due to a constant worker key prefix.
ghstack-source-id: 133494239

Test Plan: Run the existing integ tests.

Reviewed By: SciPioneer

Differential Revision: D29682743

fbshipit-source-id: d96ecadcfe5b6563225ee19f5d0776c7f935393a
2021-07-13 14:57:27 -07:00
523d6fe27c [PyTorch] Remove unnecessary std::string in Device.cpp (#61502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61502

No reason not to use string literals here.
ghstack-source-id: 133449808

Test Plan: buildsizebot

Reviewed By: dhruvbird

Differential Revision: D29648079

fbshipit-source-id: 74ecf12283c2f196b4b3edb75c6bb1eeed51322e
2021-07-13 14:36:13 -07:00
72394aaf68 Bump addressable from 2.7.0 to 2.8.0 in /ios/TestApp (#61573)
Summary:
Bumps [addressable](https://github.com/sporkmonger/addressable) from 2.7.0 to 2.8.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/sporkmonger/addressable/blob/main/CHANGELOG.md">addressable's changelog</a>.</em></p>
<blockquote>
<h1>Addressable 2.8.0</h1>
<ul>
<li>fixes ReDoS vulnerability in Addressable::Template#match</li>
<li>no longer replaces <code>+</code> with spaces in queries for non-http(s) schemes</li>
<li>fixed encoding ipv6 literals</li>
<li>the <code>:compacted</code> flag for <code>normalized_query</code> now dedupes parameters</li>
<li>fix broken <code>escape_component</code> alias</li>
<li>dropping support for Ruby 2.0 and 2.1</li>
<li>adding Ruby 3.0 compatibility for development tasks</li>
<li>drop support for <code>rack-mount</code> and remove Addressable::Template#generate</li>
<li>performance improvements</li>
<li>switch CI/CD to GitHub Actions</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="6469a232c0"><code>6469a23</code></a> Updating gemspec again</li>
<li><a href="24336385de"><code>2433638</code></a> Merge branch 'main' of github.com:sporkmonger/addressable into main</li>
<li><a href="e9c76b8897"><code>e9c76b8</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sporkmonger/addressable/issues/378">https://github.com/pytorch/pytorch/issues/378</a> from ashmaroli/flat-map</li>
<li><a href="56c5cf7ece"><code>56c5cf7</code></a> Update the gemspec</li>
<li><a href="c1fed1ca0a"><code>c1fed1c</code></a> Require a non-vulnerable rake</li>
<li><a href="0d8a3127e3"><code>0d8a312</code></a> Adding note about ReDoS vulnerability</li>
<li><a href="89c76130ce"><code>89c7613</code></a> Merge branch 'template-regexp' into main</li>
<li><a href="cf8884f815"><code>cf8884f</code></a> Note about alias fix</li>
<li><a href="bb03f7112e"><code>bb03f71</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sporkmonger/addressable/issues/371">https://github.com/pytorch/pytorch/issues/371</a> from charleystran/add_missing_encode_component_doc_entry</li>
<li><a href="6d1d8094a6"><code>6d1d809</code></a> Adding note about :compacted normalization</li>
<li>Additional commits viewable in <a href="https://github.com/sporkmonger/addressable/compare/addressable-2.7.0...addressable-2.8.0">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=addressable&package-manager=bundler&previous-version=2.7.0&new-version=2.8.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

 ---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/pytorch/pytorch/network/alerts).

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61573

Reviewed By: xta0

Differential Revision: D29685329

Pulled By: seemethere

fbshipit-source-id: a43008155144a358950dc3ed1934fcc470b73c02
2021-07-13 14:30:33 -07:00
0751a41ab1 [quant] Input-Weight Equalization - ConvReLU support (#61350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61350

Applied changes in convert to allow for ConvReLU2d layers

Initial Model: `x -> conv1 -> relu`

After fusion: `x -> convRelu2d`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> convRelu2d -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> convRelu2d -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::convRelu2d -> dequantize`

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Initial Model:
```
ConvReluModel(
  (fc): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
  (relu): ReLU()
)
```

After prepare:
```
GraphModule(
  (x_activation_post_process_0): MinMaxObserver(min_val=5.960464477539063e-08, max_val=0.9999999403953552)
  (x_activation_post_process_0_equalization_process_0): _InputEqualizationObserver(
    (input_obs): PerChannelMinMaxObserver(min_val=tensor([1.1921e-07, 3.3379e-06, 5.9605e-08]), max_val=tensor([1.0000, 1.0000, 1.0000]))
  )
  (fc): ConvReLU2d(
    (0): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
  )
  (fc_activation_post_process_0): MinMaxObserver(min_val=0.0, max_val=1.2341605424880981)
)

graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29638275

fbshipit-source-id: 40d4666a4451e132612ea38fdfeaaec177a1defb
2021-07-13 14:00:40 -07:00
b3e4dab45a [quant] Input-Weight Equalization - Conv convert support (#61287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61287

Modifications to functions during convert() to support equalization. Note that this implementation does not work for connected F.conv2d layers yet.

Initial:
```
      w
      |
x -> conv -> y
```

After prepare:
```
                                         w
                                         |
                                  weight_quant_obs
                                         |
                                    weight_eq_obs
                                         |
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```

After convert:
```
                scale, zero_point             w (scaled)
                       |                           |
x -> mul -> quantize_per_tensor (scaled) -> quantized::conv -> dequant -> y
      |
   eq_scale
```
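
The identity behind the rewrite: scaling each input channel of x by s while dividing the matching input channel of the weight by s leaves the convolution output unchanged. A quick numerical check (illustrative only; shapes and values arbitrary, not the codepath above):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
w = torch.randn(5, 3, 3, 3)
s = torch.rand(3) + 0.5                   # per-input-channel equalization scale

y_ref = F.conv2d(x, w)
y_eq = F.conv2d(x * s.view(1, -1, 1, 1),  # the inserted mul node
                w / s.view(1, -1, 1, 1))  # the inversely scaled weight
assert torch.allclose(y_ref, y_eq, atol=1e-5)
```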

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Initial model:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```

After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %conv_input_scale_0 : [#users=1] = get_attr[target=conv_input_scale_0]
    %conv_input_zero_point_0 : [#users=1] = get_attr[target=conv_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %conv_input_scale_0, %conv_input_zero_point_0, torch.quint8), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%conv,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29557055

fbshipit-source-id: dc9f44182e31fa362c43ad2dfe224e6f4e4a730e
2021-07-13 14:00:38 -07:00
77d36b657a [quant] Input-Weight Equalization - Conv prepare support (#61286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61286

Modifies the prepare step to support conv layers during input-weight equalization and adds tests to make sure that the results are as expected.

Initial:
```
      w
      |
x -> conv -> y
```

After prepare:

```
                                         w
                                         |
                                  weight_quant_obs
                                         |
                                    weight_eq_obs
                                         |
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`

Initial:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```

After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29557051

fbshipit-source-id: 25d1531645dfaf565f5c615e2ee850fcf96c7eb9
2021-07-13 14:00:36 -07:00
ce9cedd119 [quant] Input-Weight Equalization - Conv observer support (#61285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61285

Modifies observers to support conv layers and tests to make sure that the observers are returning the expected values for conv inputs.

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29557041

fbshipit-source-id: 5e43329f189ba352eb8b991f38bf37752eebb6e6
2021-07-13 13:59:23 -07:00
30e48bbeae Add neg bit (#56058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56058

User facing changes:
1. Adds a negative bit and corresponding new API (`is_neg()`, `resolve_neg()`)
2. `tensor.conj().imag` now returns a floating point tensor with neg bit set to 1 instead of a tensor with no notion of negative bit. Note that imag is still a view and all the view properties still hold for imag.
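
A quick illustration of the new behavior (values arbitrary):
```
import torch

x = torch.tensor([1 + 2j, 3 - 4j])
v = x.conj().imag           # floating-point view with the neg bit set
print(v.is_neg())           # True: the negation is lazy
m = v.resolve_neg()         # materializes the negation
print(m)                    # tensor([-2., 4.]); m.is_neg() is False
```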

Non user facing changes:
1. Added a new Negative dispatch key and a backend fallback to handle it
2. Updated copy kernel to handle negative bit
3. Merged conjugate and negative bit fallback kernel
4. fixed https://github.com/pytorch/pytorch/issues/60478 (caused due to https://github.com/pytorch/pytorch/pull/54987)

Testing:
1. Added a new OpInfo based test `test_neg_view` (verifies that out-of-place and in-place operations work correctly for all operations when the input is a neg view tensor by checking the result against an actually negated tensor, verifies that autograd returns the same output for both neg view and actually negated tensors as well as it works fine when grad_out is a neg view).
2. Added a new test class containing `test_conj_view`, `test_neg_view`.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29636403

fbshipit-source-id: 12214c9dc4806c51850f4a72a109db9527c0ca63
2021-07-13 13:50:42 -07:00
60382de455 [torch] Set nproc_per_node to 1 (#61552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61552

Set `nproc_per_node` to 1

Test Plan: unittests

Reviewed By: cbalioglu

Differential Revision: D29667056

fbshipit-source-id: 6601f66fec5e018c7737d909f8c71642451abb29
2021-07-13 13:35:25 -07:00
437e7d9fc9 codegen_backend_module() now passes correct type designators to isinstance in the generated script
Summary: For methods returning complex (i.e. container) types, the existing code attempted to pass type designators with unsupported syntax (e.g. `Tensor[]`) into `isinstance`. It will now use the correct syntax supported by TorchScript (i.e. `List[Tensor]`).

Test Plan:
Unfortunately, a backend supporting methods returning container types has not yet been identified, so the functionality cannot be tested end-to-end.

Adding a printout of `method_ct.format(method_te)` before https://fburl.com/code/4619d12g lets us inspect the difference in the generated method body, e.g.:

```
assert isinstance(_0, List[Tensor])
```
vs
```
assert isinstance(_0, Tensor[])
```

Reviewed By: allwu

Differential Revision: D29537358

fbshipit-source-id: 3356f3c1477aa9304e1f070711f480441579414d
2021-07-13 12:18:17 -07:00
b42cc19c88 Fix broken assertion error test in NNAPI convertor (#61586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61586

Error message was changed

Test Plan:
pytest test/test_nnapi.py:

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29682319

fbshipit-source-id: 52a96d79633ee9aae1de2056c7583311edc92353
2021-07-13 11:46:32 -07:00
2ade4d2a92 .github: Ensure clean workspaces before checkout (#61565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61565

I was noticing the checkout step failing a lot for me. This adds a
cleaning step to fully remove the GitHub workspace before attempting to
do the checkout.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D29671074

Pulled By: seemethere

fbshipit-source-id: 43a8f9a9a272c6bdbfffa9c6263443aac37f4b89
2021-07-13 11:13:48 -07:00
d5204064dc [BE] Fix flaky ProcessGroupGloo tests (#61396)
Summary:
A hypothesis as to why tests such as https://github.com/pytorch/pytorch/issues/57469 may be flaky is that `c10d = ProcessGroupGloo(...)` is not actually guaranteed to be a synchronization point, so some ranks may create the PG, run all the error checking (which does not actually call into gloo APIs and so doesn't require synchronization), and then exit, all before other ranks have created the gloo pg.

This can result in the following error:
```
File "distributed/test_c10d_gloo.py", line 1037, in test_reduce_checks
May 03 06:42:34     pg = c10d.ProcessGroupGloo(store, self.rank, self.world_size, self.opts())
May 03 06:42:34 RuntimeError: [/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [127.0.0.1]:35521
```

which indicates that the remote end has hung up. Furthermore, all the flaky tests in this file only do error checking and don't call into the gloo APIs, further indicating that this issue may be the root cause. I'm not 100% sure this PR will fix it, because I haven't been able to actually repro the issue even after 10000+ runs, but it happens regularly in CI.

To fix this, we add a `dist.barrier(group=pg)` call after creating the pg to enforce synchronization. It would be good to land this and observe whether it helps with the flakiness.
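
In sketch form, the synchronization pattern being added (constructor arguments as in the test fixture; requires a gloo build):
```
import torch.distributed as dist

def make_gloo_pg(store, rank, world_size, opts):
    pg = dist.ProcessGroupGloo(store, rank, world_size, opts)  # not a sync point
    dist.barrier(group=pg)  # no rank may proceed until all have built the pg
    return pg
```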

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61396

Reviewed By: mrshenli

Differential Revision: D29664189

Pulled By: rohan-varma

fbshipit-source-id: bc046d5d816fe6cb426522b85312383bfa3f90b7
2021-07-13 10:34:59 -07:00
3e5d2b539d Replace deprecated comment with C10_DEPRECATED in linalg.h (#60374)
Summary:
Replace // DEPRECATED comment with C10_DEPRECATED.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60374

Reviewed By: H-Huang

Differential Revision: D29661630

Pulled By: heitorschueroff

fbshipit-source-id: fc086276fd7d3ddfb8d17c67ade456377ef0e990
2021-07-13 08:21:22 -07:00
9679fa7f30 Update cpp_extension.py (#61484)
Summary:
By default, the majority of Python 3.[6789] installations come with `pkg_resources.packaging` version 16.8 (or `setuptools` older than 49.6.0), whose Version class does not have major/minor properties, as one can observe in https://github.com/pypa/setuptools/blob/v49.5.0/pkg_resources/_vendor/packaging/version.py
On the other hand, comparison operators exist, so why not use them to check for version equality?

Fixes https://github.com/pytorch/pytorch/issues/61036

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61484

Reviewed By: walterddr, seemethere

Differential Revision: D29643883

Pulled By: malfet

fbshipit-source-id: 3db9168c1b009ac3a278709083ea8c5b417471b8
2021-07-13 07:11:58 -07:00
0afbb9e81e PYTHON_LIBRARY may be set to empty or NOTFOUND. (#61230)
Summary:
Not sure why (maybe from dependencies?), but it can certainly break package lookup upon re-entry into CMake.
So instead of checking whether these variables are defined, we should check whether they hold any meaningful value.

Fixes https://github.com/pytorch/pytorch/issues/59887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61230

Reviewed By: H-Huang

Differential Revision: D29668766

Pulled By: malfet

fbshipit-source-id: 79a59578740c4434327aff4f9a22eba9c4bf48d1
2021-07-13 07:09:31 -07:00
ac6ec0efa1 [ROCM] fix bug in #60313 (#61073)
Summary:
This PR fixes a bug in https://github.com/pytorch/pytorch/issues/60313 where the tensors generated by _generate_valid_rocfft_input were on the CPU instead of the GPU. This was due to using numpy to generate the tensors and converting them with torch.from_numpy, which leaves the generated tensors on the CPU. We now generate the tensors using PyTorch itself, which carries the device type of the input tensors over to the generated tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61073

Reviewed By: H-Huang

Differential Revision: D29668418

Pulled By: malfet

fbshipit-source-id: ce2025c26d079c15603a89b9bf7878f48d73155e
2021-07-13 07:08:17 -07:00
2e49c5dc37 Move GetArgumentNamesModule registration to InterpreterManager() (#61549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61549

Move GetArgumentNamesModule registration to InterpreterManager() such that the module is a permanent part of the interpreters and can be used by InterpreterSession.global() freely.

Test Plan: [... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetArgumentNames

Reviewed By: wconstab

Differential Revision: D29643460

fbshipit-source-id: cf132d4795cbb334ce164ac715d590a105535508
2021-07-13 00:57:01 -07:00
5144381b1d [pytorch][JIT] Widen exception caught by ScriptList casting (#61520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61520

This commit widens the exception caught by the try-catch block that checks if
an object passed to a scripted function is a `ScriptList`. It turns out that
there are internal tests that do not throw a `py::cast_error` so catching only
that is not sufficient.

Test Plan: Ran the failing tests in T94889011.

Reviewed By: Chillee

Differential Revision: D29560815

fbshipit-source-id: 442258f8997146d833a9d5db923e1f6359f2bfdd
2021-07-12 23:20:58 -07:00
94840969e4 SGX can not read from /dev/urandom (#60368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60368

Problem:
The SGX secure enclave does not support reading from /dev/urandom, as it is isolated from the OS for greater security. The SGX API provides a way to generate random numbers as a replacement.
Solution:
Conditionally enable the SGX API for random number generation when building for SGX.

Test Plan: Run the PyTorch tests

Reviewed By: malfet, LiJihang

Differential Revision: D29022616

fbshipit-source-id: 1c7115457a2abde682df4d55fa4a8446fc5f8613
2021-07-12 20:43:23 -07:00
8a2c7d902f [static runtime] Add DCHECK to ensure that outputs do not overlap with immutable inputs (#61301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61301

This change adds a `DCHECK` to ensure that outputs do not overlap with immutable inputs.

Test Plan:
Added unittests as follows:

- `ProcessedNode.VerifyOutputsNotOverlappingWithImmutableInputsWithImmutableArguments`
- `ProcessedNode.VerifyOutputsNotOverlappingWithImmutableInputsWithMutableArguments`

Reviewed By: hlu1

Differential Revision: D29564158

fbshipit-source-id: bf14b4978ab544af79010cf724ed28202b4521cc
2021-07-12 18:04:05 -07:00
4ef640d6f6 Sort imports of test_datapipe.py (#61312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61312

Sorted according to isort output. Alphabetically ordered, one-per-line imports make merging easier.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588833

Pulled By: VitalyFedyunin

fbshipit-source-id: 4c80c3086132b50894e734ad6c5799d78d689e42
2021-07-12 15:33:20 -07:00
fd13e925ec Adding backward compatibility for sharding support in old DataLoader (#61237)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61237

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588832

Pulled By: VitalyFedyunin

fbshipit-source-id: 3bfa4417f6a04450f656ecf28fc95322d2cf076a
2021-07-12 14:53:45 -07:00
d3cb065b2f Implement usage of is_shardable and apply_sharding (#61236)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61236

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588835

Pulled By: VitalyFedyunin

fbshipit-source-id: 00c3042f96af498637b2dcf6e3f842c1fc05ddd8
2021-07-12 14:23:20 -07:00
4d842d909b Revert FC workaround for ReflectionPad3d (#61308)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61248

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61308

Reviewed By: iramazanli

Differential Revision: D29566849

Pulled By: jbschlosser

fbshipit-source-id: 8ab443ffef7fd9840d64d71afc2f2d2b8a410ddb
2021-07-12 14:19:07 -07:00
2fd37a830e Revert D29642893: .github: Add force_on_cpu tests for windows
Test Plan: revert-hammer

Differential Revision:
D29642893 (a52de0dfec)

Original commit changeset: 2dd2b295c71d

fbshipit-source-id: c01c421689f6d01cdfb3fe60a8c6428253249c5f
2021-07-12 14:01:44 -07:00
7fdce39a4b Revert D29642891: .circleci: Remove force_on_cpu jobs from circleci
Test Plan: revert-hammer

Differential Revision:
D29642891 (2aedd17661)

Original commit changeset: d51bb859bc28

fbshipit-source-id: a39a2d57d6e68961d94d4137a57bdc280f9b1b5b
2021-07-12 13:59:39 -07:00
58df01c3b8 clarify default value of requires_grad for tensors (#61038)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61038

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29491984

Pulled By: dagitses

fbshipit-source-id: 7e6b7f8e81d77f38c881b86a68c17d3cf5483dad
2021-07-12 12:57:37 -07:00
5897a60480 warn about SVD outputs not supporting backprop (#61037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61037

* **#61037**

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29491985

Pulled By: dagitses

fbshipit-source-id: 6322e7c86cade52671062ee97d2fcb8c15d8aa86
2021-07-12 12:55:37 -07:00
65ab861ec6 fix mm not correctly reporting TORCH_CHECK failures (#61394)
Summary:
fixes https://github.com/pytorch/pytorch/issues/61291.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61394

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29614208

Pulled By: walterddr

fbshipit-source-id: f49a15dde708e30b06059b47fae1cda7c2c3571c
2021-07-12 12:50:51 -07:00
68f9819df4 Typo fix (#41121)
Summary:
Description:
- Typo fix in the docstring

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41121

Reviewed By: heitorschueroff

Differential Revision: D29660228

Pulled By: ezyang

fbshipit-source-id: fc2b55683ec5263ff55c3b6652df3e6313e02be2
2021-07-12 12:43:47 -07:00
255a324258 add nesting_level as attribute to pickle for map datapipe (#61534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61534

Currently, the attribute `nesting_level` on `MapIterDataPipe` is not pickled. This yields `AttributeError` exceptions when multiprocessing with `DataLoader`.

This diff adds it as an attribute to pickle.

Test Plan: confirmed errors go away after change

Reviewed By: ejguan

Differential Revision: D29648655

fbshipit-source-id: 943b57eaff9712eb7ce92f43cb360acdb3111f2b
2021-07-12 11:41:01 -07:00
5144cc029e Bump docker image tag for clang-tidy (#61545)
Summary:
Fixes recent `clang-diagnostic-errors` on clang-tidy runs

See https://github.com/pytorch/test-infra/pull/59

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61545

Reviewed By: malfet, seemethere

Differential Revision: D29664061

Pulled By: 1ntEgr8

fbshipit-source-id: cca482a8774e34e61919f2298846ae0b479bf224
2021-07-12 11:32:39 -07:00
a5a10fe353 Move all downloading logic out of common_utils.py (#61479)
Summary:
and into the tools/ folder.

Currently, run_tests.py invokes tools/test_selections.py:
1. download and analyze which test files to run
2. download and parse S3 stats and pass the info to local files
3. common_utils.py uses the downloaded S3 stats to determine which test cases to run

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61479

Reviewed By: janeyx99

Differential Revision: D29661986

Pulled By: walterddr

fbshipit-source-id: bebd8c474bcc2444e135bfd2fa4bdd1eefafe595
2021-07-12 11:23:22 -07:00
2aedd17661 .circleci: Remove force_on_cpu jobs from circleci (#61473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61473

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29642891

Pulled By: seemethere

fbshipit-source-id: d51bb859bc28efe15618d1e65f1a1cee64d60508
2021-07-12 11:17:33 -07:00
a52de0dfec .github: Add force_on_cpu tests for windows (#61472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61472

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29642893

Pulled By: seemethere

fbshipit-source-id: 2dd2b295c71d79593ad7f71d6160de4042c08b80
2021-07-12 11:16:17 -07:00
51d18369c3 [1/N] Nnapi backend delegation preprocess (#61499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61499

Added a preprocess function for the delegate to Nnapi backend (internal and external files).

In the past we had functions and classes for converting to the Nnapi backend. Now, these functions and classes will be wrapped by the delegate API.

### nnapi_backend_preprocess.cpp:

Contains the preprocess function, which uses Pybind to call an existing python function, `convert_model_to_nnapi()`.
- The model is wrapped by a `RecursiveScriptModule`, so that `convert_model_to_nnapi()` can run correctly, since when jumping from Python to C++ to Python, the model loses its original wrapper.
- A tensor, which includes shape, data type, and quantization information, is passed through preprocess's compile_spec to `convert_model_to_nnapi()`.
- Finally, the Nnapi model is serialized for mobile and returned as a string.
### nnapi_backend_lib.cpp:
Contains stub functions for compile and execute, and is necessary for the Nnapi backend to be registered correctly. These will be implemented in a future PR.

**TODO:** implement execute and compile for the delegate API; throw exceptions for an incorrect compile_spec; add OSS tests
**Testing:** Tests were done locally (see D29647123). A simple module was lowered to Nnapi, saved locally, and examined.

ghstack-source-id: 133415234

Test Plan:
Tests were done locally (see D29647123).
TODO: add test in OSS in test_backends.py after CMake is ready.
I ran buck run caffe2:nnapi_backend_example. The model files are saved as nnapi_model.ptl and mobile_model.ptl. I checked that both zip files have expected contents.

Reviewed By: iseeyuan

Differential Revision: D29563351

fbshipit-source-id: 642e349356e38aecc1b9973c285569650c02668c
2021-07-12 11:13:05 -07:00
3faf6a715d [special] migrate log_softmax (#60512)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Rendered Docs: https://14335157-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.log_softmax
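
A short usage example; the new function mirrors the existing `torch.log_softmax`:
```
import torch

x = torch.tensor([-100.0, 0.0, 100.0])
print(torch.special.log_softmax(x, dim=0))
print(torch.log_softmax(x, dim=0))   # same values
```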

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60512

Reviewed By: iramazanli

Differential Revision: D29626262

Pulled By: mruberry

fbshipit-source-id: c42d4105531ffb004f11f1ba6ae50be19bc02c91
2021-07-12 11:01:25 -07:00
f2857883c4 Add DataPipes Graph Functions (#61235)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61235

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588834

Pulled By: VitalyFedyunin

fbshipit-source-id: e0331d6e1fc2a3f8b6211aac83965bcf13165161
2021-07-12 10:28:35 -07:00
25a705610f ENH Adds support for no-batch dim in AdaptiveAvgPool1d (#61264)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61264

Reviewed By: iramazanli

Differential Revision: D29615292

Pulled By: jbschlosser

fbshipit-source-id: 826d1c87d67261a7211270e90e3a1022bbbe37bd
2021-07-12 10:24:37 -07:00
583b045fc3 Make .contiguous(memory_format) call .clone(memory_format) (#61456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61456

functorch is unable to `vmap(grad(f))` when `f` contains a `.contiguous`
call. This is because `.contiguous` (when it is not a no-op) decomposes
to `.copy_` under grad and the `.copy_` is not compatible with vmap.

The fix for this is to have `.contiguous` call `.clone` instead of
`.copy_`. `clone` is a primitive w.r.t. to autograd, so `grad`
decomposes contiguous into clone.
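
A tiny illustration of the affected path (user-visible behavior is unchanged; only the decomposition recorded under autograd differs):
```
import torch

x = torch.randn(3, 4, requires_grad=True)
y = x.t().contiguous()      # not a no-op here, so it materializes via clone
y.sum().backward()
print(torch.equal(x.grad, torch.ones_like(x)))  # True
```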

Perf testing (forward pass)
- [script and
output](https://gist.github.com/zou3519/294f583b9c5d7bdf234d5295f97fb02e)
- The instruction count increased from 774479 to 781379. This is because
we're now calling .clone(), which does an additional dispatch. We could
optimize the implementation of clone() to not dispatch on .copy_() in
the future if we really care about this.

Perf testing (backward pass)
- [script and
output](https://gist.github.com/zou3519/6fbdb121de6342334192d55c8a72276a)
- The instruction count decreased from 5402648 to 5335977. This is
because the [backward for
.clone](9b908ab0d0/tools/autograd/derivatives.yaml (L383))
is a lot simpler than the [backward for
copy_](9b908ab0d0/torch/csrc/autograd/functions/tensor.cpp (L37-L41))
- The backward for .clone() and .copy_() end up doing the same thing for
contiguous (from reading the code above, they both do no-op copies).

Test Plan:
- wait for existing tests (test_view_ops have the tests)
- functorch isn't tested in PyTorch CI yet.
- Taking suggestions on how to write a test for this. I'm thinking we
could use LoggingTensor from #59760 (because it logs underneath
autograd) and test that clone is called instead of copy_ but I didn't
want to refactor it into a utility

Reviewed By: soulitzer

Differential Revision: D29636859

Pulled By: zou3519

fbshipit-source-id: 97eb56bfae1c4bb31612dc9d06536019f21d69a6
2021-07-12 10:19:33 -07:00
5a20c56ebc [static runtime] Remove hasOperation() check (#61496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61496

glow::FusionGroup is a JitOnlyOperator that produces an Operation when passed a Node* https://fburl.com/ybwfn3bl

hasOperation() doesn't return true in that case https://fburl.com/19wd10aw

By removing the hasOperation() check, the Operation gets successfully materialized, and static runtime enables successfully and runs OK. We will check that the outputs match the JIT interpreter.

Test Plan:
Test with 281805158_2
```
./buck-out/gen/admarket/lib/ranking/prediction_replayer/replayer --model_inference_type_target=DISAGG_ACCELERATOR --prediction_replayer_force_model_type=inline_cvr_post_imp_model --prediction_replayer_force_model=281805158_2 --prediction_replayer_target_tier=127.0.0.1:7447 --prediction_replayer_input_stream_filename=/data/users/ansha/tmp/adfinder/filter_requests_inline_cvr_post_imp_model_1000_2021_04_29 --ignore_model_id_mismatch --check_performance --fully_remote_sr_connection_options="overall_timeout:10000000,processing_timeout:10000000" --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/281805158_2/ --sigrid_predictor_model_suffix=.predictor.disagg.local --use_new_encoding_from_model_id_to_shard_id=true --prediction_replayer_force_model_kind=19 --pytorch_predictor_static_runtime_enable=true --prediction_replayer_target_qps=1
```

```
NNPI_LOG_LEVEL=0 USE_INF_API=1 ./buck-out/gen/sigrid/predictor/sigrid_remote_predictor_glow_nnpi \
  --force_models=281805158_2 \
  --sigrid_predictor_model_suffix=.predictor.disagg.remote_other \
  --gflags_config_path=sigrid/predictor/gflags/predictor_gflags_ads_perf_glow_nnpi_pyper_v1 \
  --smc_server_port=7447 \
  --sigrid_predictor_tier_name=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test.storage \
  --predictor_storage_smc_tier=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test.storage \
  --predictor_storage_smc_tier_v2=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test.storage \
  --torch_glow_min_fusion_group_size=30 \
  --glow_enable_sanitize_inputs=100 \
  --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/281805158_2/ \
  --pytorch_predictor_static_runtime_enable=true \
  --pytorch_predictor_glow_enable=true \
  --pytorch_predictor_enable_loading_xl_format_on_cpu=false \
  --pytorch_disagg_acc_input_dump_path=/tmp/
```

Reviewed By: hlu1

Differential Revision: D29647043

fbshipit-source-id: 8ce6dc0f4f0464b65ca6a8c9d42e3d8bb392e66e
2021-07-12 10:09:33 -07:00
99959fe3f5 [DataLoader] Adding demux and mux DataPipe-s (#61234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61234

* **#61234 [WIP] Adding demux and mux DataPipe API examples**
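
A hedged usage sketch of the two new DataPipes via their functional forms (import path may differ across versions):
```
from torch.utils.data.datapipes.iter import IterableWrapper

source = IterableWrapper(range(6))
evens, odds = source.demux(num_instances=2, classifier_fn=lambda x: x % 2)
merged = evens.mux(odds)    # round-robin interleave: 0, 1, 2, 3, 4, 5
print(list(merged))
```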

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588836

Pulled By: VitalyFedyunin

fbshipit-source-id: 523d12ea6be7507d706b4c6d8827ec1ac4ccabc3
2021-07-12 10:04:03 -07:00
d46689a201 OpInfo reference tests for add and sub (#61169)
Summary:
This PR adds OpInfo reference checks for `add, sub`. See https://github.com/pytorch/pytorch/issues/54261

cc: mruberry pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61169

Reviewed By: iramazanli

Differential Revision: D29625702

Pulled By: mruberry

fbshipit-source-id: c5e536ab52865890990353c5c862b44b5a16ed20
2021-07-12 09:27:22 -07:00
c18017190b Relax some linalg test tolerances (#61101)
Summary:
We are seeing some test failures on an A100 machine, though TF32 matmul is not involved in these cases.

I tried the `svd_lowrank` test. It passed when run by itself, but failed when I ran the whole test suite; it's probably a random seed issue. Relaxing the test tolerance is much easier to do.

Some SVD tests failed when comparing CPU float32 vs GPU float32. Since linear algebra routines are somewhat unstable at single precision, comparing two single-precision results may give false positives. So we calculate the CPU results in float64 or complex128, which is much more accurate.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61101

Reviewed By: ngimel

Differential Revision: D29593483

Pulled By: mruberry

fbshipit-source-id: 3df651e3cca1b0effc1a4ae29d4f26b1cb4082ed
2021-07-12 09:17:59 -07:00
bacf8ecbd1 Make pin_memory/is_pinned use BackendSelect (#60547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60547

These now dispatch on the optional Device argument, which specifies
what device you want to pin for.  We now directly register pinned
memory implementations for CUDA specifically, eliminating the need
for extra virtual methods.

This makes it possible for other backends to override the behavior
of pinned memory, c.f. https://github.com/pytorch/pytorch/pull/59291
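
A short usage example (requires a CUDA build, since pinning defaults to CUDA):
```
import torch

x = torch.randn(1024)
x_pinned = x.pin_memory()    # now routed through BackendSelect
print(x_pinned.is_pinned())  # True
```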

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD, bdhirsh

Differential Revision: D29331881

Pulled By: ezyang

fbshipit-source-id: db3b4e2c872ba1caa0243fecc60a4da65179ce28
2021-07-12 09:13:14 -07:00
7136a62b56 Add expecttest to CONTRIBUTING.md (#61163)
Summary:
expecttest is now an independent library, but `CONTRIBUTING.md` and `requirements.txt` do not mention the need for it.

Related: https://github.com/pytorch/pytorch/pull/60658

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61163

Reviewed By: heitorschueroff

Differential Revision: D29660296

Pulled By: ezyang

fbshipit-source-id: e2e86d42526c83bec7cdf7221e19fe83d9686103
2021-07-12 09:11:12 -07:00
8754238410 torch._utils.ExceptionWrapper: fix for Exceptions with multiple args (#58131)
Summary:
Here's an example of what this PR should fix:
```
from torch._utils import ExceptionWrapper

class TwoArgException(Exception):
    def __init__(self, msg, count): ...

# If you need a "real world" exception with two args, here's one from the stdlib:
# import asyncio
# TwoArgException = asyncio.exceptions.LimitOverrunError
# or if on Python 3.7, try:
# TwoArgException = asyncio.streams.LimitOverrunError

try:
    raise TwoArgException("oh no", 0)
except Exception as e:
    data = ExceptionWrapper(where="in a test case")

data.reraise()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58131

Reviewed By: heitorschueroff

Differential Revision: D29660248

Pulled By: ezyang

fbshipit-source-id: cbcecfee9cac183354542e147ee3d956038c8986
2021-07-12 09:04:36 -07:00
93d98ecef7 update the pytorch-gdb example so that it works on current master (#61175)
Summary:
As pointed out by https://github.com/pytorch/pytorch/pull/54339#issuecomment-872827580, the `pytorch-gdb` example is currently broken because the code has been refactored.

This PR updates the example so that it works again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61175

Reviewed By: heitorschueroff

Differential Revision: D29660336

Pulled By: ezyang

fbshipit-source-id: 8bcd32fc583c0b28a705ef37203ce7ad4d636732
2021-07-12 08:57:18 -07:00
cyy
0de35fe039 fix return local reference (#59913)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59913

Reviewed By: soulitzer

Differential Revision: D29107110

Pulled By: ezyang

fbshipit-source-id: c0f9888867c7dfeb05f6a3b9d2067df35e1e3ffb
2021-07-12 08:29:32 -07:00
d4549ba5dc Add VS_VERSION to Circle (#61532)
Summary:
Fixes current HUD 10.1 failure https://app.circleci.com/pipelines/github/pytorch/pytorch/349359/workflows/ead2904b-3f37-4c9d-b271-a8e772046523/jobs/14713215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61532

Test Plan: The new 10.1 CI run: https://app.circleci.com/pipelines/github/pytorch/pytorch/349677/workflows/b7143b56-e8e7-4f85-8bdf-0ce50788f3c0/jobs/14727686

Reviewed By: walterddr

Differential Revision: D29661179

Pulled By: janeyx99

fbshipit-source-id: 5023c41fe6ddce4113116b07d8f0fd7d66c864a8
2021-07-12 08:21:02 -07:00
cyy
00c4897c51 use make_unique (#61272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61272

Reviewed By: pbelevich

Differential Revision: D29660354

Pulled By: ezyang

fbshipit-source-id: f0aba1ea6983aec415915ed9b7dbced2e2b3b171
2021-07-12 08:09:46 -07:00
ac086ca15b Update version.txt file path (#61177)
Summary:
The file version.txt is located one directory above generate_torch_version;
some platforms are unable to find this file unless given an explicit
path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61177

Reviewed By: pbelevich

Differential Revision: D29660334

Pulled By: ezyang

fbshipit-source-id: f66105f782aaff031e373f96a69baabb13c89337
2021-07-12 07:30:10 -07:00
09679af260 Delete dead code in Tensor::to implementation (#61435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61435

Deleted the following:
- I couldn't find the NOTE mentioned so I deleted the reference to it
- The memory_format check (because it always passes)
- The requires_grad check (because it always passes)

Test Plan: - run tests

Reviewed By: soulitzer

Differential Revision: D29636872

Pulled By: zou3519

fbshipit-source-id: 48a32c1821b72c512d337becf2398ce7f4cf01a2
2021-07-12 07:10:27 -07:00
60086ab39b Remove export PYTHONPATH hacks (#61487)
Summary:
Remove `export PYTHONPATH=$PWD` in favor of `-m`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61487

Test Plan: Let's see if CI passes

Reviewed By: 1ntEgr8

Differential Revision: D29645544

Pulled By: janeyx99

fbshipit-source-id: 841aea8ebed2cb1c7dbc68754b5fbdee932559c2
2021-07-12 06:59:50 -07:00
5c1505076b [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D29656934

fbshipit-source-id: c40bbc8e4512b145050ee47db2c8dc781f3c36e9
2021-07-12 04:15:21 -07:00
666dff381d add AdaptiveAvgPooling2D (#61239)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61239

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626359

Pulled By: migeed-z

fbshipit-source-id: b7cd4ce4176e2d6e7a853974443affd23a49d3d9
2021-07-10 20:07:14 -07:00
93ef40bd83 add linear operation and modify one of the tests (#61238)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61238

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626333

Pulled By: migeed-z

fbshipit-source-id: d4303918e380d64ba8ab678f249db6674e89357a
2021-07-10 20:07:12 -07:00
292ee65261 add maxpool2D, add more tests, handle integer parameters for maxpool2D (#61188)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61188

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29626303

Pulled By: migeed-z

fbshipit-source-id: 32309cd1eb1189beaba63017653b3aeccdf2761d
2021-07-10 20:06:07 -07:00
7a15576a65 [quant] update FakeQuant modules to use tensor qparams (#61318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61318

Remove the `float()` and `int()` calls in the forward function so that we can directly use the tensor qparams in the fake_quantize operator.

Calling `float()`/`int()` internally calls `item()`, which can trigger a GPU->CPU copy if the original tensors reside on GPU (see the sketch below).
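For illustration, a minimal before/after sketch of the call pattern (the values and CPU placement are illustrative; the tensor-qparam overload is the one added in D29552727 below):
```python
import torch

x = torch.randn(4, 4)
scale = torch.tensor(0.1)                        # float32 tensor qparam
zero_point = torch.tensor(0, dtype=torch.int32)  # int32 tensor qparam

# Before: float()/int() call item() under the hood, which forces a
# GPU->CPU copy whenever scale/zero_point live on a CUDA device.
y_old = torch.fake_quantize_per_tensor_affine(
    x, float(scale), int(zero_point), 0, 255)

# After: pass the tensors straight through; no host sync is needed.
y_new = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)

assert torch.equal(y_old, y_new)
```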
Local benchmark P427668213

Before this change
```
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                     aten::_aminmax         2.57%       1.507ms         3.10%       1.819ms      36.371us       2.872ms         4.81%       2.872ms      57.446us            50
              aten::fake_quantize_per_tensor_affine         1.04%     610.915us         3.60%       2.114ms      42.276us     472.896us         0.79%       2.698ms      53.962us            50
    aten::fake_quantize_per_tensor_affine_cachemask         1.69%     993.626us         2.56%       1.503ms      30.058us       2.225ms         3.73%       2.225ms      44.504us            50
                                   aten::is_nonzero         3.85%       2.258ms        19.68%      11.540ms      46.161us       2.168ms         3.63%      11.084ms      44.336us           250
                                   aten::zeros_like         1.82%       1.064ms         6.65%       3.901ms      39.007us       1.531ms         2.57%       3.905ms      39.045us           100
                                           aten::eq        13.80%       8.093ms        25.90%      15.189ms      37.972us       9.580ms        16.05%      15.566ms      38.914us           400
                                         aten::item         5.67%       3.323ms        21.50%      12.607ms      36.019us       3.233ms         5.42%      12.167ms      34.762us           350
                                        aten::zeros         0.94%     549.208us         2.93%       1.717ms      34.343us     688.928us         1.15%       1.695ms      33.894us            50
                                           aten::le         2.52%       1.478ms         4.50%       2.641ms      26.411us       1.753ms         2.94%       2.845ms      28.448us           100
                                         aten::rsub         1.04%     608.715us         2.44%       1.433ms      28.667us     532.000us         0.89%       1.418ms      28.353us            50
                                          aten::max         1.54%     905.401us         4.62%       2.711ms      27.106us     847.488us         1.42%       2.697ms      26.969us           100
                                         aten::ones         0.92%     542.159us         2.16%       1.266ms      25.324us     661.856us         1.11%       1.301ms      26.017us            50
                                          aten::min         0.82%     479.167us         2.15%       1.258ms      25.160us     407.808us         0.68%       1.276ms      25.530us            50
                          aten::_local_scalar_dense        15.83%       9.284ms        15.83%       9.284ms      26.526us       8.934ms        14.97%       8.934ms      25.524us           350
                                        aten::clamp         2.35%       1.378ms         4.21%       2.467ms      24.669us       1.546ms         2.59%       2.461ms      24.612us           100
                                        aten::zero_         2.53%       1.482ms         5.65%       3.316ms      22.108us       1.326ms         2.22%       3.380ms      22.531us           150
                                      aten::maximum         3.08%       1.805ms         3.08%       1.805ms      18.052us       1.849ms         3.10%       1.849ms      18.494us           100
                                      aten::minimum         1.33%     778.854us         1.33%     778.854us      15.577us     868.672us         1.46%     868.672us      17.373us            50
                                        aten::round         1.36%     799.910us         1.36%     799.910us      15.998us     809.568us         1.36%     809.568us      16.191us            50
                                        aten::copy_         6.61%       3.878ms         6.61%       3.878ms      15.513us       4.036ms         6.76%       4.036ms      16.143us           250
                                          aten::div         2.53%       1.483ms         2.53%       1.483ms      14.833us       1.535ms         2.57%       1.535ms      15.353us           100
                                          aten::mul         2.44%       1.431ms         2.44%       1.431ms      14.314us       1.478ms         2.48%       1.478ms      14.782us           100
                                       aten::detach         1.46%     855.670us         2.41%       1.411ms      14.110us     832.448us         1.39%       1.395ms      13.949us           100
                                          aten::add         2.22%       1.301ms         2.22%       1.301ms      13.008us       1.383ms         2.32%       1.383ms      13.828us           100
                                        aten::fill_         4.18%       2.452ms         4.18%       2.452ms      12.262us       2.693ms         4.51%       2.693ms      13.463us           200
                                          aten::sub         5.06%       2.967ms         5.06%       2.967ms      14.837us       2.675ms         4.48%       2.675ms      13.374us           200
                                           aten::to         2.10%       1.230ms         3.65%       2.140ms      10.701us       1.310ms         2.20%       2.062ms      10.310us           200
                                       aten::select         1.28%     749.144us         1.49%     874.227us       8.742us     863.232us         1.45%     863.232us       8.632us           100
                                             detach         0.95%     555.326us         0.95%     555.326us       5.553us     562.496us         0.94%     562.496us       5.625us           100
                                   aten::as_strided         0.40%     232.289us         0.40%     232.289us       1.161us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         2.93%       1.720ms         2.93%       1.720ms       3.439us       0.000us         0.00%       0.000us       0.000us           500
                                      aten::resize_         1.04%     611.313us         1.04%     611.313us       2.038us       0.000us         0.00%       0.000us       0.000us           300
                                   aten::empty_like         0.75%     438.585us         1.77%       1.036ms       5.180us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         1.36%     799.442us         1.36%     799.442us       3.198us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 58.645ms
Self CUDA time total: 59.674ms
```

After this change
```

test_fake_quant_profiler (scripts.supriyar.benchmark.module_bench.ProfilerBench) ... -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fake_quantize_per_tensor_affine         0.98%     505.210us         4.38%       2.259ms      45.187us     419.424us         0.78%       3.218ms      64.367us            50
                                         aten::_aminmax         2.78%       1.434ms         3.42%       1.766ms      35.321us       2.825ms         5.27%       2.825ms      56.505us            50
aten::fake_quantize_per_tensor_affine_cachemask_tens...         2.38%       1.229ms         3.40%       1.754ms      35.083us       2.799ms         5.22%       2.799ms      55.979us            50
                                             aten::rsub         0.94%     485.040us         5.02%       2.590ms      51.793us     458.976us         0.86%       2.587ms      51.747us            50
                                       aten::is_nonzero         3.78%       1.952ms        23.64%      12.196ms      48.786us       2.055ms         3.83%      11.986ms      47.944us           250
                                             aten::item         6.92%       3.572ms        19.86%      10.244ms      40.977us       3.670ms         6.85%       9.931ms      39.724us           250
                                       aten::zeros_like         1.65%     848.874us         6.64%       3.426ms      34.260us       1.397ms         2.61%       3.572ms      35.717us           100
                                            aten::zeros         0.85%     436.691us         3.00%       1.549ms      30.984us     551.936us         1.03%       1.576ms      31.516us            50
                                               aten::eq        10.60%       5.467ms        20.26%      10.452ms      26.130us       7.018ms        13.09%      10.832ms      27.079us           400
                                               aten::le         2.58%       1.332ms         4.67%       2.407ms      24.074us       1.580ms         2.95%       2.614ms      26.144us           100
                              aten::_local_scalar_dense        12.93%       6.673ms        12.93%       6.673ms      26.691us       6.261ms        11.68%       6.261ms      25.046us           250
                                            aten::clamp         2.43%       1.253ms         4.37%       2.256ms      22.560us       1.431ms         2.67%       2.273ms      22.725us           100
                                             aten::ones         0.89%     460.133us         2.18%       1.123ms      22.467us     570.496us         1.06%       1.128ms      22.551us            50
                                              aten::min         0.74%     383.132us         2.06%       1.065ms      21.296us     377.536us         0.70%       1.091ms      21.824us            50
                                            aten::zero_         2.36%       1.219ms         5.87%       3.029ms      20.194us       1.261ms         2.35%       3.199ms      21.327us           150
                                              aten::max         1.51%     779.081us         4.06%       2.096ms      20.960us     791.680us         1.48%       2.130ms      21.295us           100
                                              aten::sub         7.97%       4.111ms         7.97%       4.111ms      20.556us       3.847ms         7.18%       3.847ms      19.234us           200
                                              aten::div         2.94%       1.516ms         2.94%       1.516ms      15.158us       1.580ms         2.95%       1.580ms      15.798us           100
                                            aten::round         1.45%     750.445us         1.45%     750.445us      15.009us     756.064us         1.41%     756.064us      15.121us            50
                                            aten::copy_         6.88%       3.548ms         6.88%       3.548ms      14.190us       3.701ms         6.90%       3.701ms      14.803us           250
                                          aten::minimum         1.32%     681.654us         1.32%     681.654us      13.633us     713.664us         1.33%     713.664us      14.273us            50
                                          aten::maximum         2.55%       1.317ms         2.55%       1.317ms      13.169us       1.338ms         2.50%       1.338ms      13.378us           100
                                              aten::mul         2.63%       1.358ms         2.63%       1.358ms      13.581us       1.328ms         2.48%       1.328ms      13.283us           100
                                           aten::detach         1.34%     688.820us         2.35%       1.211ms      12.110us     772.800us         1.44%       1.278ms      12.779us           100
                                            aten::fill_         4.53%       2.338ms         4.53%       2.338ms      11.692us       2.495ms         4.65%       2.495ms      12.473us           200
                                              aten::add         2.32%       1.197ms         2.32%       1.197ms      11.968us       1.240ms         2.31%       1.240ms      12.405us           100
                                               aten::to         2.07%       1.069ms         3.66%       1.889ms       9.443us       1.224ms         2.28%       1.975ms       9.874us           200
                                           aten::select         1.44%     743.042us         1.64%     848.207us       8.482us     641.600us         1.20%     641.600us       6.416us           100
                                                 detach         1.01%     522.155us         1.01%     522.155us       5.222us     505.088us         0.94%     505.088us       5.051us           100
                                       aten::as_strided         0.44%     227.884us         0.44%     227.884us       1.139us       0.000us         0.00%       0.000us       0.000us           200
                                            aten::empty         3.20%       1.652ms         3.20%       1.652ms       3.304us       0.000us         0.00%       0.000us       0.000us           500
                                          aten::resize_         1.25%     646.711us         1.25%     646.711us       2.156us       0.000us         0.00%       0.000us       0.000us           300
                                       aten::empty_like         0.79%     407.768us         2.07%       1.067ms       5.334us       0.000us         0.00%       0.000us       0.000us           200
                                    aten::empty_strided         1.52%     785.788us         1.52%     785.788us       3.143us       0.000us         0.00%       0.000us       0.000us           250
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 51.590ms
Self CUDA time total: 53.609ms
ghstack-source-id: 133370215

Test Plan: buck test mode/dev-nosan caffe2/test/:quantization

Reviewed By: raghuramank100

Differential Revision: D29566512

fbshipit-source-id: 1aefca51f99949da7334bcfe504848275c9f952c
2021-07-10 19:43:02 -07:00
99848c7269 [quant] Add tensor_qparam variant to fake_quantize_per_tensor (#61317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61317

Add an overload to fake_quantize_per_tensor that accepts scale/zero_point as input. The reasons to do this are

* required for fused observer + fake_quant operator on GPU where the scale/zero_point will be calculated by the observer on device. Passing tensor inputs enables us to directly access the scale/zero-point value in the cuda kernel to avoid extra copies/malloc
* enables us to pass in float as scale dtype and int32 as zero_point dtype (which is consistent with what the quantize call actually uses) https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer_base.cpp#L52-L53
* overload consistent with `quantizer_per_tensor.tensor_qparams`
ghstack-source-id: 133370216

Test Plan:
buck test mode/dev-nosan caffe2/test/:quantization -- test_backward_per_tensor_cachemask
buck test mode/dev-nosan caffe2/test/:quantization -- test_forward_per_tensor_cachemask

Reviewed By: raghuramank100

Differential Revision: D29552727

fbshipit-source-id: cbb9af40fc575ad27a29c646b760d5ee52cc923d
2021-07-10 19:41:55 -07:00
57676ce128 Migrate multi_margin_loss to ATen (CUDA) (#61426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61426

Closes gh-24600, closes gh-24601

These operators use custom kernels that aren't well suited to `TensorIterator` style, so this is just migrating the CUDA code and cleaning up the style.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29648015

Pulled By: ngimel

fbshipit-source-id: cadf1890cdc2199d57f4533370e554613efeb54a
2021-07-10 18:48:58 -07:00
5a17cb6f44 Add channels-last support for bilinear and nearest 2d interpolation on CUDA (#56322)
Summary:
Add channels-last support for bilinear and nearest 2d interpolation on CUDA

Benchmark (on 2070 Super) is available at

- nearest 2d: https://github.com/xwang233/code-snippet/tree/master/interpolate-channels-last/nearest-2d
- bilinear: https://github.com/xwang233/code-snippet/tree/master/interpolate-channels-last/bilinear

Some regressions are seen for tensors with small channel size. We may add a heuristic to dispatch between the contiguous and channels-last paths if needed.

Close https://github.com/pytorch/pytorch/issues/60137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56322

Reviewed By: mruberry

Differential Revision: D29645980

Pulled By: ngimel

fbshipit-source-id: c36dff4ee4789bec9b01da4029f326d30067c6b7
2021-07-10 18:00:50 -07:00
df00c636d2 [Model Averaging] Skip model averaging for the first K steps (#61207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61207

The model averager must now be combined with the post-localSGD DDP communication hook. It skips model averaging for the first K steps, because the post-localSGD communication hook runs global gradient averaging during that phase.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 133371335

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: pritamdamania87

Differential Revision: D29523738

fbshipit-source-id: 3fa9611046e1c0afa4bda78aa3ba200fa2a5fa4b
2021-07-10 17:12:16 -07:00
0f6876d721 [Model Averaging] Create a post-localSGD communication hook (#61206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61206

Create a communication hook to run post-local SGD. This will be combined with the model averager component to better support local SGD.

In contrast to the previous approach, which ran local gradient averaging + global model averaging at each of the first K steps, we now plan to run only global gradient averaging at each of the first K steps, just like normal DDP. This gives us two advantages (see the usage sketch after this list):
1) For some optimizers, model averaging can cause a discrepancy in optimizer states. If we still do global gradient averaging for the first K steps, we can defer such discrepancy until we actually start local SGD.
2) Gradient averaging for the first K steps runs only one allreduce that overlaps with the backward pass, so it should also be more efficient.
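A usage sketch of how the hook and averager are meant to compose; the symbol names (`PostLocalSGDState`, `post_localSGD_hook`, `PeriodicModelAverager`) follow the torch.distributed modules this stack builds toward and should be read as assumptions, not the exact API landed in this diff:
```python
from torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook import (
    PostLocalSGDState, post_localSGD_hook,
)
from torch.distributed.algorithms.model_averaging.averagers import (
    PeriodicModelAverager,
)

def configure_post_localsgd(ddp_model, K: int):
    # For the first K steps the hook runs plain global gradient averaging
    # (normal DDP); afterwards it switches to local gradient averaging,
    # and the averager performs the periodic global model averaging.
    state = PostLocalSGDState(process_group=None, subgroup=None,
                              start_localSGD_iter=K)
    ddp_model.register_comm_hook(state, post_localSGD_hook)
    return PeriodicModelAverager(period=4, warmup_steps=K)
```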

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 133371322

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: pritamdamania87

Differential Revision: D29523292

fbshipit-source-id: 3f215f7150f2917c2781278fad759530c685ea2c
2021-07-10 17:11:10 -07:00
a46d4212bf Allow dims=0 in torch.tensordot call (#61331)
Summary:
In one of my previous PRs that rewrote the `tensordot` implementation, I mistakenly treated empty `dims_a` and `dims_b` as illegal values. That turns out to be wrong: empty `dims_a` and `dims_b` are supported, and in fact common when `dims` is passed as an integer. This PR removes the unnecessary check.
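A small example of the now-accepted case:
```python
import torch

a = torch.randn(3, 4)
b = torch.randn(5, 6)

# dims=0 means dims_a and dims_b are both empty: nothing is contracted,
# and tensordot degenerates to an outer product of shape (3, 4, 5, 6).
out = torch.tensordot(a, b, dims=0)
assert out.shape == (3, 4, 5, 6)
```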

Fixes https://github.com/pytorch/pytorch/issues/61096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61331

Reviewed By: eellison

Differential Revision: D29578910

Pulled By: gmagogsfm

fbshipit-source-id: 96e58164491a077ddc7a1d6aa6ccef8c0c9efda2
2021-07-10 17:05:20 -07:00
7d7b7abb3b [Static Runtime] Separate function for getting always_alive values (#61506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61506

Separate out the logic of GetAlwaysAliveValues from GetLivenessMap to simplify the code structure. This also means GetLivenessMap does not need to run when optimize_memory is turned off.

Reviewed By: ajyu

Differential Revision: D29423534

fbshipit-source-id: dbdeeb10f7bcad86a24aa12f741f7c9ab946bb3b
2021-07-10 16:59:29 -07:00
7fdc5f9e08 model_dump: Fix non-counting and double-counting bugs in tensor memory (#60702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60702

- Instead of traversing and counting all tensor memory, collect a map
  from storage key to storage info while traversing. Add up sizes at
  the end to avoid double counting (see the sketch after this list).
- Count tensor memory from constants as well.
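A sketch of the dedup-by-storage idea, using current public tensor/storage accessors for illustration rather than the actual model_dump internals:
```python
import torch

def total_storage_bytes(tensors):
    # Key on the storage, not the tensor, so views sharing one storage
    # are counted exactly once.
    seen = {}
    for t in tensors:
        storage = t.untyped_storage()
        seen[storage.data_ptr()] = storage.nbytes()
    return sum(seen.values())

base = torch.zeros(1000)
views = [base, base[:10], base.view(10, 100)]  # all share one storage
assert total_storage_bytes(views) == base.untyped_storage().nbytes()
```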

Test Plan: Ran webdriver test.

Reviewed By: dhruvbird

Differential Revision: D29380396

Pulled By: dreiss

fbshipit-source-id: 6d0fd66f677fe23c851aa218387aa4dc59502b1e
2021-07-10 15:16:34 -07:00
158d351517 model_dump: Add webdriver test (#60701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60701

The unit test previously only tested that the dump could complete
successfully.  It was not able to verify that any JS worked properly.
Now we can test the JS as long as webdriver is installed.

Tweaked the implementation of Hider a bit to make it easier for tests to
find and open them.

I disabled the tests by default since I don't want to deal with
webdriver in CI.  Enable them with the environment variable
RUN_WEBDRIVER=1.

We could make the tests use headless mode, but it's kind of fun to watch
them run.

Add a test to verify that tensor memory computation is working for the
simple model.

Test Plan: Ran the test.

Reviewed By: dhruvbird

Differential Revision: D29380398

Pulled By: dreiss

fbshipit-source-id: f19d0b05d79ad5a8231e85422976f1889e021c89
2021-07-10 15:16:32 -07:00
cc78c463c0 model_dump: Render constants.pkl similar to data.pkl (#60700)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60700

Test Plan:
Dumped a model with a lot of constants (qconvs produced by optimizing).
Was able to see them rendered nicely.

Reviewed By: dhruvbird

Differential Revision: D29380400

Pulled By: dreiss

fbshipit-source-id: c951508b92bb2717591dd173282157e1a40a30bd
2021-07-10 15:16:31 -07:00
e292f34def model_dump: Make stdout argument for main a keyword-only argument (#60699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60699

Also add a unit test for main, which brings the test coverage up to
~98%.  Also factor out the "needs importlib.resources" check into a
function for easier reuse.

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D29380397

Pulled By: dreiss

fbshipit-source-id: bba16da85bf7bfb4370308e38c844694d01b47eb
2021-07-10 15:16:29 -07:00
2942e9aa80 model_dump: update maintainer comment (#60698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60698

... to reflect that the Python command should be re-run when changing
the model.

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D29380399

Pulled By: dreiss

fbshipit-source-id: 1ec464da4ebe6ddf400eb4a3b14da683369c0039
2021-07-10 15:15:15 -07:00
f5c10fdbd3 Allow for heterogenous List and Dict values + Improve container typing algorithm (#57137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57137

This PR corrects and expands our typing algorithm for unannotated, non-empty dicts and lists. Previously, to verify type correctness for an unannotated, non-empty container, we had gotten the type of the first element in the container, then checked if each following element was a subtype of the first type. That's too restrictive--what if the first element were a subtype of the second element? Instead, we should type the container by getting the smallest common supertype of all the given elements.

We need slightly different rules for keys and values in dicts, though: because the set of key types is restricted, finding two key types that cannot be unified should cause an error. On the other hand, the set of value types is not restricted, so we should be able to use `Any` as a valid supertype. We need to keep the set of keys restricted since the keys are used to generate and match schemas.

This does not break backwards compatibility, because the default element type is the smallest supertype of all the given types. So, if someone creates an unannotated dict where the keys are all `str` and the values are all `torch.Tensor`, the dict will be inferred to `Dict[str, Tensor]` just like it was before. Empty lists are still typed as `List[torch.Tensor]`, and empty dicts are still typed as `Dict[str, Tensor]`.
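A sketch of the new inference behavior (assuming these literal forms script cleanly):
```python
import torch
from typing import Any, Dict, List, Optional

@torch.jit.script
def values_unify_to_any() -> Dict[str, Any]:
    # Values of different types unify to Any instead of erroring;
    # the keys must still unify to a real type (here, str).
    return {"a": 1, "b": "two"}

@torch.jit.script
def supertype_of_all_elements() -> List[Optional[int]]:
    # Under the old first-element rule, a leading None failed because int
    # is not a subtype of NoneType; now the elements unify to their
    # smallest common supertype, Optional[int].
    return [None, 1, 2]
```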

This PR unblocks three engineers on an FB-internal team and improves FX-TorchScript compatibility.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28231839

Pulled By: ansley

fbshipit-source-id: 7297bf239749daa54895add708185c75e6ca5999
2021-07-10 14:29:05 -07:00
ccd0977060 [Static Runtime] Support prim::GetAttr/SetAttr (#61505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61505

The handling of `self` in Static Runtime was previously incorrect. This diff fixes that issue, since `self` is essential to prim::GetAttr/SetAttr. After all, most of the time we are getting and setting attributes on `self`, the TorchScript module.
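For context, a minimal TorchScript module whose graph contains both ops on `%self` (illustrative only, unrelated to the Static Runtime internals touched here):
```python
import torch

class Counter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.count = 0

    def forward(self, x):
        self.count = self.count + 1   # compiles to prim::SetAttr on %self
        return x + self.count         # compiles to prim::GetAttr on %self

m = torch.jit.script(Counter())
print(m.graph)  # both attribute ops take %self as their first input
```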

Reviewed By: ajyu

Differential Revision: D29350173

fbshipit-source-id: 6e62add4cda517ef8cd6c315d4cb0595e7d531fb
2021-07-10 14:06:06 -07:00
f291b1899f Revert D27978269: Smart Decay for Adam - Caffe2
Test Plan: revert-hammer

Differential Revision:
D27978269 (aaa1e07609)

Original commit changeset: e47524101ddf

fbshipit-source-id: 334824bbf9a6ed788e75af9c292754081f70a19b
2021-07-10 13:09:58 -07:00
8bcf24b37a [TCPStore] enhance connect timeout error message (#61390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61390

Enhances this error message for better debuggability.
ghstack-source-id: 133185482

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D29601528

fbshipit-source-id: f7aaf4d67ac96e6ed0b535e0200f918dd01e42f9
2021-07-10 03:57:23 -07:00
336970c03e Add note on torch.distributed backends on ROCm (#58975)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58975

Reviewed By: soulitzer

Differential Revision: D29595510

Pulled By: rohan-varma

fbshipit-source-id: 384bb67fcd003d65b76e957a474406b2a38099b9
2021-07-10 03:51:19 -07:00
73b86c9f9c Add getMethod to PytorchPredictorContainer (#61052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61052

Implement getMethod in the container in a similar way to getPredictor,
using either Deploy or Script functionality depending on how the container
was initialized and how the deploy gflag overrides are set.

Test Plan: Add new unit test

Reviewed By: houseroad

Differential Revision: D29346969

fbshipit-source-id: 08e95ee96d533f5a7cc9c8f9b1c53751715c9181
2021-07-09 22:27:40 -07:00
677313b670 ReLU (#61150)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61150

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29625826

Pulled By: migeed-z

fbshipit-source-id: 10e0662e33ccd4342cedd51579a10651755b633f
2021-07-09 19:32:08 -07:00
a556c1c4dc [profiler] Update Kineto submodule (ci-all) (#61478)
Summary:
Update Kineto submodule

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61478

Test Plan:
CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61432

Reviewed By: gdankel

Differential Revision: D29646019

Pulled By: ilia-cher

fbshipit-source-id: 02ecb0a2a6b457f6537c7d6b3c475e1e0ace3b6f
2021-07-09 19:32:06 -07:00
06166a13e0 Remove VS install step unless necessary from GHA Windows workflows (#60791)
Summary:
~~This should only be merged after our AMI has been deployed after https://github.com/fairinternal/pytorch-gha-infra/pull/1. (And will likely fail our current windows jobs)~~

I have revised this PR to install VS only when it's not already installed.

This should save ~5min per Windows workflow.
![image](https://user-images.githubusercontent.com/31798555/125141598-7e886c80-e0e3-11eb-9fe0-bb9e6bcc14f1.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60791

Reviewed By: soulitzer

Differential Revision: D29643876

Pulled By: janeyx99

fbshipit-source-id: 4bcfaf5bcad9e5636a1624c3e799e7cc97a87660
2021-07-09 19:32:04 -07:00
9b2b45919a Revert D29639797: [package] error if we try to mock a module in 3.6
Test Plan: revert-hammer

Differential Revision:
D29639797

Original commit changeset: 775ed78638fb

fbshipit-source-id: 9d2f6dae7ee35c6b37338e36ec7ade9d9e2ccbc2
2021-07-09 19:31:04 -07:00
aaa1e07609 Smart Decay for Adam - Caffe2 (#61488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61488

We want to decay learned parameters properly. Previously this was not done when a parameter was absent from a minibatch. We fix this by keeping track of missed minibatches and making the decay catch up accordingly.

The exponential moving averages (EMAs) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. In fact, for the absent parameters too, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2 respectively.

To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen (see the sketch below).
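A minimal Python sketch of the catch-up rule (illustrative names; the actual change lives in the Caffe2 Adam operator). Decaying by beta^k when a parameter reappears is equivalent to having applied k - 1 zero-gradient updates while it was absent, followed by the current update:
```python
def smart_adam_moments(m, v, grad, step, last_seen, beta1=0.9, beta2=0.999):
    # k is the number of minibatches since this parameter was last updated.
    k = step - last_seen
    m = (beta1 ** k) * m + (1.0 - beta1) * grad
    v = (beta2 ** k) * v + (1.0 - beta2) * grad * grad
    return m, v, step  # the caller records `step` as the new last_seen
```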

Differential Revision: D27978269

fbshipit-source-id: e47524101ddfcb281c46c505b9b7a8f0835bc64a
2021-07-09 18:28:21 -07:00
b52909d861 [TensorExpr] Add python bindings for ArgValue class and TensorExprKernel constructor accepting custom lowerings. (#61385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61385

The bindings coverage might be not full yet, but this already allows us
to register custom lowerings from python.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29623487

Pulled By: ZolotukhinM

fbshipit-source-id: b97ee420a57fd887e204c021b9e098764b2ee232
2021-07-09 18:27:14 -07:00
dec5aa2260 [JIT] clean up (#60390)
Summary:
* Minor: spelling, grammar.
* Add calls to `GRAPH_DUMP()` where they were missing.
* Add or expand a few comments.
* Move a few comments to seemingly more appropriate spots.
* In canonicalize_graph_fuser_ops.cpp inline `runnableInputs()` since it
  was only called in one place and had a misleading comment and
  confusing name.
* In `PeepholeOptimizeImpl::optimizeBlock()`, set `changed = true;` when
  removing `aten::is_complex`. Pretty sure its absence was a bug.
* Delete unused `_jit_pass_remove_inplace_ops` and its implementation
  `RemoveInplaceOps()`.
* In `preprocessCaffe2Ops()`, remove redundant check for nested optional
  types. It was already checked in `checkONNXCompatibility()`.
* In `EncoderBase::AddAttribute`, log the unexpected attribute kind.
  I don't remember the repro case now but I did hit this error at some
  point and this additional logging made it easier to understand.
* In `fuseConvBatchNorm()` in eval_peephole.cpp, consistently use
  camelCase instead of snake_case for local variables.
* Add curly braces around the bodies of if and loops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60390

Reviewed By: Krovatkin

Differential Revision: D29523283

Pulled By: SplitInfinity

fbshipit-source-id: 4e16c5648616f53da07d68dab7fdf252e06a0752
2021-07-09 16:28:27 -07:00
54ea7d33ba [package] error if we try to mock a module in 3.6 (#61469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61469

This feature is not supported; error out early.

Differential Revision: D29639797

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: 775ed78638fb6da8f830b632726b00c0533ed176
2021-07-09 16:26:38 -07:00
a3670ba377 Add option to specify custom NNAPI serializer (#61025)
Summary:
To add a serializer for custom ops, we can subclass the default serializer
and update its ADDER_MAP (see the sketch below).
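A hypothetical sketch of that pattern; the module path, class name, and adder signature below are assumptions for illustration, not the verified API:
```python
from torch.backends._nnapi.serializer import _NnapiSerializer  # name assumed

class MySerializer(_NnapiSerializer):
    ADDER_MAP = dict(_NnapiSerializer.ADDER_MAP)

    def add_my_custom_op(self, node):
        # Translate the custom op's inputs/outputs into NNAPI operands here.
        raise NotImplementedError

    ADDER_MAP["my_ns::my_custom_op"] = add_my_custom_op
```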

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61025

Test Plan:
* pytest test/test_nnapi.py::TestNNAPI for current serializer
* Custom serializers to be tested with custom ops

Imported from OSS

Reviewed By: anshuljain1

Differential Revision: D29480745

fbshipit-source-id: 37e3f8de3c97f6c8a486f9879ce11430ea89af34
2021-07-09 15:27:10 -07:00
cbb6ab6d88 [package] ignore dunder import errors (#61148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61148

Changes `__import__` processing to silently skip cases where the `__import__` statement cannot be parsed. Adds failed imports to a list retrievable by `PackageExporter.failed_dunder_import_list()`.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559680

Pulled By: Lilyjjo

fbshipit-source-id: 2513d0b9ef271c85cadc3f5a013fbd8c8de80b46
2021-07-09 15:27:08 -07:00
12772c8dd8 [package] PackageExporter visualization methods (#61147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61147

Basic tooling to enable users to see what is inside of a PackageExporter (a usage sketch follows the example output). Added methods:
- `externed/interned/mocked/denied_list()`: returns list of modules which are currently in the specified category
- `relied_on_by(module_name)`: returns list of modules which rely on `module_name`
- `dependency_graph_str()`: returns string format of graph for users. Example of output:
```
digraph G {
rankdir = LR;
node [shape=box];
"<res.foo.pkl>" -> "foo";
"foo" -> "torch.package";
"foo" -> "time";
"foo" -> "sentencepiece";
"foo" -> "package_top";
}
```
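A hypothetical usage sketch of these methods (the packaged object and file names are invented):
```python
from torch.package import PackageExporter

with PackageExporter("res.pt") as exporter:
    exporter.intern("**")
    exporter.save_pickle("res", "foo.pkl", {"x": 1})
    print(exporter.interned_list())         # modules currently interned
    print(exporter.dependency_graph_str())  # digraph like the one above
```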

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559683

Pulled By: Lilyjjo

fbshipit-source-id: 5dff4d04af911a9c9fdd0d100420f1382eaef46e
2021-07-09 15:27:06 -07:00
b5f0576278 [package] Modify Digraph to track predecessors (#61146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61146

Track predecessors of nodes in DiGraph in order to enable cleaner dependency visualization code.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559682

Pulled By: Lilyjjo

fbshipit-source-id: 06f51b1108423aece5bdd72a7b82ab736e5e4f94
2021-07-09 15:27:04 -07:00
ae65f63971 Make nnapi flatten converter accept flex inputs (#61024)
Summary:
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61024

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten

Reviewed By: anshuljain1

Differential Revision: D29480748

fbshipit-source-id: c334b09600a64d3e552cec843d6da3de28e7d27c
2021-07-09 15:27:02 -07:00
028e438d6c [torchelastic] Make sure rdzv_configs[timeout] is not getting overwritten (#61471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61471

Make sure `rdzv_configs[timeout]` is not getting overwritten

Test Plan: sandcastle

Differential Revision: D29638606

fbshipit-source-id: e164cdddaed77e7e35412ed58ac1ee312e9d489d
2021-07-09 15:27:00 -07:00
1f4bba77b6 [fx] fix subgraph API call_module warning about no owning module (#61463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61463

This seems like a small oversight(?): the current test fails when warnings are recorded. I discovered this when calling `graph.call_module(existing_call_module_node.target)`, which raised a warning.

Test Plan: `buck test //caffe2/test:fx`

Reviewed By: ansley

Differential Revision: D29637799

fbshipit-source-id: 2305629863230235f76a926fe2e4de480cbf853c
2021-07-09 15:25:44 -07:00
76c0f223d3 Make nnapi cat converter accept flex inputs
Summary: As title

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_cat

Reviewed By: anshuljain1

Differential Revision: D29480747

fbshipit-source-id: 161803054ff1a4c2c750fc30a5f0fc6d8a24b2c9
2021-07-09 14:27:53 -07:00
9e81d3d869 Make NNAPI linear converter accept flex inputs (#61022)
Summary:
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61022

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_linear

Reviewed By: anshuljain1

Differential Revision: D29480749

fbshipit-source-id: 35975861740298c9e16f866c939e7ee3c2151710
2021-07-09 14:27:51 -07:00
35b950ea98 [package] properly handle case where we are re-packaging mocked modules (#61434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61434

Mocking is the only time we introduce a "special" module to a
torch.package of our own creation. This interacts poorly with
re-packaging, since if we treat `_mock` as a regular module and try to
package it normally we will produce a broken package.

This PR teaches PackageExporter to recognize `_mock` modules and treat
them specially during the dependency walking process, thus avoiding the
issue.

Test Plan: Imported from OSS

Reviewed By: jdonald, Lilyjjo

Differential Revision: D29638283

Pulled By: suo

fbshipit-source-id: 37a7ffa34da8bb665f679fbd72aa3d71154b2209
2021-07-09 14:27:49 -07:00
4f4beb8286 Add Model Parallel Support to ZeRO (#61370)
Summary:
**Overview:**
The existing `ZeroRedundancyOptimizer` implementation assumes that all model parameters are stored on the same device (due to the recent [refactor](https://github.com/pytorch/pytorch/pull/59834)). This change allows model parameters to be sharded across multiple devices, as in the DDP with Model Parallelism example [here](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).

The only logic affected is the bucketing strategy used when `parameters_as_bucket_view=True`. Let `n` denote the world size and `k` denote the number of devices per process.
- Previously, `k = 1`, and `self._buckets` was a `List[torch.Tensor]`, where `self._buckets[j]` is a tensor (i.e. bucket) containing the parameters assigned to rank `j` for `j = 0, ..., n - 1`.
- Now, `self._buckets` is a `List[List[torch.Tensor]]`, where `self._buckets[i][j]` is a tensor containing the parameters stored on device `i` assigned to rank `j` for `i = 0, ..., k - 1` and `j = 0, ..., n - 1`.

This bucket construction uses an auxiliary data structure `self._device_to_per_rank_params`, which is a `Dict[torch.device, List[List[torch.Tensor]]]`. It maps:
- `dev_0` to `[rank 0's assigned parameters on dev_0, rank 1's assigned parameters on dev_0, ...]`,
- `...`
- `dev_{k-1}` to `[rank 0's assigned parameters on dev_{k-1}, rank 1's assigned parameters on dev_{k-1}, ...]`

I removed the invariant checker `_verify_same_param_device()` and its corresponding test since it is no longer an invariant.
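An illustrative sketch of the bucket construction under the new layout (the helper name and flattening details are invented, not the exact implementation):
```python
import torch
from typing import Dict, List

def build_buckets(
    device_to_per_rank_params: Dict[torch.device, List[List[torch.Tensor]]],
) -> List[List[torch.Tensor]]:
    # buckets[i][j] flattens the parameters on device i assigned to rank j.
    buckets: List[List[torch.Tensor]] = []
    for device, per_rank in device_to_per_rank_params.items():
        dev_buckets = []
        for rank_params in per_rank:
            if rank_params:
                flat = torch.cat([p.detach().reshape(-1) for p in rank_params])
            else:
                flat = torch.empty(0, device=device)
            dev_buckets.append(flat)
        buckets.append(dev_buckets)
    return buckets
```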

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61370

Test Plan: I added a new test `test_zero_model_parallel()` that checks for parity between a DDP model with model parallelism using `ZeroRedundancyOptimizer` and a local model with the same architecture using a local optimizer. I also verified that the existing tests still pass.

Reviewed By: soulitzer

Differential Revision: D29637132

Pulled By: andwgu

fbshipit-source-id: 07112959fa4e94a3f40e67e88cbb58ce3cd1e033
2021-07-09 14:27:47 -07:00
fb7ed24f6e [PyTorch] Try using ExclusivelyOwned in LinearAlgebra (#59420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59420

This is a sample of how we might use ExclusivelyOwned on an opt-in basis.
ghstack-source-id: 133089540

Test Plan:
1) CI to run regression tests
2) Spot-checked assembly for linalg_det_out. Rather than calling the intrusive_ptr dtor, we get the ExclusivelyOwned dtor inline. In particular, we do not get any atomic refcount decrement instructions emitted.
3) TODO: some kind of perf profiling; advice welcome

Reviewed By: ezyang

Differential Revision: D28885313

fbshipit-source-id: ae4b39ed738c41d0c4a4509a5199c040ba9aa63a
2021-07-09 14:27:45 -07:00
a5c5b56cf5 gen ExclusivelyOwned in structured kernels (#59827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59827

ghstack-source-id: 133089541

Test Plan: existing CI

Reviewed By: ezyang, janeyx99

Differential Revision: D28965922

fbshipit-source-id: ffbc1d43e5d3ab3abfad3b0830b4da1ce899f505
2021-07-09 14:26:37 -07:00
711ded688d Add a script to codemod max_tokens_total pragmas to C/C++ files (#61369)
Summary:
This PR adds a new script: `max_tokens_pragmas.py`

This is a utility script that can add/remove `max_tokens_total` pragmas from the codebase.

- [x] Implement script and test manually
- [x] Write test script

Examples:
First, change directories
```bash
cd tools/linter/clang_tidy
```

Then run the following:
```bash
cat << EOF > test/test1.cpp
// File without any prior pragmas

int main() {
    for (int i = 0; i < 10; i++);
    return 0;
}
EOF

cat << EOF > test/test2.cpp
// File with prior pragmas

#pragma clang max_tokens_total 1

int main() {
    for (int i = 0; i < 10; i++);
    return 0;
}
EOF

cat << EOF > test/test3.cpp
// File with multiple prior pragmas

#pragma clang max_tokens_total 1

// Different pragma; script should ignore this
#pragma clang max_tokens_here 20

int main() {
    for (int i = 0; i < 10; i++);
    return 0;
}

#pragma clang max_tokens_total 1
EOF

# Add pragmas to some files
python3 max_tokens_pragma.py --num-max-tokens 42 test/*.cpp
grep "#pragma clang max_tokens_total 42" test/*.cpp

# Remove pragmas from files
python3 max_tokens_pragma.py --strip test/*.cpp
grep "#pragma clang max_tokens_total 42" test/*.cpp # should fail

# Ignore files
python3 max_tokens_pragma.py --num-max-tokens 42 test/*.cpp --ignores test/test2.cpp
grep "#pragma clang max_tokens_total 42" test/*.cpp # should not list `test/test2.cpp`
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61369

Test Plan: `tools/linter/clang_tidy/test/test_max_tokens_pragma.py`

Reviewed By: malfet

Differential Revision: D29604291

Pulled By: 1ntEgr8

fbshipit-source-id: 3efe52573583769041a07e6776161d4d5bbf16a7
2021-07-09 13:30:52 -07:00
3b004aed3a Enable local clang-tidy lint (#61121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61121

This change enables the make target to run clang-tidy locally

Test Plan:
Run this command
```
make clang-tidy
```
This should run `clang-tidy` on the paths and filters specified in `tools/linter/clang_tidy/__main__.py`

Quicklint
```
make quicklint
```
This should report "No files detected" if no c/cpp files are altered.

Reviewed By: soulitzer

Differential Revision: D29598927

Pulled By: 1ntEgr8

fbshipit-source-id: aa443030494fed92c313da4b203a5450be09fa38
2021-07-09 13:30:50 -07:00
8296cb37c7 [torchelastic] Set the correct maximum border width
Summary: The diff sets the correct maximum width for the border delimiters between error sections

Test Plan: Example of the uncontrolled border: https://www.internalfb.com/intern/testinfra/diagnostics/7599824415964133.844424970500348.1625590344/

Reviewed By: kiukchung

Differential Revision: D29636814

fbshipit-source-id: 95465d3150066bff82dc7499bb1c63ea4f5ebc2d
2021-07-09 13:29:23 -07:00
6bb33d93ab disable the format library in C10 (#60052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60052

Introduction:
We would like to use the minimal implementation of C10 for our SGX port of PyTorch. This would include disabling signal handlers and the fmt library.

Problem :
When C10_SUPPORTS_SIGNAL_HANDLER is disabled there is no reason to have fmt enabled, as it is used only in stacktraceSignalHandler. The problem is that fmt/format.h is included regardless of whether C10_SUPPORTS_SIGNAL_HANDLER is disabled or not.

Solution :
Move the #include <fmt/format.h> inside the #ifdef section of code where  C10_SUPPORTS_SIGNAL_HANDLER is checked.

Test Plan: Run the pytorch unit tests.

Reviewed By: h397wang, LiJihang

Differential Revision: D29022628

fbshipit-source-id: 638cf98381585cd6059129d9c5a65d9e6a841575
2021-07-09 12:28:19 -07:00
b01329b164 [xplat] Update XNNPACK to github revision 79cd5f9 (#61400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61400

allow-large-files Update XNNPACK to github version 79cd5f9.

Test Plan:
Spark apps build works.

Hand tracking works:

https://pxl.cl/1L76g

Reviewed By: dreiss

Differential Revision: D29385882

fbshipit-source-id: 6be920a68b876faedf7e86e33df43f8b1db14a4d
2021-07-09 12:28:16 -07:00
86463a8d02 Save some little memory in default_collate (#61424)
Summary:
The savings can be significant when there are many workers and a large batch size.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61424

Reviewed By: soulitzer

Differential Revision: D29635477

Pulled By: ejguan

fbshipit-source-id: 1fc48b5964e873bd8833ad81bed9d51b0b6d137e
2021-07-09 12:27:07 -07:00
c830db0265 Raise error in CMake for CUDA <9.2 (#61462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61462

Anything before CUDA 9.2 is not supported (see https://github.com/pytorch/pytorch/pull/36848), and perhaps not even that.
ghstack-source-id: 133312018

Test Plan: CI

Reviewed By: samestep

Differential Revision: D29637251

fbshipit-source-id: 4300169b7298274b2074649342902a34bd2220b5
2021-07-09 11:28:38 -07:00
b5c464d5ef Make Future store weak pointers to storages (#60943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60943

In https://github.com/pytorch/pytorch/pull/60470 we made Future store Storages rather than store references to their DataPtrs (because these references could go stale...). However this meant that the Future could keep the Storage alive, and thus keep its memory allocated, even after the user was done with it. We fix it here by instead storing a weak ptr to that Storage (well, in fact to the StorageImpl, but it's the same).
ghstack-source-id: 133295799

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29454104

fbshipit-source-id: d36dee00a4841c087bb7b3f5bc39e0459f209cdb
2021-07-09 11:28:36 -07:00
962c9fbf85 [pruner] add handles for hooks (#61425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61425

Adding handles for the activation reconstruction and bias forward hooks so they can be removed later
ghstack-source-id: 133244536

Test Plan:
This change should not affect behavior yet, but to double check:

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1LpM9

Reviewed By: z-a-f

Differential Revision: D29619720

fbshipit-source-id: c7428d2d0325cd11ce7919e0b67321e8cc196041
2021-07-09 11:28:35 -07:00
682ebc1dd1 remove UsageError in favor of ValueError (#61031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61031

See https://github.com/pytorch/pytorch/pull/58916#issuecomment-868519515.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626810

Pulled By: mruberry

fbshipit-source-id: 25ddf26815f9ef82b8234d7dac811a6a13a53c54
2021-07-09 11:28:33 -07:00
5401dd2f9a change language from array to tensor (#60639)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60639

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626812

Pulled By: mruberry

fbshipit-source-id: 1b0e78426fd08d7b72d890adc9811d31afd805fe
2021-07-09 11:28:31 -07:00
09c90b3589 relax type equality constraint (#60638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60638

Initial proposal in https://github.com/pytorch/pytorch/pull/58981#issuecomment-866690334. As opposed to the proposal, this PR only allows relaxing the type equality constraint to a common-superclass constraint, for example `torch.Tensor` vs `torch.nn.Parameter`. Inputs that do not share a common superclass will still fail.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29626811

Pulled By: mruberry

fbshipit-source-id: 1916c3b710d38889de7ce57eb0770c76cbbb8166
2021-07-09 11:27:32 -07:00
24a8915534 Relax use-count check to allow for 0 (#61414)
Summary:
Previously we required the tensor use count to be exactly 1. We should actually allow for a use count of zero as well. The use count is zero when an undefined tensor is returned, which is common in backward functions that have multiple outputs.

In this PR I also remove some entries from the skip list that should be covered by this change: they return multiple tensors AND are backward functions. Batch norm is also known to return undefined tensors when `training=False`.

Related issue: https://github.com/pytorch/pytorch/issues/60426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61414

Reviewed By: albanD

Differential Revision: D29614687

Pulled By: soulitzer

fbshipit-source-id: ab0892aed4bd1346b50b0a9552ffcc3287ac96af
2021-07-09 10:28:12 -07:00
9e533a62f6 Make conv2d nnapi converter accept flexible batch (#61021)
Summary:
Same as title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61021

Test Plan: pytest test/test_nnapi.py::TestNNAPI

Reviewed By: anshuljain1

Differential Revision: D29480746

fbshipit-source-id: 7217c8f3a811db8c3c373f3e7ca31caf9502ef22
2021-07-09 10:28:10 -07:00
64d61901eb [ROCm] Skip test_masked_scatter_large_tensor_cuda (#61313)
Summary:
Refer to https://github.com/pytorch/pytorch/issues/60190. Skipping the unit test until the hipcub issue is fixed.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61313

Reviewed By: iramazanli

Differential Revision: D29626664

Pulled By: malfet

fbshipit-source-id: db2a390d2a3e28ec05a5032a50aa9a35c86b96ca
2021-07-09 10:27:08 -07:00
ee2dd35ef4 Resolving native dependency and try_run for cross compile (#59764)
Summary:
This is a PR on the build system that provides support for cross compiling on Jetson platforms.

The major change is:

1. Disable try runs for cross compiling in `COMPILER_WORKS`, `BLAS`, and `CUDA`, since try runs cannot be performed in a cross-compile setup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59764

Reviewed By: soulitzer

Differential Revision: D29524363

Pulled By: malfet

fbshipit-source-id: f06d1ad30b704c9a17d77db686c65c0754db07b8
2021-07-09 09:29:21 -07:00
8bd3e52e00 Add conv2d transpose NNAPI converter (#59529)
Summary:
* Conv2d transpose support
* Quantize WIP

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59529

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_conv2d_transpose

Reviewed By: anshuljain1

Differential Revision: D28926335

fbshipit-source-id: 8f90182f96cee0a13c4f38331d421e1e8ac618de
2021-07-09 09:29:20 -07:00
c19adfff54 [DataLoader] Introduce ConcatMapDataPipe functional datapipe (#61010)
Summary:
As part of https://github.com/pytorch/pytorch/issues/57031, this PR adds the ConcatMapDataPipe functional datapipe for the MapDataPipe class.

We may need to discuss how to treat the datapipes with no valid length. For now, I just treat them as if they have infinite length, and `__getitem__` cannot go past them.
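A usage sketch, assuming the `concat` functional name and the `SequenceWrapper` map-style datapipe (exact names may differ in this version):
```python
from torch.utils.data.datapipes.map import SequenceWrapper

dp1 = SequenceWrapper(list(range(3)))     # 0, 1, 2
dp2 = SequenceWrapper(list(range(3, 5)))  # 3, 4
dp = dp1.concat(dp2)  # ConcatMapDataPipe via its functional form

assert len(dp) == 5
assert dp[4] == 4  # indexing spans both inputs in order
```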

Thank you for your time on reviewing this~

cc ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61010

Reviewed By: soulitzer

Differential Revision: D29587679

Pulled By: ejguan

fbshipit-source-id: 5eb97fa727209bec6c534520057c64a78000626e
2021-07-09 09:29:18 -07:00
2bbcc80de3 Enable disabling test cases on specific platforms (#61427)
Summary:
This adds functionality to our common_utils.py to allow disabling test cases on specific platforms (Mac, Windows, and Linux).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61427

Test Plan:
CI should not change as no issues currently have the line "Platforms:..."

I tested locally by making sure `test_async_script` is skipped while running `python test/test_jit.py -k TestAsync.test_async_script` with a cached modified `.pytorch-disabled-tests.json`:
```
{
  "total_count": 32,
  "incomplete_results": false,
  "items": [
    {
      "url": "https://api.github.com/repos/pytorch/pytorch/issues/60652",
      "repository_url": "https://api.github.com/repos/pytorch/pytorch",
      "labels_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/labels{/name}",
      "comments_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/comments",
      "events_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/events",
      "html_url": "https://github.com/pytorch/pytorch/issues/60652",
      "id": 929288995,
      "node_id": "MDU6SXNzdWU5MjkyODg5OTU=",
      "number": 60652,
      "title": "DISABLED test_async_script (jit.test_async.TestAsync)",
      "user": {
        "login": "ezyang",
        "id": 13564,
        "node_id": "MDQ6VXNlcjEzNTY0",
        "avatar_url": "https://avatars.githubusercontent.com/u/13564?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ezyang",
        "html_url": "https://github.com/ezyang",
        "followers_url": "https://api.github.com/users/ezyang/followers",
        "following_url": "https://api.github.com/users/ezyang/following{/other_user}",
        "gists_url": "https://api.github.com/users/ezyang/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/ezyang/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/ezyang/subscriptions",
        "organizations_url": "https://api.github.com/users/ezyang/orgs",
        "repos_url": "https://api.github.com/users/ezyang/repos",
        "events_url": "https://api.github.com/users/ezyang/events{/privacy}",
        "received_events_url": "https://api.github.com/users/ezyang/received_events",
        "type": "User",
        "site_admin": false
      },
      "labels": [
        {
          "id": 1301397902,
          "node_id": "MDU6TGFiZWwxMzAxMzk3OTAy",
          "url": "https://api.github.com/repos/pytorch/pytorch/labels/module:%20flaky-tests",
          "name": "module: flaky-tests",
          "color": "f7e101",
          "default": false,
          "description": "Problem is a flaky test in CI"
        },
        {
          "id": 679953883,
          "node_id": "MDU6TGFiZWw2Nzk5NTM4ODM=",
          "url": "https://api.github.com/repos/pytorch/pytorch/labels/oncall:%20distributed",
          "name": "oncall: distributed",
          "color": "f7e101",
          "default": false,
          "description": "Add this issue/PR to distributed oncall triage queue"
        }
      ],
      "state": "open",
      "locked": false,
      "assignee": {
        "login": "rohan-varma",
        "id": 8039770,
        "node_id": "MDQ6VXNlcjgwMzk3NzA=",
        "avatar_url": "https://avatars.githubusercontent.com/u/8039770?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/rohan-varma",
        "html_url": "https://github.com/rohan-varma",
        "followers_url": "https://api.github.com/users/rohan-varma/followers",
        "following_url": "https://api.github.com/users/rohan-varma/following{/other_user}",
        "gists_url": "https://api.github.com/users/rohan-varma/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/rohan-varma/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/rohan-varma/subscriptions",
        "organizations_url": "https://api.github.com/users/rohan-varma/orgs",
        "repos_url": "https://api.github.com/users/rohan-varma/repos",
        "events_url": "https://api.github.com/users/rohan-varma/events{/privacy}",
        "received_events_url": "https://api.github.com/users/rohan-varma/received_events",
        "type": "User",
        "site_admin": false
      },
      "assignees": [
        {
          "login": "rohan-varma",
          "id": 8039770,
          "node_id": "MDQ6VXNlcjgwMzk3NzA=",
          "avatar_url": "https://avatars.githubusercontent.com/u/8039770?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/rohan-varma",
          "html_url": "https://github.com/rohan-varma",
          "followers_url": "https://api.github.com/users/rohan-varma/followers",
          "following_url": "https://api.github.com/users/rohan-varma/following{/other_user}",
          "gists_url": "https://api.github.com/users/rohan-varma/gists{/gist_id}",
          "starred_url": "https://api.github.com/users/rohan-varma/starred{/owner}{/repo}",
          "subscriptions_url": "https://api.github.com/users/rohan-varma/subscriptions",
          "organizations_url": "https://api.github.com/users/rohan-varma/orgs",
          "repos_url": "https://api.github.com/users/rohan-varma/repos",
          "events_url": "https://api.github.com/users/rohan-varma/events{/privacy}",
          "received_events_url": "https://api.github.com/users/rohan-varma/received_events",
          "type": "User",
          "site_admin": false
        }
      ],
      "milestone": null,
      "comments": 0,
      "created_at": "2021-06-24T14:28:33Z",
      "updated_at": "2021-06-24T16:40:42Z",
      "closed_at": null,
      "author_association": "CONTRIBUTOR",
      "active_lock_reason": null,
      "body": "Platforms:Mac, windows, Linux\r\n```\r\nJun 24 00:59:14 ======================================================================\r\nJun 24 00:59:14 ERROR [0.477s]: test_async_script (__main__.ProcessGroupGlooWrapperTest)\r\nJun 24 00:59:14 ----------------------------------------------------------------------\r\nJun 24 00:59:14 Traceback (most recent call last):\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 398, in wrapper\r\nJun 24 00:59:14     self._join_processes(fn)\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 590, in _join_processes\r\nJun 24 00:59:14     self._check_return_codes(elapsed_time)\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 633, in _check_return_codes\r\nJun 24 00:59:14     raise RuntimeError(error)\r\nJun 24 00:59:14 RuntimeError: Process 0 exited with error code 10 and exception:\r\nJun 24 00:59:14 RuntimeError: [/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.17.0.2]:21400\r\nJun 24 00:59:14 \r\nJun 24 00:59:14 During handling of the above exception, another exception occurred:\r\nJun 24 00:59:14 \r\nJun 24 00:59:14 Traceback (most recent call last):\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 516, in run_test\r\nJun 24 00:59:14     getattr(self, test_name)()\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 400, in wrapper\r\nJun 24 00:59:14     fn()\r\nJun 24 00:59:14   File \"distributed/test_pg_wrapper.py\", line 270, in test_collective_hang\r\nJun 24 00:59:14     self._test_collective_hang(pg)\r\nJun 24 00:59:14   File \"distributed/test_pg_wrapper.py\", line 52, in _test_collective_hang\r\nJun 24 00:59:14     wrapper_pg.allreduce([tensor])\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/unittest/case.py\", line 217, in __exit__\r\nJun 24 00:59:14     expected_regex.pattern, str(exc_value)))\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/unittest/case.py\", line 135, in _raiseFailure\r\nJun 24 00:59:14     raise self.test_case.failureException(msg)\r\nJun 24 00:59:14 AssertionError: \"Ranks 1 failed to pass monitoredBarrier\" does not match \"[/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.17.0.2]:21400\"\r\n```\r\n\r\nhttps://www.internalfb.com/intern/opensource/ci/job/log/225221175921058/\n\ncc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23",
      "performed_via_github_app": null,
      "score": 0.0
    }
  ]
}
```

Reviewed By: iramazanli

Differential Revision: D29627799

Pulled By: janeyx99

fbshipit-source-id: 5ef79127cbe0055c4f41766048e66f98cf80d2c4
2021-07-09 09:29:16 -07:00
e9a40de1af Add other Linux GPU auxiliary test jobs (#61055)
Summary:
- [x] add the jobs to the matrix
  - [x] `jit_legacy`
  - [x] `nogpu_NO_AVX`
  - [x] `nogpu_NO_AVX2`
  - [x] `slow`
- [x] use the test config properly to enable the different test conditions
- [x] validate that it works
- [x] disable on pull requests before merging

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61055

Test Plan: CI. Example run: https://github.com/pytorch/pytorch/actions/runs/1013240987

Reviewed By: walterddr

Differential Revision: D29594080

Pulled By: samestep

fbshipit-source-id: 02c531ebc42feae81ecaea0785915f95e0f53ed7
2021-07-09 09:29:15 -07:00
c966ce6933 Fix several test_ops cuda dtypes tests (#60922)
Summary:
Close https://github.com/pytorch/pytorch/issues/60443

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60922

Reviewed By: jdonald, iramazanli

Differential Revision: D29630122

Pulled By: mruberry

fbshipit-source-id: 441f79828860282e5849a2565facf9e7f72912e8
2021-07-09 09:29:13 -07:00
5e9bcf9101 fix: support removing hook in the hook (#61250)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/58354

Problem:
Once a hook is called
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L51-L54)

If the hook has `handle.remove()` while executing and if there are no references to the hook function object then `python` is free to garbage collect.

At the subsequent call to
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L54)

we have `hook` pointing to invalid memory

Thus when we try to fetch the name for `hook` from `check_single_result` with
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L175-L177)
we get segfault.

Solution:
Temporarily increase the life-time of hook with `Py_INCREF` till we have verified the result.
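
A minimal sketch of the pattern from the linked issue that used to segfault:

```python
import torch

t = torch.randn(3, requires_grad=True)

def hook(grad):
    handle.remove()  # remove the hook from inside the hook itself
    return grad

handle = t.register_hook(hook)
t.sum().backward()  # previously this pattern could segfault; now the hook
                    # is kept alive until its result has been checked
```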

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61250

Reviewed By: iramazanli

Differential Revision: D29623826

Pulled By: soulitzer

fbshipit-source-id: c71322311f19066cafb7203980668868c59d4e5e
2021-07-09 09:27:58 -07:00
179249084b Refactor DDP join() API, adding hooks (#60757)
Summary:
Targets https://github.com/pytorch/pytorch/issues/54318.

**Overview:**
DDP offers a `join()` context manager to accommodate training on uneven inputs. This creates a new generic `_Join()` API permitting custom hooks, refactors DDP `join()` to call this generic `_Join()`, and implements a hook for ZeRO. (For now, the generic `_Join()` is implemented as private, but this may change after design discussions are cleared.)

There are two classes introduced: `_JoinHook`, the class defining the customizable join hook, and `_Join`, the generic join context manager.

The `_JoinHook` provides two entry points: `main_hook()`, which is called repeatedly while there exists a non-joined process, and `post_hook()`, which is called once all processes have joined with the additional `bool` argument `is_last_joiner`. The class also requires `process_group` and `device` information by defining corresponding abstract property methods. Thus, to implement a join hook, (1) inherit from `_JoinHook`, (2) override `main_hook()` and `post_hook()` as appropriate, and (3) override `process_group()` and `device()` to provide process group and device information to be used by the join context manager implementation for collective communications.

The `_Join` constructor requires `join_hooks: List[_JoinHook]` and optionally `enable: bool = True` and `throw_on_early_termination: bool = False`. A training loop only needs to be wrapped with `with _Join(join_hooks):` (using the appropriate `join_hooks`) to be able to train on uneven inputs without hanging/erroring. The context manager requires a `dist.all_reduce(torch.ones(1))` to be called on every non-joined process each time before it performs its collective communications in order to indicate that the process has not yet joined. It also requires that all `process_group` attributes in the `_JoinHook` objects are the same.
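
A minimal sketch of a custom hook against this (private) API, using the entry points described above; the subclass name and the import path are illustrative assumptions:

```python
import torch
import torch.distributed as dist
# NOTE: the import path below is an assumption; `_Join`/`_JoinHook` are private.
from torch.distributed.algorithms.join import _Join, _JoinHook

class MyJoinHook(_JoinHook):  # illustrative name
    def __init__(self, process_group, device):
        self._process_group = process_group
        self._device = device

    def main_hook(self):
        # Called repeatedly while some process has not yet joined: shadow the
        # all-reduce that each non-joined process performs per iteration.
        dist.all_reduce(torch.zeros(1, device=self._device), group=self._process_group)

    def post_hook(self, is_last_joiner: bool):
        # Called once after all processes have joined; `is_last_joiner` can be
        # used to pick an authoritative rank to synchronize final state from.
        pass

    @property
    def process_group(self):
        return self._process_group

    @property
    def device(self):
        return self._device

# Training loop wrapped as described above (sketch):
# with _Join([MyJoinHook(pg, device)]):
#     for inputs in loader:
#         dist.all_reduce(torch.ones(1, device=device))  # "not yet joined" signal
#         train_step(inputs)
```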

**Notes:**
- The argument `is_last_joiner` to `post_hook()` may be useful for finding an authoritative rank when synchronizing.
- `enable` is a flag that can be set to `False` if the user knows the current training loop will not have uneven inputs. This may be used to disable join-related computation in  the classes providing join hooks.
- `throw_on_early_termination` is a flag that can be set to `True` to notify processes to terminate upon detecting uneven inputs (i.e. upon the first process joining when there exists a non-joined process). Notably, the notification requires an all-reduce, so to prevent hanging/erroring, non-joined processes must participate in the all-reduce. The first-joining process raises a `RuntimeError`, and the other processes are expected (but not required) to do the same. This may be used to implement training on uneven inputs in cases that do not conform to the generic join context manager (e.g. `SyncBatchNorm`).
- Classes providing a join hook should do so via a `_join_hook()` method that returns a `_JoinHook` instance with the methods appropriately overridden.
- If there are multiple join hooks, the device specified by the first is used by the join context manager implementation to perform its collective communications.
- If there are multiple join hooks, both the main and post-hooks are iterated in the order in which the `_JoinHook` objects are passed into the context manager constructor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60757

Test Plan:
The current implementation preserves backward compatibility by not changing the existing DDP `join()` API at all. To check this, I ran through the uneven input tests (`test_ddp_grad_div_uneven_inputs`, `test_ddp_uneven_inputs_stop_iteration_sync_bn`, `test_ddp_uneven_inputs`, `test_ddp_uneven_input_join_disable`, `test_ddp_uneven_input_exception`) on the AI AWS cluster:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py --
```

Because the existing DDP join logic does not provide correct gradients to the joined processes if `gradient_as_bucket_view=False` and a joined process requires those gradients to correctly update its shard of the parameters in `ZeroRedundancyOptimizer.step()`, DDP and ZeRO are not fully compatible at the moment. To work around this and to test ZeRO's join hook separately, I added a test `_test_zero_join()` (with `test_zero_join_gpu()` and `test_zero_join_cpu()` flavors), which compares DDP with a local optimizer on uneven inputs against ZeRO on uneven inputs with the gradients set manually.

Reviewed By: iramazanli, mrshenli

Differential Revision: D29624636

Pulled By: andwgu

fbshipit-source-id: ec70a290e02518b0d8b683f9fed2126705b896c7
2021-07-09 08:29:20 -07:00
8423ab4f99 Fix CosineAnnealingWarmRestart annotation (#61106)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44770.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61106

Reviewed By: 1ntEgr8

Differential Revision: D29635764

Pulled By: walterddr

fbshipit-source-id: ddc45a7f04532a76d033ae7774706da1fa8608f7
2021-07-09 08:28:18 -07:00
9b908ab0d0 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29631829

fbshipit-source-id: 6cef1a3a091bdf0e10838d05b2e82fc0760ebe48
2021-07-09 05:28:44 -07:00
819bac63ff [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D29632524

fbshipit-source-id: 3eccc1804a7bf953480b9754f68ea56a2a8e3fd8
2021-07-09 05:27:29 -07:00
14f63763c1 Avoid using mp.Manager to report #GPUs needed in dist tests (#61409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61409

We used a multiprocessing.Manager in order to share TEST_SKIPS between the parent and the child processes. TEST_SKIPS is a global variable that defines a unique error code for each "error type", so that the parent can figure out the reason a child exited. While originally this mapping was immutable, at some point we allowed children to modify the parent's value of that mapping so they could update the message for the `multi-gpu` error to make it reflect how many GPUs were really needed. This occurred in D23285790 (2a4d312027). Since then this Manager has proved to be quite problematic, especially around thread safety, races, TSAN, ... (see D22753459 (f0c46878c6), D23641618 (567c51cce9), D28490129, D28794321 (0128eb9a85) and D29585862). This seems like an awful lot of trouble for such a small piece of functionality. Here I propose we drop the Manager and instead get the same result by using separate error codes for each number of GPUs. It should be much simpler and thus more robust.
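
A minimal sketch of the new scheme, with illustrative exit-code values (`TestSkip` mirrors the namedtuple in `common_distributed.py`):

```python
import sys
from collections import namedtuple
import torch

TestSkip = namedtuple("TestSkip", ["exit_code", "message"])

# One unique exit code per required GPU count, so the parent can decode the
# reason directly from the child's return code, with no shared state.
TEST_SKIPS = {
    f"multi-gpu-{n}": TestSkip(80 + n, f"Need at least {n} CUDA devices")
    for n in range(2, 9)
}

def skip_if_lt_x_gpu(x):
    if torch.cuda.device_count() < x:
        sys.exit(TEST_SKIPS[f"multi-gpu-{x}"].exit_code)
```
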
ghstack-source-id: 133236447

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D29612614

fbshipit-source-id: 8ad0fedcb7796e5832a0eb196f8fdc147e02b3df
2021-07-09 01:29:35 -07:00
905cd6733e [DDP Comm Hook] Re-enable the optimization of fusing copy and division when no comm hook is specified (#61379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61379

The optimization was accidentally removed in https://github.com/pytorch/pytorch/pull/59574

This optimization can help save a scan over all the input parameters, by fusing copy and div operations.

Now the default temporary hook is allreduce by sum, and no extra division is done inside the hook.
ghstack-source-id: 133288529

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork --  test_DistributedDataParallel_non_default_stream

buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_sparse_gradient

buck test mode/dev-nosan caffe2/test/distributed:c10 -- test_ddp_checkpointing_once
buck test mode/dev-nosan caffe2/test/distributed:c10 -- test_ddp_checkpointing_twice

Reviewed By: rohan-varma

Differential Revision: D29597614

fbshipit-source-id: 2434e4fd4e6abad7871cfe47886fe97b6e4ba28f
2021-07-09 01:29:33 -07:00
8f61d94610 Fix a variable initialization (#60896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60896

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29431625

fbshipit-source-id: 076d5ed350507b3ab1f14c1a5c7700de0427eefc
2021-07-09 01:29:31 -07:00
15010bf223 Make some downcast issues explicit (#60412)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60412

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29243195

fbshipit-source-id: c508b729d6a0e6f8a591521bce788e6cfd8531f8
2021-07-09 01:29:29 -07:00
6a3170dba1 [package] minor cleanups to internal APIs (#61428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61428

I was reading this code again after a while and didn't understand it as
quickly as I would have liked. Some of the function names are no longer
accurate, etc.

This PR renames these functions to be in the same language of
"dependencies" that the rest of the API uses. I think the resulting
usage of the APIs is clearer than before.

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D29620946

Pulled By: suo

fbshipit-source-id: 7df640a7ffbd43998063b9ee3955c9dfcbc42cfb
2021-07-09 01:28:24 -07:00
d52ebf2b1b conv2d (#61093)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61093

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29562478

Pulled By: migeed-z

fbshipit-source-id: d41f3a9526ee52a9571cb861be03bf9ae176a373
2021-07-08 20:29:32 -07:00
5fbc853c5f [package] PackageExporter remove verbose mode (#61145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61145

Remove 'verbose' mode from PackageExporter as people have complained that it is not useful.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559681

Pulled By: Lilyjjo

fbshipit-source-id: eadb1a3a25fadc64119334a09bf1fa4b355b1edd
2021-07-08 18:26:43 -07:00
a74516d699 [static runtime] implement aten::log (#61393)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61393

Test Plan:
Added `StaticRuntime.IndividualOps_Log`

```
...
[ RUN      ] StaticRuntime.IndividualOps_Log
V0701 12:10:50.829100 3708165 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0
V0701 12:10:50.888468 3708165 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::log(%inp.1)
V0701 12:10:50.889098 3708165 impl.cpp:1279] Switch to out variant for node: %a.1 : Tensor = aten::clone(%3, %2)
```

Reviewed By: hlu1

Differential Revision: D29511622

fbshipit-source-id: 819fd7d90c084609a060efeadb3015e35acac517
2021-07-08 18:25:35 -07:00
06dfaadfc6 update internal function names that apply to both cpu and cuda (#59701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59701

These functions have been updated to work for both CPU and CUDA; their names are now changed to reflect that.

quantize_per_channel_cpu -> quantize_per_channel
dequantize_quantized_cpu -> dequantize_quantized

(Note: this ignores all push blocking failures!)

Test Plan:
python test/test_quantization.py TestQuantizedTensor

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29018270

fbshipit-source-id: 3a0da8d5e3f357dcf19119bcdbc6172d41f2b0c1
2021-07-08 17:26:25 -07:00
8726f08e15 [ONNX] Update documentation (#58712) (#60249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60249

* Add introductory paragraph explaining what ONNX is and what the
  torch.onnx module does.
* In "Tracing vs Scripting" and doc-string for torch.onnx.export(),
  clarify that exporting always happens on ScriptModules and that
  tracing and scripting are the two ways to produce a ScriptModule.
* Remove examples of using Caffe2 to run exported models.
  Caffe2's website says it's deprecated, so it's probably best not to
  encourage people to use it by including it in examples.
* Remove a lot of content that's redundant:
  * The example of how to mix tracing and scripting, and instead
    link to Introduction to TorchScript, which includes very similar
    content.
  * "Type annotations" section. Link to TorchScript docs which explain
    that in more detail.
  * "Using dictionaries to handle Named Arguments as model inputs"
    section. It's redundant with the description of the `args` argument
    to `export()`, which appears on the same page once the HTML
    is generated.
  * Remove the list of supported Tensor indexing patterns. If it's not
    in the list of unsupported patterns, users can assume it's
    supported, so having both is redundant.
  * Remove the list of supported operators and models.
    I think the list of supported operators is not very useful.
    A list of supported model architectures may be useful, but in
    reality it's already very out of date. We should add it back if
    / when we have a system for keeping it up to date.
  * "Operator Export Type" section. It's redundant with the description
    of the `operator_export_type` arg to to `export()`, which appears on
    the same page once the HTML is generated.
  * "Use external data format" section. It's redundant with the
    description of the `use_external_data_format` arg to `export()`.
  * "Training" section.  It's redundant with the
    description of the `training` arg to `export()`.
* Move the content about different operator implementations producing
  different results from the "Limitations" section into the doc for the
  `operator_export_type` arg.
* Document "quantized" -> "caffe2" behavior of
  OperatorExportTypes.ONNX_ATEN_FALLBACK.
* Combine the text about using torch.Tensor.item() and the text about
  using NumPy types into a section titled
  "Avoid NumPy and built-in Python types", since they're both
  fundamentally about the same issue.
* Rename "Write PyTorch model in Torch way" to "Avoiding Pitfalls".
* Lots of minor fixes: spelling, grammar, brevity, fixing links, adding
  links.
* Clarify limitation on input and output types. Phrasing it in terms of
  PyTorch types is much more accessible than in terms of TorchScript
  types. Also clarify what actually happens when dict and str are used
  as inputs and outputs.
* In Supported operators, use torch function and class names and link
  to them. This is more user friendly than using the internal aten
  op names.
* Remove references to VariableType.h, which doesn't appear to contain
  the information that it once did. Instead refer to the generated
  .pyi files.
* Remove the text in the FAQ about appending to lists within loops.
  I think this limitation is no longer present
  (perhaps since https://github.com/pytorch/pytorch/pull/51577).
* Minor fixes to some code I read along the way.
* Explain the current rationale for the weird ::prim_PythonOp op name.

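For reference, the minimal export call these docs revolve around (the toy module and file name are illustrative):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 4)
# Export traces the module: it is run once with `dummy_input` to produce a
# ScriptModule, which is then converted to ONNX.
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=11)
```
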
Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494912

Pulled By: SplitInfinity

fbshipit-source-id: 7756c010b2320de0692369289604403d28877719

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2021-07-08 16:29:32 -07:00
00b0d826a1 [ONNX] shape type inference fixes for control flow (#59319) (#60248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60248

* ~~Allow shape inference to skip for blocks by checking unsupported cases recursively. Currently onnx::Identity would trigger a shape inference failure.~~ Fixed in onnx submodule 1.9.
* Remove previous special post process for if op, since that was for constant folding, and now it is handled elsewhere. Update new post process for control flow nodes to copy assign node shape from subblock output shape correctly.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494913

Pulled By: SplitInfinity

fbshipit-source-id: de274a388df86e86403981e1b89b8b4a0d1e26d1

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-07-08 16:29:30 -07:00
81f95cce59 [ONNX] Extend chunk for dynamic chunk values (#59644) (#60247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60247

Related to #42785

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494914

Pulled By: SplitInfinity

fbshipit-source-id: 51ddb876d00185e59cfe54a8af5a9c8dd073a09f

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-07-08 16:29:28 -07:00
d9dc94406f [ONNX] Add linspace symbolic (#58854) (#60246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60246

* Adds support for linspace op
* Modifies the arange symbolic in opset 9 to replicate the same dtype-determination behavior as opset 11, following https://pytorch.org/docs/stable/generated/torch.arange.html (see the example after this list)
* Enabled some arange unit tests which were disabled for opset 9
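
A small illustration of the torch.arange dtype rules the symbolic now mirrors:

```python
import torch

# Integer arguments give an integer result; floating-point arguments give the
# default floating-point dtype. linspace always produces floating point.
torch.arange(5)                # dtype=torch.int64
torch.arange(0.0, 5.0)         # dtype=torch.float32 (default dtype)
torch.linspace(0, 1, steps=5)  # dtype=torch.float32
```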

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494911

Pulled By: SplitInfinity

fbshipit-source-id: bddff18a90f8a78121c8ecdd1dafc15c69962d66

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-07-08 16:29:26 -07:00
4ccfa3ffeb [ONNX] Fix sum export with attribute keepdims (#59316) (#60245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60245

Fix after b9bdb07a0261ab5a0b1038f290fa03af6ce0415f. This improves the previous fix in two aspects:
* Not only check the first dimension for 0 when detecting an empty tensor.
* Do not assume an empty tensor when the shape is not accessible.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494917

Pulled By: SplitInfinity

fbshipit-source-id: 02587c3c3be0510312c1a1959f28cab12d81812d

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-07-08 16:29:24 -07:00
95a7f3ccfe [ONNX] Fix shape inference for large model (#59320) (#60244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60244

Do 2GB size check for protocol buffer serialization at a later time, to avoid false alarming for cases like shape inference where no serialization actually happens.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494910

Pulled By: SplitInfinity

fbshipit-source-id: 4c36d26de9a94e5d6cf78f332d4dffc46588ebf0

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-07-08 16:29:22 -07:00
9636c077c3 [ONNX] Handle onnx::Size in ComputeConstant folding (#59122) (#60243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60243

Handle onnx::Size in ComputeConstant folding

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494915

Pulled By: SplitInfinity

fbshipit-source-id: 9782e356f5e36ae1dd2819412f970010360e9cc0

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-07-08 16:29:21 -07:00
38c48e42c6 [Reland][BE] add test wall time report (#61389)
Summary:
This is a reland of https://github.com/pytorch/pytorch/issues/61322.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61389

Reviewed By: malfet

Differential Revision: D29601573

Pulled By: walterddr

fbshipit-source-id: dfb2bdc7d72d493c01b9dbac50ef9b79c1782054
2021-07-08 16:29:19 -07:00
7481c6fc02 Bump googletest version to v1.11.0 (#61395)
Summary:
This PR bumps the `googletest` version to v1.11.0.

To facilitate this change, `CAFFE2_ASAN_FLAG` and `CAFFE2_TSAN_FLAG` are divided into corresponding compiler and linker variants. This is required because `googletest v1.11.0` sets the `-Werror` flag. The `-pie` flag is a linker flag, and passing it to a compiler invocation results in a `-Wunused-command-line-argument` warning, which in turn will cause `googletest` to fail to build with ASAN.

Fixes https://github.com/pytorch/pytorch/issues/60865

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61395

Reviewed By: iramazanli

Differential Revision: D29620970

Pulled By: 1ntEgr8

fbshipit-source-id: cdb1d3d12e0fff834c2e62971e42c03f8c3fbf1b
2021-07-08 16:29:17 -07:00
13658b10bb [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#61294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61294

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925

* Set `torch.distributed.launch` restarts to 0
* Remove unnecessary `-use_env` warnings and move the remaining `-use_env` warnings to `torch.distributed.launch`
* Make default log level WARNING
* Add new doc section around transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error-propagation
* Set default events handler to `null` that does not print events to console
* Add reference from `torch.distributed.launch` to `torch.distributed.run`
* Set correct preexec function that sends SIGTERM to child processes when parent dies

Issues resolved:

https://github.com/pytorch/pytorch/issues/60716
https://github.com/pytorch/pytorch/issues/60754

Test Plan:
sandcastle

    python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts
    python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts

    python -m torch.distributed.launch --nproc_per_node=4  --use_env --no_python  main.py -> produces error
    python -m torch.distributed.launch --nproc_per_node=4  --use_env main.py -> no warning
    python -m torch.distributed.launch --nproc_per_node=4  --no_python  main.py ->warning

Output of running torch.distributed.launch without --use_env:

    $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torch.distributed.run.
    Note that --use_env is set by default in torch.distributed.run.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ('LOCAL_RANK')` instead.

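A script adapted per that warning reads the rank from the environment instead of a CLI flag; a minimal sketch:

```python
import os

# Read the local rank the launcher exports, instead of parsing --local_rank.
local_rank = int(os.environ["LOCAL_RANK"])
```
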
New section:

{F628923078}

{F628974089}

Reviewed By: cbalioglu

Differential Revision: D29559553

fbshipit-source-id: 03ed9ba638bf154354e1530ffc964688431edf6b
2021-07-08 16:28:06 -07:00
10f372601d Support RRefs that contain torch.cuda.Event (#61354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61354

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29617155

Pulled By: pbelevich

fbshipit-source-id: 6e56b3fd0a0f93ecec048b58c90f2a47b4cba688
2021-07-08 15:33:08 -07:00
8bc2ba3fe3 detect missing kernels from external backends in codegen (#60737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60737

Test Plan: Imported from OSS

Reviewed By: ezyang, jdonald

Differential Revision: D29392615

Pulled By: bdhirsh

fbshipit-source-id: d49d013243dbc8c8b55fbdb0b9b3eed38df52255
2021-07-08 15:33:04 -07:00
7318747a3b move all external kernels into a class for better compiler error messages (#59839)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59839

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29047680

Pulled By: bdhirsh

fbshipit-source-id: 18cf4124be440a0a343b5983e1a4165db808e7c1
2021-07-08 15:31:02 -07:00
86eac5b456 [caffe2] Check for number of created subnets and optionally throw an error (#57366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57366

We often get error messages such as
```
Model failed AOT (glow ahead-of-time compilation) with exception: Error during AOT optimization (non-provisioned addNetwork):
Non-recoverable device error when adding network:
Error code: PARTITIONER_ERROR
Error message: Did not find a partition with an SLS node

Error return stack:
--------------------------------------------------------------------------------
glow/glow/lib/Partitioner/Partitioner.cpp:1244
--------------------------------------------------------------------------------
glow/glow/lib/Runtime/HostManager/HostManager.cpp:375
--------------------------------------------------------------------------------
```
This makes the error message clearer by checking the number of OnnxifiOp instances created before going into Glow. The check is enabled with the `verify_only_single_subnet` flag, and is disabled by default.

Test Plan: Unit tests pass

Reviewed By: khabinov

Differential Revision: D28097674

fbshipit-source-id: 0eefd8f6ec1a82546b759be8e541256bf271a673
2021-07-08 14:29:03 -07:00
0fc110cdd1 [CUDA graphs] Don't sync between replays for cuda driver version 11.4+ (#61063)
Summary:
The bug in libcuda.so that required https://github.com/pytorch/pytorch/pull/57556 is fixed for libcuda.so versions >= 11.4.

This PR changes replay() to sync after each launch only if the process's in-use libcuda.so is < 11.4.

With all the "enhanced" and "forward" compatibility promises flying around, and the fact that "driver" sometimes means kernel-mode driver and sometimes means user-mode driver (libcuda.so), I wasn't sure if this PR's check suffices to trigger the sync iff the in-use libcuda.so is < 11.4, but Cuda people say what I wrote is reasonable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61063

Reviewed By: mruberry

Differential Revision: D29600907

Pulled By: ngimel

fbshipit-source-id: 71bf0bcbde43091e29f3812440abeb7a95d161e2
2021-07-08 13:26:07 -07:00
80797d03e0 Simplify lambda syntax in SegmentReduce.cpp (#61416)
Summary:
Fixes the Windows build by dismantling a combination of nested lambdas + preprocessor magic into explicit templates.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61416

Reviewed By: pbelevich

Differential Revision: D29616449

Pulled By: malfet

fbshipit-source-id: 687ef73b8b37bc272f82d44fc690448e403e3a0c
2021-07-08 12:30:35 -07:00
cdc027679b Add compare_set in distributed docs (#61351)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61351
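
For reference, a minimal sketch of the `compare_set` API being documented (the store setup and values are illustrative; exact return semantics per the docs):

```python
import torch.distributed as dist

# Single-process TCPStore: (host, port, world_size, is_master).
store = dist.TCPStore("127.0.0.1", 29500, 1, True)
store.set("key", "first")
# Replaces the value only if the current value matches the expected one.
print(store.compare_set("key", "first", "second"))  # b'second' (value now stored)
```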

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29588206

Pulled By: H-Huang

fbshipit-source-id: 9db48e7b6de29503275f10616470ad2d66b075f9
2021-07-08 12:30:32 -07:00
f01a4e3b02 .github: Ensure build-results per job is unique (#61005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61005

build-results have the potential to be tainted between jobs since runs
are not ephemeral

Signed-off-by: Eli Uriegas <seemethere101@gmail.com>

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D29526747

Pulled By: seemethere

fbshipit-source-id: f8c5bc5f647b771a059cbe380d694ce6dc535ae4
2021-07-08 12:30:28 -07:00
4beb5f9ad6 [DDP Comm Hook] Fix some comments (#61376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61376

Now that SPMD is retired, the API `get_tensors` becomes `get_tensor`. This fixes some comments that refer to the obsolete API.

The `allreduce` hook example does not do the division inside, which is actually incorrect.
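
A sketch of an allreduce hook that performs the division itself, per the corrected comments; the future-chaining pattern follows the default hooks of this era, so treat the details as illustrative:

```python
import torch
import torch.distributed as dist

def allreduce_hook(state, bucket):
    group = state if state is not None else dist.group.WORLD
    tensor = bucket.get_tensor().div_(group.size())  # divide before reducing
    fut = dist.all_reduce(tensor, group=group, async_op=True).get_future()
    return fut.then(lambda f: f.value()[0])
```
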
ghstack-source-id: 133174272

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D29596857

fbshipit-source-id: 2046b185225cd6d1d104907b5f9b4009b6e87c99
2021-07-08 12:30:24 -07:00
dfe25069a8 [ROCm] Skip test_*_stress_cuda test for ROCm (#60490)
Summary:
Skipping test_*_stress_cuda tests because they sometimes fail for ROCm

Signed-off-by: Kyle Chen <kylechen@amd.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60490

Reviewed By: SciPioneer

Differential Revision: D29595552

Pulled By: rohan-varma

fbshipit-source-id: fee18204775211747337985c472ab1084a71f2f1
2021-07-08 12:28:06 -07:00
9310f6bac1 Use our own statically stored vs_buildtools.exe (#61372)
Summary:
We might be getting rate-limited on our VS install requests, leading to HUD failures. This PR moves to curling the installer from our own S3 bucket, so we won't get rate-limited.

This PR also upgrades our vs_install to 16.8.6 from 16.8.5 as moving to S3 didn't help, but moving to the newer installer did.

The CI passes the VS install now, but fails on a build error that I don't think is relevant: https://github.com/pytorch/pytorch/pull/61372/checks?check_run_id=3013140957

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61372

Reviewed By: iramazanli

Differential Revision: D29597204

Pulled By: janeyx99

fbshipit-source-id: 3eb52da308451271ea80120bbf2e511fb781b5dc
2021-07-08 11:27:02 -07:00
ac5b910600 clang-tidy patch (#60714)
Summary:
Three changes made here:
1. Set `LANG=C.UTF-8` for clang-tidy so we can properly decode symbols in comments;
2. In case a file is removed, `end` could be null and we should skip the chunk/file;
3. Tiny bug fix for the loop indent.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60714

Reviewed By: iramazanli

Differential Revision: D29617171

Pulled By: 1ntEgr8

fbshipit-source-id: b1603929333529a174105baf51e18246d09c012e
2021-07-08 11:16:00 -07:00
074c776011 Force mypy colors in CI (#61391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61391

Both the [GitHub Actions log viewer](https://github.community/t/ansi-color-output-in-webview/17621) and the HUD PR page log viewer support ANSI color codes so turn those on via this [secret env variable](https://github.com/python/mypy/issues/7771)

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29602686

Pulled By: driazati

fbshipit-source-id: e8f4cd71572cc068927e6719534e64773cb16c7f
2021-07-08 11:08:38 -07:00
c76eba650a [bootcamp][pytorch][WIP] Support embedding_bag_byte_rowwise_offsets in cuda (#61075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61075

Completed the implementation of embedding_bag_byte_rowwise_offsets and wrote a randomized test comparing GPU and CPU kernel outputs.

Test Plan:
```
buck build mode/opt --show-full-output  //caffe2/torch/fb/sparsenn:gpu_test
/data/users/johnsonpaul/fbsource/fbcode/buck-out/gen/caffe2/torch/fb/sparsenn/gpu_test#binary.par -r test_embedding_bag_byte_rowwise_offsets
```

Reviewed By: hyuen

Differential Revision: D29218597

fbshipit-source-id: 786260466ab4e8e3d89540496bd8a38be14c5c1b
2021-07-08 10:51:50 -07:00
9ef1c64907 [PyTorch][Edge] Tests for QuantizationFx API on lite modules (#60476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60476

# Context
Add tests for Lite modules that are quantized using fx API

Read these posts for details about why we need a test bench for quantized lite modules
https://fb.workplace.com/groups/2322282031156145/permalink/4289792691071726/

https://github.com/pytorch/pytorch/pull/60226#discussion_r654615851

moved common code to `caffe2/torch/testing/_internal/common_quantization.py`

ghstack-source-id: 133144292

Test Plan:
```
~/fbsource/fbcode] buck test caffe2/test:fx_quantization_lite
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss
Building: finished in 8.3 sec (100%) 11892/11892 jobs, 2 updated
  Total time: 8.6 sec
More details at https://www.internalfb.com/intern/buck/build/ffb7d517-d85e-4c8f-9531-5e5d9ca1d34c
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: d79a5713-bd29-4bbf-ae76-33a413869a09
Trace available for this run at /tmp/tpx-20210630-105547.675980/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3096224749578707
    ✓ ListingSuccess: caffe2/test:fx_quantization_lite - main (9.423)
    ✓ Pass: caffe2/test:fx_quantization_lite - test_embedding (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (10.630)
    ✓ Pass: caffe2/test:fx_quantization_lite - test_submodule (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (12.464)
    ✓ Pass: caffe2/test:fx_quantization_lite - test_conv2d (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (12.728)
Summary
  Pass: 3
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3096224749578707
```

Reviewed By: iseeyuan

Differential Revision: D29306402

fbshipit-source-id: aa481e0f696b7e9b04b9dcc6516e8a390f7dc1be
2021-07-08 10:40:08 -07:00
179b3ab88c [cuDNN] Enable cudnn_batchnorm_spatial_persistent for BatchNorm3d channels_last_3d (#59129)
Summary:
This PR enables the use of cuDNN BatchNorm spatial persistent algorithm for BatchNorm3d (5-D tensor) in channels_last_3d format, aka NDHWC. Performance and numerical accuracy are tested.

- [x] Performance check for common shapes.
- [x] Numerical accuracy check for (1 million) random shapes
    https://github.com/xwang233/code-snippet/tree/master/batchnorm3d-channels-last/A100
    https://github.com/xwang233/code-snippet/tree/master/batchnorm3d-channels-last/V100
- [ ] Convergence check for common 3D models
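
Opting a 5-D input into the NDHWC layout this path covers looks roughly like this (shapes illustrative):

```python
import torch

bn = torch.nn.BatchNorm3d(8).cuda()
x = torch.randn(2, 8, 4, 16, 16, device="cuda")
x = x.to(memory_format=torch.channels_last_3d)  # NDHWC layout
out = bn(x)
```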

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59129

Reviewed By: mruberry

Differential Revision: D29593309

Pulled By: ngimel

fbshipit-source-id: 2caf282c6cf2f426aa14a24f94e6bddada68ddac
2021-07-07 21:28:29 -07:00
0222291544 Fix docs for ShardMetadata. (#61388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61388

The doc for the `placement` argument was outdated and is now fixed.
ghstack-source-id: 133184441

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D29601316

fbshipit-source-id: a0817f799382bf91a5192c54dfeea4d253eb0d56
2021-07-07 21:27:30 -07:00
7011513d23 Enable sparse_csr.to_dense() for bool, float16, bfloat16 and complex (#60657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60657

Fixes https://github.com/pytorch/pytorch/issues/60648
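
A small illustration of the newly supported conversion, with illustrative data:

```python
import torch

crow_indices = torch.tensor([0, 2, 4])
col_indices = torch.tensor([0, 1, 0, 1])
values = torch.tensor([True, False, True, True])  # bool values, newly supported
csr = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(2, 2))
print(csr.to_dense())
```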

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29408102

Pulled By: cpuhrsch

fbshipit-source-id: 406505c1c52c0eada934833f9723f58fa67e9256
2021-07-07 19:29:19 -07:00
5054cb8934 fix torch.cat bug with boxed CPUFallback (#60993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60993

Fixes https://github.com/pytorch/pytorch/issues/60902

The boxed fallback was written to assume that there was at least one tensor argument, which it used to figure out what device to move the CPU tensors to. That fails with an op like `torch.cat()`, which doesn't have any tensor arguments, but instead has a single `TensorList` argument.

I also added handling to gracefully deal with the case where you have an empty list of tensors - in that case we don't know what device to move everything to, but that doesn't matter because an empty list of tensors implies that we have no tensors to move anyway.
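
A Python sketch of the fixed device-inference logic; the real fallback is C++, so names here are illustrative:

```python
import torch

def infer_target_device(args):
    # Plain Tensor arguments take priority, as before.
    for a in args:
        if isinstance(a, torch.Tensor):
            return a.device
    # Otherwise look inside TensorList arguments (the torch.cat case).
    for a in args:
        if isinstance(a, (list, tuple)):
            for t in a:
                if isinstance(t, torch.Tensor):
                    return t.device
    return None  # empty TensorList: nothing needs moving anyway
```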

I tested it out though and noticed that `torch.cat(())` doesn't handle empty lists well anyway (erroring out in the dispatcher). I'm not sure that it's a huge issue, and not even sure that we want to fix it (default to CPU? add an extra codegen'd check into every op that only takes TensorList args?) but I'll file a separate bug for that: https://github.com/pytorch/pytorch/issues/60997

I tested it by running the pytorch/xla suite after removing `cat` from `xla_native_functions.yaml`, and confirming that we don't segfault anymore.

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D29471577

Pulled By: bdhirsh

fbshipit-source-id: 58c96e8d48d993785b8d15cfa846ec745a34e623
2021-07-07 19:29:17 -07:00
141bfbef86 [iOS GPU] Add tanh and clamp to support GAN (#61383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61383

Since we already have support for hardtanh, it's easy to add support for clamp. GPU is ~40% faster.
ghstack-source-id: 133113272

Test Plan:
- CI
- buck test pp-macos

Reviewed By: dhruvbird

Differential Revision: D29572933

fbshipit-source-id: d22ec09e18d02456440f552067c9a8aea9a1a8ab
2021-07-07 19:29:16 -07:00
4937d9fd6f Fix Dispatching not considering List[Optional[Tensor]] for dispatch (#60787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60787

Fixes #60461.

Previously, when one calls `self.index(indices)` using a regular `self`
Tensor and `BatchedTensor` indices, the dispatcher would not dispatch
to the Batched key. This is because the dispatcher did not extract
dispatch keys from `indices`.

Similar to #58283 and #58296, this PR modifies the dispatcher to extract
dispatch keys from List[Optional[Tensor]] arguments. We do this for both
boxed and unboxed kernels.
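
For context, the call pattern affected; standard advanced indexing lowers to `aten::index.Tensor`:

```python
import torch

x = torch.randn(4, 3)
idx = torch.tensor([0, 2])
# The `indices` argument of aten::index.Tensor is a List[Optional[Tensor]];
# dispatch keys are now also extracted from that list.
out = x[idx]
```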

Test Plan:
- run the test case in
https://gist.github.com/zou3519/4421df7c5271376a0ef53ca857b18740
(requires functorch). After this PR, it raises `RuntimeError: Batching
rule not implemented for aten::index.Tensor. We could not generate a
fallback.`, which shows that dispatch happened on the Batched key.
- Taking suggestions for how to write a test for this in core

Reviewed By: jbschlosser

Differential Revision: D29438611

Pulled By: zou3519

fbshipit-source-id: 77e182f763e18aa3fa857eebafa8b7f83384db71
2021-07-07 19:28:07 -07:00
426c42ba45 [package] ensure we don't write files twice to the archive. (#61371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61371

The ZIP format allows for writing multiple files with the same name. But
this is handled poorly by most tooling (including our own), so doing so
produces weird behavior depending on the implementation of the ZIP
reader.

Since we have no valid use case for writing multiple files with the same
name to a `torch.package`, just ban it.
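
A sketch of the now-banned pattern (file and resource names illustrative):

```python
from torch.package import PackageExporter

with PackageExporter("demo.pt") as exporter:
    exporter.save_text("pkg", "notes.txt", "first")
    exporter.save_text("pkg", "notes.txt", "second")  # now raises instead of
                                                      # writing a duplicate
                                                      # ZIP entry
```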

Differential Revision:
D29595518
D29595518

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: b9f5263ab47572abde233745c102af3d6143946e
2021-07-07 18:28:42 -07:00
1d1d5acbb0 [RPC] Ensure _wait_all_workers doesn't swallow exception. (#61094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61094

`_wait_all_workers` was swallowing exceptions, and as a result, if there
were any errors, it would still continue with rpc_agent.join(), which would hang
since something had already failed before.

To fix this, I've ensured that wait_all_workers throws and in that case we just
proceed with an ungraceful shutdown without joining.
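
A simplified sketch of the resulting control flow; the helper names here are hypothetical, and the real logic lives in the RPC shutdown path:

```python
def shutdown(graceful=True):
    if graceful:
        try:
            _wait_all_workers()   # now propagates failures instead of swallowing
            _agent_join()         # graceful path: join only if the wait succeeded
        except Exception:
            _ungraceful_shutdown()  # skip join(); something already failed
            raise
    else:
        _ungraceful_shutdown()
```
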
ghstack-source-id: 133160706

Test Plan:
1) Added unit test.
2) waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D29509286

fbshipit-source-id: 7c3f1c68d712ae2f63e10e0216580db8e9bcc29d
2021-07-07 18:28:41 -07:00
7b6ddb6793 [nnapi] add log_softmax (#61378)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61378

Test Plan: Imported from OSS

Reviewed By: axitkhurana

Differential Revision: D29597355

Pulled By: IvanKobzarev

fbshipit-source-id: 55124749f8eeffa2b2713f7cffd5ccf965561de1
2021-07-07 18:28:39 -07:00
eb82a88d85 Add a type for test fixture world_size (#61363)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61363

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29561360

fbshipit-source-id: 821217e33adc483b1810585a2b91a2ee416513bd
2021-07-07 18:27:37 -07:00
d51b437b74 Cuda quantized tensors, support for quantize per channel (#58245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58245

This adds support for per-channel quantization.

(Note: this ignores all push blocking failures!)
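
A small illustration of the newly supported CUDA path (values illustrative):

```python
import torch

x = torch.randn(2, 3, device="cuda")
scales = torch.tensor([0.1, 0.2, 0.3], device="cuda")
zero_points = torch.zeros(3, dtype=torch.long, device="cuda")
# Per-channel quantization along dim 1, now supported on CUDA tensors.
qx = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.qint8)
print(qx.dequantize().device)  # cuda:0
```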

Test Plan:
python test/test_quantization.py TestQuantizedTensors
python test/test_quantization.py TestQuantizedTensors.test_compare_quant_dequant_device_numerics
python test/test_quantization.py TestQuantizedTensors.test_qtensor_to_device

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29018271

fbshipit-source-id: 4f59aed98f2f8ff607154250e4e3f85592e17854
2021-07-07 17:36:53 -07:00
b1dc9c3946 Skip _cudnn_rnn_backward in codegen check (#61386)
Summary:
Fixes internal test failure encountered internally

For context see: https://github.com/pytorch/pytorch/issues/60426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61386

Reviewed By: malfet

Differential Revision: D29601031

Pulled By: soulitzer

fbshipit-source-id: 3592ca45a01e7bbaa804ab5404338191154f0fbc
2021-07-07 17:36:51 -07:00
b25c65b4f3 Revert D29589020: [pytorch][PR] adding a build_start_time_epoch to build meta info
Test Plan: revert-hammer

Differential Revision:
D29589020 (d33066ab3f)

Original commit changeset: 309fc3b01cbc

fbshipit-source-id: 9b50c1e8dd63e59ab4e593d250dfd5eeb623f0af
2021-07-07 17:35:29 -07:00
9dd1824741 Fix dispatch keys for eigh, lu_solve (#60945)
Summary:
I added a test to `test_ops.py` that verifies that the op can run correctly from different cuda devices. This test revealed that `linalg_eigh`, `linalg_eigvalsh`, `linalg_matrix_rank`, `linalg_pinv` were failing. `matrix_rank` and `pinv` are calling `eigh` internally.
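
A sketch of the multi-device scenario the new test covers (assumes at least two GPUs):

```python
import torch

# Run from a non-default device to exercise the device-guard path.
a = torch.randn(3, 3, device="cuda:1")
a = a + a.transpose(-2, -1)   # make it symmetric
w = torch.linalg.eigvalsh(a)  # previously failed without CPU/CUDA dispatch keys
```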

`linalg_eigh` and `lu_solve` internally use dispatch stubs, so they should be registered with `CPU, CUDA` dispatch keys. The generated code includes device guards in this case and the problem is not present.

Implemented a better out variant for `eigvalsh` and registered it with `CPU, CUDA` dispatch keys.

~I added a device guard to `linalg_eigh_kernel` as a fix for `eigvalsh` function. This function needs to be registered as CompositeImplicitAutograd, because it calls `at::linalg_eigh` if `at::GradMode::is_enabled()`.~

Fixes https://github.com/pytorch/pytorch/issues/60892.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60945

Reviewed By: mruberry

Differential Revision: D29589580

Pulled By: ngimel

fbshipit-source-id: 5851605958bdfc3a1a1768263934619449957168
2021-07-07 16:28:22 -07:00
fb00194030 Fix typo in common_utils.py (#61365)
Summary:
Missed this in review of https://github.com/pytorch/pytorch/pull/57953. I don't think this has affected much, though.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61365

Reviewed By: walterddr

Differential Revision: D29593764

Pulled By: janeyx99

fbshipit-source-id: 2c6f6aa961eabca0d8b8a7607aaae979667cca3b
2021-07-07 16:28:20 -07:00
6107cf3750 Add --jobs 0 for git submodule update (#61311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61311

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61152

Some related docs about `submodule.fetchJobs`
https://git-scm.com/docs/git-config#Documentation/git-config.txt-submodulefetchJobs

```
time git submodule update --init --recursive
________________________________________________________
Executed in  243.20 secs    fish           external
   usr time   49.64 secs  213.00 micros   49.64 secs
   sys time   29.27 secs  795.00 micros   29.27 secs
```

```
time git submodule update --init --recursive --jobs 4
________________________________________________________
Executed in  143.04 secs    fish           external
   usr time   51.06 secs  246.00 micros   51.06 secs
   sys time   30.96 secs  742.00 micros   30.96 secs
```

```
time git submodule update --init --recursive --jobs 8
________________________________________________________
Executed in  124.64 secs    fish           external
   usr time   51.76 secs  264.00 micros   51.76 secs
   sys time   30.49 secs  739.00 micros   30.49 secs

```

```
time git submodule update --init --recursive --jobs 0 # use all online cpus
 ________________________________________________________
Executed in  129.75 secs    fish           external
   usr time   51.64 secs  181.00 micros   51.64 secs
   sys time   31.49 secs  781.00 micros   31.49 secs

```

Test Plan: Imported from OSS

Reviewed By: 1ntEgr8

Differential Revision: D29560875

Pulled By: zhouzhuojie

fbshipit-source-id: 556027dffe744c66428075a8a1bf64683930aaaf
2021-07-07 16:28:18 -07:00
d33066ab3f adding a build_start_time_epoch to build meta info (#61322)
Summary:
Adding a `build_start_time_epoch` as a normal field in scribe reporting.
This should fix https://github.com/pytorch/pytorch/issues/60591.

The decision was made because:
- we would like only one build (test CI job) start time as the partition key string
  - the alternative is to report the duration on each test case individually, which would result in duplicate numeric value uploads.
- we can then easily calculate the wall-time of a test job from `MAX('time') - build_start_time_epoch` for all reporting messages with the same normal keys.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61322

Test Plan:
CI should report the extra normal field.

See: https://fburl.com/scuba/pytorch_test_times/pm6chz9w

Reviewed By: driazati

Differential Revision: D29589020

Pulled By: walterddr

fbshipit-source-id: 309fc3b01cbce76cd62f8ccd2eb0ecad27782b88
2021-07-07 16:27:13 -07:00
429436edbd Avoid complex-to-real cast warning in CopyBackward (#60021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60021

Dropping the imaginary component is expected and gives the correct gradient
formula, so silencing the warning is appropriate.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29589371

Pulled By: mruberry

fbshipit-source-id: 73e1511cae69207dc9abe576e2769ee1d03f1bbd
2021-07-07 15:28:38 -07:00
10b2a24508 Migrate log_sigmoid (forward and backward) to ATen (CUDA) (#60881)
Summary:
Fixes gh-24591, fixes gh-24590, closes gh-39642

Benchmarks were run with nvprof using contiguous inputs; they show improvement across the board.

#### Forward benchmarks

| Num Elements | Master (us) | This PR (us) |
|:------------:|:-----------:|:------------:|
|     10^4     |    2.5840   |    2.5230    |
|     10^5     |    4.6410   |    3.9280    |
|     10^6     |    33.772   |    23.025    |
|     10^7     |    299.67   |    206.35    |
|     10^8     |    3001.9   |    2052.8    |

#### Backward benchmarks

| Num Elements | Master (us) | This PR (us) |
|:------------:|:-----------:|:------------:|
|     10^4     |    2.7750   |    2.7080    |
|     10^5     |    5.2430   |    3.9010    |
|     10^6     |    46.198   |    32.878    |
|     10^7     |    447.18   |    296.18    |
|     10^8     |    4393.2   |    2938.0    |
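
For reference, the op under benchmark at the 10^6 size (a sketch):

```python
import torch
import torch.nn.functional as F

x = torch.randn(10**6, device="cuda", requires_grad=True)
y = F.logsigmoid(x)   # forward now runs the ATen CUDA kernel
y.sum().backward()    # backward path migrated as well
```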

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60881

Reviewed By: mruberry

Differential Revision: D29589455

Pulled By: ngimel

fbshipit-source-id: 70cd5db244bf6292e9ca367462640530a1d85f7d
2021-07-07 15:28:36 -07:00
f86460a352 Add coverage files to .gitignore (#61144)
Summary:
Fixes failures when coverage is turned on: https://github.com/pytorch/pytorch/runs/2966295169 https://github.com/pytorch/pytorch/runs/2964409741

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61144

Test Plan:
```bash
$ echo hi > test/.coverage.jit.1625168654.4504092
$ git status
$
```

Reviewed By: zhouzhuojie

Differential Revision: D29530709

Pulled By: driazati

fbshipit-source-id: 0e6a1cb217c4d48f14c0c58a546f98393d2b0392
2021-07-07 15:28:35 -07:00
5e83fefdf8 [sparsity] sparsifier step tests (#60107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60107

Unit tests for sparsifier `step`

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`

https://pxl.cl/1LhQP

Reviewed By: z-a-f

Differential Revision: D29167029

fbshipit-source-id: 053027ca92701097406372ef0f81d79ef28380aa
2021-07-07 15:28:33 -07:00
8881b9d852 [sparsity] sparsifier convert tests (#60105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60105

Unit tests for sparsifier `convert`

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`

https://pxl.cl/1LhQ8

Reviewed By: z-a-f

Differential Revision: D29145450

fbshipit-source-id: b87b8f0d44751a7dae19d454a11b2d207a7286e2
2021-07-07 15:28:31 -07:00
ec200a60bd [sparsity] sparsifier prepare tests (#60042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60042

Unit tests for sparsifier `prepare`

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`

https://pxl.cl/1LhR1

Reviewed By: z-a-f

Differential Revision: D29140945

fbshipit-source-id: 73cbf27f278ce849e3930ba6caa82bb2f64f1321
2021-07-07 15:28:30 -07:00
21ad978d4f [sparsity] rename sparsity_pattern to sparse_block_shape (#59898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59898

In `weight_norm_sparsifier`, the name of the argument `sparsity_pattern` is not intuitive for an argument describing the shape of the sparse block. It has been changed to `sparse_block_shape`.

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`
https://pxl.cl/1LhRM

Reviewed By: z-a-f

Differential Revision: D29077045

fbshipit-source-id: 0cf9c5387d41ca8e839ee050d71f4fe477374143
2021-07-07 15:27:16 -07:00
aa6a8a6d21 [nnc] Add LoopNest::unsafe_fuseLoops to let users apply fusion on stmts that may violate our correctness checks (#60601)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60601

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29346128

Pulled By: huiguoo

fbshipit-source-id: 0eb143e97dc57224adeedf99981036ad836e5a03
2021-07-07 14:27:18 -07:00
8fd90f7cfd Implementing transpose for PackedTensorAccessor (#61114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61114

Matching the functionality of THCDeviceTensor::transpose. This
is the same as PR 60968 (https://github.com/pytorch/pytorch/pull/60968)
which was already approved; the state of the PR got messed up so
creating a fresh one.
ghstack-source-id: 133050553

Test Plan:
Unit tests at aten/src/ATen/test/packedtensoraccessor_test.cpp

Imported from OSS

Reviewed By: ezyang

Differential Revision: D29516530

fbshipit-source-id: 91d5bcc38381c00420825646b1c352c0d6bc8b3f
2021-07-07 14:27:16 -07:00
39a76fe73c BatchNorm2D (#61012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61012

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29562337

Pulled By: migeed-z

fbshipit-source-id: 2b848d0af607bd4f36cea2436ab2278ac4bc28d7
2021-07-07 14:26:07 -07:00
357c4d9cc4 Add a test case for findDanglingImpls (#61104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61104

This patch added a new test case for findDanglingImpls. The test case introduces a C++ extension which has a dangling impl such that findDanglingImpls can find it and output its information.

Test Plan:
python test/test_dispatch.py TestDispatch.test_find_dangling_impls_ext

Imported from OSS

Reviewed By: ezyang

Differential Revision: D29512520

fbshipit-source-id: 6883fb8f065f2c0ae0e7a1adf6fd298591497e2b
2021-07-07 13:34:16 -07:00
4d9fd8958b Support __rand__, __ror__ and __rxor__ (#59240)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58120.

This PR implements `torch.Tensor.{__rand__/__ror__/__rxor__}` for the compatibility with NumPy’s interface.
(cc: mruberry, rgommers, emcastillo, kmaehashi)
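A quick illustration of what these reflected methods enable: with a plain Python int on the left-hand side, Python falls back to the tensor's reflected operator.

```python
import torch

t = torch.tensor([0b01, 0b10, 0b11])

# A non-tensor left operand now dispatches to t.__rand__ / t.__ror__ / t.__rxor__.
print(0b11 & t)  # tensor([1, 2, 3])
print(0b01 | t)  # tensor([1, 3, 3])
print(0b01 ^ t)  # tensor([0, 3, 2])
```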

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59240

Reviewed By: ngimel

Differential Revision: D29482304

Pulled By: mruberry

fbshipit-source-id: 13789202c1d8dddf8658a45381aeedcc31e2f603
2021-07-07 13:34:14 -07:00
9547e57643 Create SECURITY.md (#61356)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61356

Reviewed By: samestep

Differential Revision: D29589904

Pulled By: malfet

fbshipit-source-id: 5d79d25e35af9cb258fd6843559955360dc0cc4e
2021-07-07 13:34:12 -07:00
f84a441718 [torch][segment_reduce] Update default values when initial value is not set (#61266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61266

Same as title.
Mainly, this concludes the initially planned features for the op. The only missing functionality is reduction along an arbitrary axis (currently only axis 0 is supported).

Test Plan: Updated unit test.

Reviewed By: ngimel

Differential Revision: D29552037

fbshipit-source-id: 023c7cbf750a0671f76082708f14c05739dda07a
2021-07-07 13:34:10 -07:00
a78ad5dc4c [torch][segment_reduce] Add support for int lengths as well (#61141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61141

Currently only long is supported. This diff adds support for other index types.

Next Steps:
- Update default, refactor unit test and test non_initial value as well
- Cleanup (more tests, benchmark, update documentation)

Test Plan: updated unit test. rely on CI.

Reviewed By: ngimel

Differential Revision: D29526308

fbshipit-source-id: b4043603483851ef7e0e93b0bb02ac7849c6449d
2021-07-07 13:34:09 -07:00
423523d8bb Alias for logsumexp to special namespace (#58838)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: kshitij12345 Lezcano mruberry
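A minimal sanity check of the alias, assuming the usual alias contract of matching the original op:

```python
import torch

x = torch.randn(3, 4)
# The special-namespace alias should agree with the original reduction.
assert torch.allclose(torch.special.logsumexp(x, dim=1),
                      torch.logsumexp(x, dim=1))
```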

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58838

Reviewed By: malfet

Differential Revision: D29565033

Pulled By: mruberry

fbshipit-source-id: 9b715ea00c78f47b6f183357ee3c7d4c3abe4d01
2021-07-07 13:32:15 -07:00
c03f99f3ef Remove pyproject.toml (#61367)
Summary:
This reverts https://github.com/pytorch/pytorch/issues/60408, since it doesn't really give much benefit, and it ended up breaking things:

- https://github.com/pytorch/pytorch/issues/60665
- https://github.com/pytorch/pytorch/pull/60408#issuecomment-873979383

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61367

Reviewed By: malfet, janeyx99

Differential Revision: D29593886

Pulled By: samestep

fbshipit-source-id: b1ba0ac7695e3eacf66a35e293080e8a1240efca
2021-07-07 12:47:45 -07:00
994ce7dbd9 Cuda quantized tensors, support for quantize per tensor (#59700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59700

implements quantized tensors in CUDA for per-tensor
quantization, along with several necessary functions
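A minimal sketch of what this enables (requires a CUDA device):

```python
import torch

x = torch.randn(4, 4, device="cuda")
# Per-tensor quantization now works directly on a CUDA tensor.
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
print(q.device, q.dequantize().device)  # both on cuda
```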

(Note: this ignores all push blocking failures!)

Test Plan:
python test/test_quantization.py TestQuantizedTensors
python test/test_quantization.py
TestQuantizedTensors.test_compare_quant_dequant_device_numerics
python test/test_quantization.py
TestQuantizedTensors.test_qtensor_to_device

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29018272

fbshipit-source-id: e07d19d6d67729c46324c2bb5946d959e6e6db8e
2021-07-07 12:40:51 -07:00
baa518e2f6 Add Int32 support for NNAPI (#59365)
Summary:
Support Int32 tensors in NNAPI converter

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59365

Test Plan: Local testing with FB prod models

Reviewed By: anshuljain1

Differential Revision: D28881040

fbshipit-source-id: 2dacceffd322a21d91bfefcf2fb2ea400d952d0d
2021-07-07 12:40:49 -07:00
cf285d8eea Add aten::slice NNAPI converter (#59364)
Summary:
Add support for aten::slice op in the NNAPI model converter

* If start = 0; end = max -> identity
* Flexible shapes can be passed through
* Flexible shapes can't be sliced over
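A hedged sketch of the kind of module whose slice this converter can lower; the shape and slice bounds here are illustrative assumptions:

```python
import torch

class Slicer(torch.nn.Module):
    def forward(self, x):
        # A static slice like this maps to an NNAPI slice op; a slice with
        # start = 0 and end = max over a dimension lowers to an identity.
        return x[:, 1:4]
```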

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59364

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_slice

Reviewed By: anshuljain1

Differential Revision: D28881039

fbshipit-source-id: 3c1c630ff27b5bba6eda403d87570c61d43ae90e
2021-07-07 12:40:47 -07:00
d26372794a Add aten::detach NNAPI converter (#58543)
Summary:
* Add support for aten::detach op in the NNAPI model converter as a no-op
* Also add flexible op support for add_pointwise_simple_unary_op

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58543

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_detatch

Reviewed By: anshuljain1

Differential Revision: D28531942

fbshipit-source-id: 4387dbbbadd8ce6b690841f3a903e68a380b849d
2021-07-07 12:40:46 -07:00
0be228dd5f Add aten::flatten NNAPI converter (#60885)
Summary:
Add support for the aten::flatten op in the NNAPI model converter. Startup-time
variable size support isn't included, as shapes are passed as inputs to the NNAPI op.

Runtime variable size support is planned to follow soon.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60885

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten

Reviewed By: anshuljain1

Differential Revision: D29451725

fbshipit-source-id: 8902745f7758c8cc88ad4b4ce02b8301ff894bd4
2021-07-07 12:40:44 -07:00
b297f65b66 Add aten::div NNAPI converter (#58541)
Summary:
Add support for aten::div op in the NNAPI model converter. Add variable
size input test as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58541

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_div

Reviewed By: anshuljain1

Differential Revision: D28531943

fbshipit-source-id: e96342146f6de216f7b88443618edfc54963747c
2021-07-07 12:40:42 -07:00
eab18a9a40 Add aten::to NNAPI converter (#58540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58540

Add support for aten::to op in the NNAPI model converter for simple
cases like to("cpu"), to("gpu")

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_to

Reviewed By: anshuljain1

Differential Revision: D28531941

fbshipit-source-id: 0c934f7aceaff2669307c3426efe32046d8c44f3
2021-07-07 12:40:41 -07:00
14d604a13e Add aten::softmax NNAPI converter (#58539)
Summary:
Add support for aten::softmax op in the NNAPI model converter with
flexible size

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58539

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_softmax

Reviewed By: anshuljain1

Differential Revision: D28531946

fbshipit-source-id: 8633f3e3f7f52795f9866ff16ad0867ea36a19e8
2021-07-07 12:39:31 -07:00
45ce26c397 Port isposinf & isneginf kernel to structured kernels (#60633)
Summary:
Porting `torch.isposinf` & `torch.isneginf` to structured kernel
Related https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60633

Reviewed By: saketh-are

Differential Revision: D29517528

Pulled By: bdhirsh

fbshipit-source-id: f8f62e4c203e0c54790437b5e512024bfabdddfc
2021-07-07 12:33:41 -07:00
c2b0af2560 [static runtime] Implement aten::sign (#61154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61154

Test Plan:
Added `StaticRuntime.IndividualOps_Sign`

```
[djang@devvm861.prn0 ~/local/fbsource/fbcode/caffe2] buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1
...
[ RUN      ] StaticRuntime.IndividualOps_Sign
V0701 12:05:31.836099 3679080 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0
V0701 12:05:31.898192 3679080 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::sign(%input.1)
V0701 12:05:31.898849 3679080 impl.cpp:1279] Switch to out variant for node: %4 : Tensor = aten::clone(%3, %2)
```

Reviewed By: hlu1

Differential Revision: D29518603

fbshipit-source-id: e47b96d037fea639c41052f3849c82bbfa5f482a
2021-07-07 12:29:25 -07:00
1262b2c4c6 fix torch.futures docstring examples (#61029)
Summary:
Running the doctests for the complete documentation hangs when it reaches the examples of `torch.futures`. It turns out these are only syntax errors, which are normally just reported. My guess is that `doctest` probably doesn't handle failures within async code well.

Anyway, while debugging this, I fixed the syntax.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61029

Reviewed By: mruberry

Differential Revision: D29571923

Pulled By: mrshenli

fbshipit-source-id: bb8112be5302c6ec43151590b438b195a8f30a06
2021-07-07 11:47:55 -07:00
376dc500a9 Minor bug fix in the warning message (#61127)
Summary:
The current example code does not work. The correct one is like this: cb7d813275/torch/distributed/run.py (L266)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61127

Reviewed By: cbalioglu

Differential Revision: D29572003

Pulled By: mrshenli

fbshipit-source-id: 05b470230f3d70f8a6164edb5f92894a1112069f
2021-07-07 11:42:51 -07:00
90241d254f Automated submodule update: FBGEMM (#59968)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: a2257d9471

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59968

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: r-barnes

Differential Revision: D29109045

fbshipit-source-id: 386b28b28275e728ee229d4baf1ff192635d49c3
2021-07-07 11:33:57 -07:00
29ecb9f90b Don't check stride by default (#60637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60637

We now have ~three out of three~  four out of four datapoints that `check_stride` will be `partial`'ed to `False`:

- `torch` test suite: https://github.com/pytorch/pytorch/pull/58981#discussion_r639514081
- `torchvision` test suite: https://github.com/pytorch/pytorch/issues/56544#issuecomment-845352605
- `kornia`: 9041c42b41/test/utils.py (L25)
- `torch.fft`: https://github.com/pytorch/pytorch/pull/60304#pullrequestreview-687882323

Given that strides are in most cases an implementation detail, IMO we should change the default to `False`. In cases where matching strides is a requirement for closeness / equality, it can always be set to `True` manually.
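A small sketch of the behavior change, assuming the new default of `check_stride=False`:

```python
import torch

expected = torch.tensor([[1., 2.], [3., 4.]])
actual = expected.t().contiguous().t()  # same values, different strides

torch.testing.assert_close(actual, expected)  # passes: strides ignored by default
torch.testing.assert_close(actual, expected, check_stride=True)  # raises
```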

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556355

Pulled By: mruberry

fbshipit-source-id: 0029a44280d8f4369fbdb537dce3202eeee4b1d9
2021-07-07 09:55:36 -07:00
e2a3f4b560 Use maximum of tolerances in case of mismatching dtypes (#60636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60636

See https://github.com/pytorch/pytorch/pull/58981#issuecomment-866654600.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556352

Pulled By: mruberry

fbshipit-source-id: 36e97e0f338df5d17a94af078f172c668ef51ecb
2021-07-07 09:55:34 -07:00
5f18ba7075 upcast to most precise dtype within their category before the comparison (#60536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60536

`torch.isclose` does not do this for bool tensors, which results in a test failure since subtraction (`abs(actual - expected)`) is not supported for them (see #58981). Since the `dtype` is already checked at this point, we can safely move the upcasting before `torch.isclose` is invoked.
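A minimal illustration of the failure mode being fixed; this is a hedged sketch, not the exact internal code path:

```python
import torch

a = torch.tensor([True, False])
b = torch.tensor([True, True])

# abs(a - b) is undefined for bool tensors, so the closeness check has to
# upcast first; comparing after an explicit upcast works fine:
print(torch.isclose(a.to(torch.int64), b.to(torch.int64)))  # tensor([ True, False])
```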

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556356

Pulled By: mruberry

fbshipit-source-id: 4c65fad4f06cf402d6aab9dde5b127235766d5e0
2021-07-07 09:55:32 -07:00
5ac87cde30 tests for diagnostics in callable msg in torch.testing.assert_close (#60254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60254

Before, we only tested that the correct error message is returned if `msg` is passed as a callable. This adds tests that make sure that

- the inputs passed to the callable are the same inputs passed to `torch.testing.assert_close` and
- the `diagnostics` namespace has the same attributes and types as documented.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556354

Pulled By: mruberry

fbshipit-source-id: 9793c6d86fda842b6329381fc03b945eee878464
2021-07-07 09:55:30 -07:00
76d9e680d7 update docstring examples of torch.testing.assert_close (#60163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60163

Changes to the default error message in case of mismatching values need to be reflected in the examples given in the docstring. Normally this should be enforced by a [`doctest`](https://docs.python.org/3/library/doctest.html). mruberry do you know why we don't have such a check?

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556353

Pulled By: mruberry

fbshipit-source-id: 8dbc3f566f429618811b542a059d9abde9a6530b
2021-07-07 09:55:29 -07:00
9979289037 Improve error messages of torch.testing.assert_close in case of mismatching values (#60091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60091

Closes #58383. (1) and (2) are implemented. (3) was rejected. No consensus was reached on (4) and (5).

Improvements:

- Instead of calling everything "Tensors" we now use "Scalars" and "Tensor-likes" depending on the shape. Plus, we now internally have the option to adapt this identifier for example to report "Imaginary components of complex tensor-likes", which is even more expressive.
- The reported conditions "not close" and "not equal" are now determined based on `rtol` and `atol`.
- The number of mismatched elements and the offending indices are only reported in case the inputs are not scalar
- The allowed `rtol` and `atol` are only reported if `> 0`

**Example 1**

```python
torch.testing.assert_close(1, 3, rtol=0, atol=1)
```

Before:

```
AssertionError: Tensors are not close!

Mismatched elements: 1 / 1 (100.0%)
Greatest absolute difference: 2 at 0 (up to 1 allowed)
Greatest relative difference: 0.6666666865348816 at 0 (up to 0 allowed)
```

After:

```
AssertionError: Scalars are not close!

Absolute difference: 2 (up to 1 allowed)
Relative difference: 0.6666666865348816
```

**Example 2**

```python
torch.manual_seed(0)
t = torch.rand((2, 2), dtype=torch.complex64)
torch.testing.assert_close(t, t + complex(0, 1))
```

Before:

```
AssertionError: Tensors are not close!

Mismatched elements: 4 / 4 (100.0%)
Greatest absolute difference: 1.0000000596046448 at (0, 0) (up to 1e-05 allowed)
Greatest relative difference: 0.8833684352411922 at (0, 1) (up to 1.3e-06 allowed)

The failure occurred for the imaginary part.
```

After:

```
AssertionError: Imaginary components of tensor-likes are not close!

Mismatched elements: 4 / 4 (100.0%)
Greatest absolute difference: 1.0000000596046448 at index (0, 0) (up to 1e-05 allowed)
Greatest relative difference: 0.8833684352411922 at index (0, 1) (up to 1.3e-06 allowed)
```

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556357

Pulled By: mruberry

fbshipit-source-id: 559d4a19ad4fc069b2b4f8cb5fc2f6058621e33d
2021-07-07 09:54:09 -07:00
e1338016dd cuSOLVER path for LU factorization in CUDA. (#56887)
Summary:
This PR adds cuSOLVER path for `torch.lu`.

Performance comparison results: https://github.com/pytorch/pytorch/issues/53879#issuecomment-830635381

Code for reproducing performance results: https://github.com/pytorch/pytorch/pull/56887#issuecomment-843212868

The following heuristics are used for choosing cuSOLVER over MAGMA:
* If batch size == 1 OR (batch size <= 8 AND shape <= 16), choose cuSOLVER over MAGMA.
* For all other cases use MAGMA.
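A minimal Python sketch of this dispatch heuristic; the actual selection logic lives in the C++ backend:

```python
def prefer_cusolver_over_magma(batch_size: int, n: int) -> bool:
    # Heuristic from this PR: cuSOLVER wins for single matrices and for
    # small batches (<= 8) of small matrices (<= 16); MAGMA otherwise.
    return batch_size == 1 or (batch_size <= 8 and n <= 16)
```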

See also https://github.com/pytorch/pytorch/issues/47953.

Following are the performance results between the MASTER branch and the current changes:

<details>

```
[-------------------------- LU factorization (ATen) torch.float64 ---------------------------]
                                     |  lu_factorize CURRENT |  lu_factorize MASTER
1 threads: -----------------------------------------------------------------------------------
      torch.Size([1, 1, 1])          |              363.9          |             284.1
      torch.Size([2, 1, 1])          |              354.8          |             271.8
      torch.Size([4, 1, 1])          |              393.7          |             278.0
      torch.Size([8, 1, 1])          |              459.3          |             279.1
      torch.Size([16, 1, 1])         |              524.2          |             288.9
      torch.Size([32, 1, 1])         |              525.1          |             281.2
      torch.Size([64, 1, 1])         |              524.5          |             281.7
      torch.Size([128, 1, 1])        |              522.8          |             285.2
      torch.Size([1, 2, 2])          |              360.4          |             277.7
      torch.Size([2, 2, 2])          |              372.9          |             279.2
      torch.Size([4, 2, 2])          |              419.4          |             278.3
      torch.Size([8, 2, 2])          |              475.7          |             279.2
      torch.Size([16, 2, 2])         |              530.0          |             299.5
      torch.Size([32, 2, 2])         |              530.0          |             294.5
      torch.Size([64, 2, 2])         |              531.0          |             291.5
      torch.Size([128, 2, 2])        |              544.4          |             292.3
      torch.Size([1, 8, 8])          |              372.6          |             292.8
      torch.Size([2, 8, 8])          |              380.9          |             296.2
      torch.Size([4, 8, 8])          |              420.0          |             293.4
      torch.Size([8, 8, 8])          |              490.6          |             294.6
      torch.Size([16, 8, 8])         |              535.6          |             296.5
      torch.Size([32, 8, 8])         |              534.7          |             302.1
      torch.Size([64, 8, 8])         |              539.1          |             305.5
      torch.Size([128, 8, 8])        |              540.7          |             296.5
      torch.Size([1, 16, 16])        |              345.0          |             303.2
      torch.Size([2, 16, 16])        |              405.0          |             306.3
      torch.Size([4, 16, 16])        |              482.8          |             305.6
      torch.Size([8, 16, 16])        |              596.3          |             305.9
      torch.Size([16, 16, 16])       |              539.6          |             304.4
      torch.Size([32, 16, 16])       |              542.2          |             305.8
      torch.Size([64, 16, 16])       |              556.1          |             311.0
      torch.Size([128, 16, 16])      |              545.1          |             308.1
      torch.Size([1, 32, 32])        |              432.7          |             342.4
      torch.Size([2, 32, 32])        |              582.6          |             341.8
      torch.Size([4, 32, 32])        |              580.4          |             344.4
      torch.Size([8, 32, 32])        |              586.5          |             343.8
      torch.Size([16, 32, 32])       |              582.9          |             346.0
      torch.Size([32, 32, 32])       |              574.4          |             343.7
      torch.Size([64, 32, 32])       |              562.8          |             350.8
      torch.Size([128, 32, 32])      |              568.3          |             349.8
      torch.Size([1, 64, 64])        |              537.1          |             518.4
      torch.Size([2, 64, 64])        |              766.5          |             539.1
      torch.Size([4, 64, 64])        |              771.6          |             551.9
      torch.Size([8, 64, 64])        |              783.4          |             556.0
      torch.Size([16, 64, 64])       |              798.8          |             555.3
      torch.Size([32, 64, 64])       |              795.6          |             548.6
      torch.Size([64, 64, 64])       |              804.2          |             580.4
      torch.Size([128, 64, 64])      |              837.6          |             616.9
      torch.Size([1, 128, 128])      |              844.7          |             848.9
      torch.Size([2, 128, 128])      |             1096.7          |             873.3
      torch.Size([4, 128, 128])      |             1117.9          |             884.8
      torch.Size([8, 128, 128])      |             1138.1          |             903.6
      torch.Size([16, 128, 128])     |             1169.1          |             943.9
      torch.Size([32, 128, 128])     |             1204.8          |             981.4
      torch.Size([64, 128, 128])     |             1336.6          |            1105.8
      torch.Size([128, 128, 128])    |             1639.4          |            1473.3
      torch.Size([1, 512, 512])      |             3714.3          |            3928.6
      torch.Size([2, 512, 512])      |             4388.3          |            4179.7
      torch.Size([4, 512, 512])      |             4765.4          |            4536.9
      torch.Size([8, 512, 512])      |             5615.2          |            5441.1
      torch.Size([16, 512, 512])     |             7203.6          |            7130.2
      torch.Size([32, 512, 512])     |            10580.5          |           10503.9
      torch.Size([64, 512, 512])     |            17374.8          |           17349.6
      torch.Size([128, 512, 512])    |            32542.3          |           32548.8
      torch.Size([1, 1024, 1024])    |            10041.5          |           14292.3
      torch.Size([2, 1024, 1024])    |            17126.6          |           16971.0
      torch.Size([4, 1024, 1024])    |            20591.0          |           20490.8
      torch.Size([8, 1024, 1024])    |            27682.8          |           27560.7
      torch.Size([16, 1024, 1024])   |            41035.2          |           41035.8
      torch.Size([32, 1024, 1024])   |            67091.8          |           67345.9
      torch.Size([64, 1024, 1024])   |           119612.3          |          119782.3
      torch.Size([128, 1024, 1024])  |           230095.5          |          230766.2

Times are in microseconds (us).

```
</details>

The main reason why a performance regression can be seen is related to this issue (https://github.com/pytorch/pytorch/issues/55122), and there seems to be no easy way to fix this (at least in this PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56887

Reviewed By: ngimel

Differential Revision: D29482342

Pulled By: mruberry

fbshipit-source-id: 4fdedf21b0d5597b289e168dff61d5f5d7727fb1
2021-07-07 09:45:23 -07:00
4a544df00d Implement and benchmark a torch.optim.multi_tensor.adagrad implementation (#59155)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59155

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29525213

Pulled By: ramvenkat98

fbshipit-source-id: 6d7e8da91c965d1f4e955a084ed875bab641dc9a
2021-07-07 08:08:32 -07:00
8bec478a9e MaxPool2d: use channels_last format for both output and indice when input is channels_last (#61245)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61245

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29557884

Pulled By: ezyang

fbshipit-source-id: 0d2b8cbaaf13411eefd7d867021bd6028d40e5cc
2021-07-07 07:50:28 -07:00
66158a6e90 Enable AutogradXPU DispatchKey for Intel heterogeneous computation platform. (#61105)
Summary:
Add a string wrapper for AutogradXPU to enable this DispatchKey.
We are going to use AutogradXPU as a custom autograd backend, which needs this DispatchKey.
This string wrapper is used to map "AutogradXPU" to the corresponding DispatchKey.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61105

Reviewed By: malfet

Differential Revision: D29557697

Pulled By: ezyang

fbshipit-source-id: f0c8155decc8e2fd90741650a05de5a8b5a70121
2021-07-07 07:47:01 -07:00
a69e947ffd avg_pool3d_backward: Port to structured (#59084)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59084

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28802619

Pulled By: ezyang

fbshipit-source-id: 89a0fcdcf8976ca7c21da7a40fd26a1cba180faa
2021-07-07 07:44:17 -07:00
e4c450a4e8 The dispatch order for custom function (#60251)
Summary:
Hi, I am working on developing some custom ops.

And I found this issue:

The cause is the logic here: https://github.com/pytorch/pytorch/compare/master...zhuhaozhe:customer-op-trace?expand=1#diff-d7ade8589773904745c0cf965a19f24c940f1d36038f4c0ce85af2f3d89991dcL173-L177.
For all custom ops, the "Tracer" dispatch key gets the highest priority.

This makes custom ops and non-custom ops behave differently during dispatch. I do not understand whether there is some special reason to let custom ops "trace" first and then begin to "dispatch".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60251

Reviewed By: malfet

Differential Revision: D29577131

Pulled By: ezyang

fbshipit-source-id: a8e824029cf934f09f29638b127961a6a5c332de
2021-07-07 06:31:43 -07:00
a6fea03a8a Skip codegen checks for dequantize_self, lu_unpack, _cudnn_rnn, and .*conv.*_backward.* (#61139)
Summary:
Temporary fix for fb-internal tests. This and similar failures are being discussed here:
https://github.com/pytorch/pytorch/issues/60426

Applies the below changes:
 - This may seem counterintuitive because the storage check comes before the tensor check, but if the TensorImpl use count is not enforced, we should also not enforce the storage use count. If an op returns one of its inputs as-is, it is possible for this input to already be aliased with another tensor, and hence it would have a StorageImpl use count greater than one.
 - Also clarify in the description that use_count is not necessarily > 1; an op may, but does not necessarily, return one of its inputs as-is.
 - Allow usage of regex in skip list

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61139

Reviewed By: malfet, Varal7

Differential Revision: D29564917

Pulled By: soulitzer

fbshipit-source-id: 806b7177117a573dd12f161cc80dcadac892f9d0
2021-07-07 05:21:19 -07:00
6f1455440b task 3: typecheck (#60805)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60805

Test Plan: Imported from OSS

Reviewed By: jamesr66a, VitalyFedyunin

Differential Revision: D29522885

Pulled By: migeed-z

fbshipit-source-id: 559a8a495a16e517af77fd5a0785a82e1ebb3bd7
2021-07-06 23:51:49 -07:00
9813b9bc0d Fix mypy.ini (#61333)
Summary:
Fixes CI regression caused by https://github.com/pytorch/pytorch/issues/61119
Unlike Python, `.ini` string lists cannot end with a trailing comma.

Fixes CI on master

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61333

Reviewed By: bhosmer

Differential Revision: D29578696

Pulled By: malfet

fbshipit-source-id: b81e5f4c0a553299c4d4bee0a9bb73748910795f
2021-07-06 22:46:09 -07:00
f0316ec0b6 Revert D24068202: [pytorch][PR] Add typing return value to init in nn.Module
Test Plan: revert-hammer

Differential Revision:
D24068202 (506397a809)

Original commit changeset: 4cd9b6ca12b5

fbshipit-source-id: f45fcf7ee6ee9198ed6f3f34956ce68a64378c32
2021-07-06 22:15:31 -07:00
98119bfce9 task 2: ast rewrite (#60622)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60622

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29493747

Pulled By: migeed-z

fbshipit-source-id: 684fcdfd3dd441e72c77bb7a4d64c18b9849a198
2021-07-06 20:15:30 -07:00
0dc40474fe Migrate glu from the THC to ATen (CUDA) (#61153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61153

Fixes gh-24571, fixes gh-24572
Closes gh-39586, closes gh-39586

Benchmarks
----------

The benchmarks were run with nvprof calling the operator in a loop. They show
reliable improvements for large tensors, but the TH implementation seems to fare
better for smaller tensors. For sufficiently large tensors, though, the ATen
implementation does win.

|        Shape | Dim | Master Forward (us) | This PR Forward (us) | Master Backward (us) | This PR Backward (us) |
|-------------:|-----|:-------------------:|:--------------------:|:--------------------:|:---------------------:|
|    128, 1000 | 0   |        2.4770       |        2.0820        |        3.0440        |         3.4680        |
|              | 1   |        2.7060       |        4.4850        |        3.3380        |         3.6250        |
|   128, 10000 | 0   |        26.531       |        21.366        |        38.083        |         34.623        |
|              | 1   |        27.680       |        30.465        |        38.943        |         35.204        |
|  128, 100000 | 0   |        292.09       |        219.56        |        355.57        |         324.49        |
|              | 1   |        260.43       |        243.08        |        332.25        |         323.37        |
| 128, 1000000 | 0   |        2475.7       |        1874.6        |        3810.1        |         3215.7        |
|              | 1   |        2586.3       |        2380.9        |        3349.9        |         3207.8        |

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29538093

Pulled By: ngimel

fbshipit-source-id: 1f66b45ec7c46fb8e680b50110a5fde6fe7faab7
2021-07-06 19:06:51 -07:00
7a4ffbd1da [FX] s/IS_SANDCASTLE/IS_FBCODE/ in tests (#61304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61304

Previously tests were unrunnable on devserver. This fixes that
ghstack-source-id: 133051811

Test Plan: waitforsadcastle

Reviewed By: Chillee

Differential Revision: D29561806

fbshipit-source-id: 6020e5b4ba72d6de1ea2563e70fdb0e604bee1a5
2021-07-06 17:20:53 -07:00
506397a809 Add typing return value to init in nn.Module (#45654)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45654

Reviewed By: driazati

Differential Revision: D24068202

Pulled By: malfet

fbshipit-source-id: 4cd9b6ca12b531311302e3cdeeab39bc45d86c94
2021-07-06 17:09:30 -07:00
9f3167ebdf task 1: annotate (#60621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60621

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29493619

Pulled By: migeed-z

fbshipit-source-id: 1bd3fb02c90ae5b394869a474b2e6b06af0d4791
2021-07-06 16:48:11 -07:00
a1ad28da10 Refactor clang_tidy.py (#61119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61119

This change spilts the clang-tidy CI job into smaller steps and uses a
refactored version of the clang_tidy.py script.

The new folder structure is as follows:
```
tools/linter/clang_tidy
|_ __main__py
|_ requirements.txt
|_ run.py
|_ setup.sh
```

`__main__.py`

This script will run `tools/linter/clang_tidy/setup.sh` if a `build`
directory doesn't exist, mimicking what used to be done as a separate
step in the CI job.

After that, it will invoke `clang-tidy` with default arguments being
declared in the script itself (as opposed to declaring them in
lint.yml).

The reasoning behind this approach is two-fold:

- Make it easier to run `clang-tidy` locally using this script
- De-duplicate the option passing

`requirements.txt`

Contains a list of additional python dependencies needed by the
`clang-tidy` script.

`setup.sh`

If a build directory doesn't exist, this command will run the necessary
codegen and build commands for running `clang-tidy`

Example usage:
```
python3 tools/linter/clang_tidy --parallel
```
Notice that we don't have to put the `.py` at the end of `clang_tidy`.

Test Plan:
Run the following command:
```
python3 tools/linter/clang_tidy --paths torch/csrc/fx --parallel
```

Reviewed By: walterddr, janeyx99

Differential Revision: D29568582

Pulled By: 1ntEgr8

fbshipit-source-id: cd6d11c5cb8ba9f1344a87c35647a1cd8dd45b04
2021-07-06 16:02:11 -07:00
81e36d02a6 Improve error message on invalid values to Distribution methods (#61056)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18133

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61056

Reviewed By: jbschlosser

Differential Revision: D29510173

Pulled By: neerajprad

fbshipit-source-id: 205ec7de6c8576a73e77ee4bf01c30e99b38a52e
2021-07-06 15:44:55 -07:00
45cc207a88 Fix breakpad build + add test canary (#60990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60990

This makes the breakpad build more explicit in its messaging and hints to cmake where to look for the library (it wasn't able to find it without `PATHS` on CI even though that works locally). This also adds a smoke test that will fail if breakpad isn't present on a CI job where it is expected (e.g. binary builds).

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29514316

Pulled By: driazati

fbshipit-source-id: 79514363334788f311ba5d4f25deed3452f0c3eb
2021-07-06 14:15:07 -07:00
b6024b9d12 More loop transforms 2
Summary: Exact duplicate of D29410111 to fix land issues.

Test Plan: Sandcastle

Reviewed By: walterddr

Differential Revision: D29538335

fbshipit-source-id: 6a4f9ac4a505339ed242af60fe7fd4ba1fda3b32
2021-07-06 13:38:10 -07:00
c74c0c5718 add thrust/host_vector.h header for cuda 11.4 build (#61004)
Summary:
Needed for the CUDA 11.4 build.

Closes https://github.com/pytorch/pytorch/issues/61011

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61004

Reviewed By: ngimel

Differential Revision: D29523896

Pulled By: malfet

fbshipit-source-id: acb11bdd19c0cc240696be21e5c492f8976fea65
2021-07-06 12:44:56 -07:00
5da507b57b Add bazel actions workflow (#61039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61039

- Added a new template for bazel GH Actions workflow
- Simplified the workflow based on malfet's suggestion by combining build and test jobs into one as we only run a small subset of tests for bazel
- Tested the run to make sure it succeeds
- Build step takes 4 minutes, test step takes 7 minutes

The downside of this approach is that I duplicated some of the jobs in a new template file. An alternative solution would be to use something like https://jinja.palletsprojects.com/en/3.0.x/templates/#template-inheritance; however, that is better done in a separate PR, as the linux and windows workflows would need to be changed. Another solution is to use a bunch of if-else statements in a linux workflow template to accommodate the bazel build as part of it, but this seems not as clean as template inheritance with jinja.

Here is a link to the latest bazel run with this change https://github.com/pytorch/pytorch/actions/runs/1004656584

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29562260

Pulled By: rsemenov

fbshipit-source-id: a7d7d3a0b8092f52929fb109820bfad4574f5602
2021-07-06 12:18:43 -07:00
fac744e116 Foreach Binary Test Refactor (#59907)
Summary:
Related: https://github.com/pytorch/pytorch/issues/58833

## Changes I'm a bit concerned about
- binary ops with one tensorlist and one scalarlist support complex dtypes. To realize this, I added a specialization of [`TensorListScalarListMetadata<c10::complex<double>, 1>` ](https://github.com/pytorch/pytorch/pull/59907/files#diff-131eb9b310905b15b3528da6a23e542a3a3aa952bc88f7423c98a23a8a28cca1R49). This might be out of the scope of this pull request.

cc ptrblck ngimel mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59907

Reviewed By: mruberry

Differential Revision: D29551001

Pulled By: ngimel

fbshipit-source-id: 46b25fdba85dd4d6332a77b27376fe96cd422384
2021-07-06 11:49:38 -07:00
5503a4ac6e DOC Improves shape documentation for *Flatten (#60980)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60841

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60980

Reviewed By: VitalyFedyunin

Differential Revision: D29526650

Pulled By: jbschlosser

fbshipit-source-id: 2b4b0b84e0652c4cf3e9a48debb3b1bfe4e04b05
2021-07-06 10:47:11 -07:00
95cada8810 Make breakpad depdendencies private (#61183)
Summary:
Otherwise, it will result in the following errors for people developing extensions
```
CMake Error in frontends/pytorch/csrc/CMakeLists.txt:
  Imported target "torch" includes non-existent path

    "/usr/local/include/breakpad"
```

Fixes different issue reported in https://github.com/pytorch/pytorch/issues/60485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61183

Reviewed By: driazati

Differential Revision: D29538332

Pulled By: malfet

fbshipit-source-id: e83cfd0b335e9b0b1ba5715789b09765db671346
2021-07-06 10:02:34 -07:00
635d864b26 Fix modernize-use-equals-default nolint failures in torch/csrcs (#61142)
Summary:
Test-plan: Compile + clang-tidy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61142

Reviewed By: VitalyFedyunin

Differential Revision: D29529372

Pulled By: malfet

fbshipit-source-id: 2ccde7712a51c28243b16bbb4d1d68086e0414a6
2021-07-06 09:46:46 -07:00
718db968b8 move CI related functions out of run_test.py (#61124)
Summary:
run_test.py currently does a lot of downloading and test file/suite/case parsing, and it doesn't work well outside of the CI environment.

This change restructures run_test.py, creates tools/test/test_selections.py, and moves all test selection logic (reordering, categorizing slow tests, creating shards) into it.

Follow-up PRs should:
- refactor the file read/write logic entangled inside test_selections.py into the stats/ folder
- restructure and add network-independent test logic to test_test_selections.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61124

Test Plan:
- tools/test
- CI

Related PR:
This follows the refactoring example in: https://github.com/pytorch/pytorch/issues/60373

Reviewed By: malfet

Differential Revision: D29558981

Pulled By: walterddr

fbshipit-source-id: 7f0fd9b4720a918d82918766c002295e8df04169
2021-07-06 09:06:42 -07:00
864dcbb2cc Set sccache bucket on test runs to save some run minutes (#61140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61140

While working on the bazel port to GitHub Actions I noticed that we do not set the sccache bucket for test runs, which causes cache misses while running test jobs. For example, in https://github.com/pytorch/pytorch/runs/2965919198?check_suite_focus=true test run 1 uses the local cache and has 44 cache misses; at an average of 9 seconds of read time per miss, fixing this saves about 44*9/60 ≈ 7 minutes per run.

Here is another example
https://github.com/pytorch/pytorch/runs/2966210127?check_suite_focus=true

Open to feedback if there is a downside of using AWS cache.

Test Plan: Imported from OSS

Reviewed By: 1ntEgr8

Differential Revision: D29557292

Pulled By: rsemenov

fbshipit-source-id: e8fb000850ec4627d7cccf690e8f5743999fdf36
2021-07-06 07:29:57 -07:00
05c1e5b655 [sparsity] Lambda Scheduler (#59771)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59771

Implements a specific sparsity scheduler that uses user-provided lambdas to change the sparsity levels.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision:
D29070604
D29070604

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: c7ccbe63fe4cd6a0c3563541b7fcf93a99d0e62f
2021-07-02 21:39:38 -07:00
37ebf2e3cd [sparsity] Base sparsity level scheduler class (#59770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59770

Implements the base scheduler class for changing the sparsity levels in the sparsifier.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision:
D29070603
D29070603

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 0b160e4eb0a2a303d2d19e6a3beb4784002b2cb7
2021-07-02 21:38:24 -07:00
ed63fb5225 Fix some more loops (#60895)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60895

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29431572

fbshipit-source-id: fbcf48696bf2c90cc0973a767d83bb526f6ccd7f
2021-07-02 19:17:08 -07:00
43fb39c3eb [DDP] Make uneven inputs work with comm. hook (#61020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61020

Makes uneven input support with the `join` context manager work with
custom communication hooks. This will ensure that the two features can work
well together. Added relevant unit tests to test the allreduce and powerSGD hooks.

Instead of calling `allreduce`, the join manager now calls into `_run_reduction_hook` which will automatically run whatever hook is installed.
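A hedged sketch of the two features used together; the single-process gloo setup is for illustration only, since real uneven-input runs span multiple ranks:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(10, 10))
model.register_comm_hook(state=None, hook=default_hooks.allreduce_hook)

# With this change, join() drives the registered comm hook instead of a raw
# allreduce when ranks exhaust their inputs at different times.
with model.join():
    for _ in range(3):
        model(torch.randn(4, 10)).sum().backward()
```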
ghstack-source-id: 132950108

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29480028

fbshipit-source-id: c91dc467a62c5f1e0ec702a2944ae3deb10f93f4
2021-07-02 18:48:21 -07:00
94b730681f [DDP] Refactor uneven inputs to take GradBucket (#61019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61019

Changes the uneven input logic from running allreduce directly to using the `GradBucket` structure. This is to enable support for comm. hook with join in the next diff.
ghstack-source-id: 132950107

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D29480027

fbshipit-source-id: 7c42c53653052f71b86a75e14a5fc7ae656433f7
2021-07-02 18:47:23 -07:00
512448a425 CTCLoss: Remove dispatching in parallel region (#60599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60599

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446190

Pulled By: ngimel

fbshipit-source-id: eb01783c8c32a1405b58e1364fc3d71c0f054e0a
2021-07-02 17:55:56 -07:00
d42f1751d4 [sparsity] WeightNormSparsifier (#58955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58955

Implements the weight norm sparsifier.
This type of sparsifier computes the norms of the weights, sorts them, and zeroes out the target fraction of them.

The main implemented method is `update_mask`, which holds the main logic of changing the masks.
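A hypothetical simplification of what `update_mask` does for the unstructured (1x1 block) case; the real implementation works on blocks of `sparse_block_shape`:

```python
import torch

def update_mask_sketch(weight: torch.Tensor, sparsity_level: float) -> torch.Tensor:
    # Rank entries by magnitude and zero out the smallest target fraction.
    scores = weight.abs().flatten()
    k = int(sparsity_level * scores.numel())
    mask = torch.ones_like(scores)
    if k > 0:
        _, smallest = torch.topk(scores, k, largest=False)
        mask[smallest] = 0
    return mask.reshape(weight.shape)
```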

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision:
D28970960
D28970960

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 8f2a4360ad877f430cdc1065c6777106938b58d5
2021-07-02 17:35:27 -07:00
7ab2729481 [sparsity][refactor] Import factoring out (#58707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58707

Minor refactor that changes the format of the import.
This is done to avoid accidental circular dependencies.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision:
D28970961
D28970961

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: c312742f5e218c435a1a643532f5842116bfcfff
2021-07-02 16:32:39 -07:00
973e9266ff [sparsity] Sparsifier class (#58704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58704

Implements the base sparsifier class based on the #59835 RFC document.

This PR implements the base class for sparsification. Specifically, the `prepare` method is implemented.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision:
D28970958
D28970958

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 0ef98a445c0a0aca22ce5708e34a9f94606d0e2b
2021-07-02 16:31:21 -07:00
80cab10534 [sparsity] Sparsity parametrization (#58705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58705

The basic demo for this particular implementation can be found here:
https://gist.github.com/z-a-f/1d06ae8d5a509d3c9c1596dcb924afe0

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision:
D28970959
D28970959

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 2a0bea1e0a81816690e05f83051d607c90925d32
2021-07-02 11:12:31 -07:00
5d34b7955b [sparsity][refactor] Changing linear row/col control (#60850)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60850

Test Plan:
```
python test/test_ao_sparsity.py
```

```
python test/test_ao_sparsity.py
```

Differential Revision:
D29465900
D29465900

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 412f50da857f377898fea79d378ae54a049b81fe
2021-07-02 11:12:30 -07:00
509b1ef9d5 [sparsity] Add sparsity tests to run_test.py (#60887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60887

Test Plan:
```
./test/run_test.py -i test_ao_sparsity
```

```
./test/run_test.py -i test_ao_sparsity
```

Differential Revision:
D29465834
D29465834

Reviewed By: mruberry

Pulled By: z-a-f

fbshipit-source-id: 144f940363a20dd65c2bbfe70924c266d8791dc7
2021-07-02 11:11:20 -07:00
54673fc944 Sparse: Remove dispatch in parallel region (#60598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60598

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446192

Pulled By: ngimel

fbshipit-source-id: 1a11f3aa847e4ce83fc6f50cee362b7d0cb61eae
2021-07-01 21:56:17 -07:00
11b722c063 [DDP] Refactor hook running logic (#61018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61018

Extracts the hook-running logic into a function `run_reduction_hook` that takes in a `GradBucket` and runs the hook/allreduce. This is mainly to prepare for join to support comm. hooks in follow-up diffs.
ghstack-source-id: 132924220

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29477143

fbshipit-source-id: 87e8e563e71821fd462d6b259c98a6a0afbcd7b4
2021-07-01 20:41:55 -07:00
b21df03f3b [DDP] Remove SPMD from get_bucket_tensors (#61017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61017

Removes SPMD nested vector logic from this codepath. This is mostly in preparation for the next diffs in this stack which enable support for join with comm. hook.
ghstack-source-id: 132924223

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29477360

fbshipit-source-id: f8132a94b1abfe28586aa78ac47e13a7ce6bb137
2021-07-01 20:40:53 -07:00
4a2e8b53bb [JIT] Add `torch._C.ScriptList` (#52832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52832

**Summary**
This commit adds `torch._C.ScriptList`, a list type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made in TorchScript to instances of `torch._C.ScriptList`
are visible in Python even when it is not returned from the function.

`torch._C.ScriptList` is implemented using a modified version of pybind's
`stl_bind.h`-style bindings attached to `ScriptList` and `ScriptListIterator`,
wrapper classes around `c10::impl::GenericList` and
`c10::impl::GenericList::iterator`. These bindings allow instances of
`torch._C.ScriptList` to be used as if it were a
regular `list` in Python. Reference semantics are achieved by simply
retrieving the `IValue` contained in `ScriptList` in `toIValue` (invoked
when converting Python arguments to `IValues` before calling TorchScript
code).
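A hedged sketch of the reference semantics; the constructor spelling for building a `torch._C.ScriptList` directly is an assumption, not something this commit message confirms:

```python
from typing import List

import torch

@torch.jit.script
def append_one(xs: List[int]) -> None:
    xs.append(1)

plain = [0]
append_one(plain)     # a regular list is copied at the boundary...
print(plain)          # ...so this still prints [0]

# Hypothetical direct construction of the reference-semantics list:
ref_list = torch._C.ScriptList([0])
append_one(ref_list)  # mutates the shared c10::impl::GenericList...
print(list(ref_list)) # ...so the appended 1 is visible: [0, 1]
```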

**Test Plan**
This commit adds `TestScriptList` to `test_list_dict.py`, a set of tests
that check that all of the common list operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29478121

Pulled By: SplitInfinity

fbshipit-source-id: 652cc25cfa37debe28db9527504846f22abd8b54
2021-07-01 20:28:13 -07:00
6e9e30cc1d Ignore notebooks when checking for newlines (#61156)
Summary:
Fix lint on master (these files should be considered "generated" so don't lint them)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61156

Reviewed By: malfet

Differential Revision: D29532211

Pulled By: driazati

fbshipit-source-id: a1e47f45bedf441613bdc2bd60fbf8299e5c962f
2021-07-01 18:11:43 -07:00
a4d86e0d53 [quant][fx][perf] improve runtime of prepare step for large models (#61132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61132

For large models, the `insert_observers_for_model` function was taking a long time, especially in the case where not all of the nodes are being quantized.

For example, for a model with 21000 nodes of which only ~50 are being quantized, the breakdown of prepare_fx vs convert_fx was

prepare_fx 979 seconds
convert_fx 9 seconds

The main reason was that we were doing unnecessary computation for all nodes in this function; this PR just moves that work to where it is actually used

After this PR
prepare_fx 26 seconds
convert_fx 9 seconds

Test Plan:
Existing tests

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29522303

fbshipit-source-id: 7ce12582a859d02ff763abebf4a592d28e0764ca
2021-07-01 17:17:10 -07:00
277b310edb [DataLoader] Add notebook with DataPipes API example (#60680)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60680

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461079

Pulled By: VitalyFedyunin

fbshipit-source-id: 6532bf77113ab89a50f8bb022daf80f8477e9297
2021-07-01 16:39:28 -07:00
ca2702a776 [pruner] Make bias hook stateless (#61077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61077

Removing `BiasHook` class, using function instead.
ghstack-source-id: 132899223

Test Plan:
` buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1L7Tg

Reviewed By: z-a-f

Differential Revision: D29504119

fbshipit-source-id: 6dd9689d18b17ac64e8a461f466e2c9018bc530b
2021-07-01 14:59:00 -07:00
0a7875231b [pruner] Add bias support (#60970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60970

Support adding bias in eager mode
ghstack-source-id: 132695883

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1L3K3

Reviewed By: z-a-f

Differential Revision: D29441499

fbshipit-source-id: 47e0fff5b3014612bd021e145160ea54e2645e24
2021-07-01 14:57:09 -07:00
87dbdef65d MAINT Adds test and docs for Linear with no batch dims (#60992)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

This PR updates docs for `Linear` and adds a non-batch test case to `common_nn.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60992

Reviewed By: VitalyFedyunin

Differential Revision: D29518451

Pulled By: jbschlosser

fbshipit-source-id: 6dd79c0f21ac5b6f693e3e1ba954379d2606d4e0
2021-07-01 14:49:24 -07:00
369802a504 Add aten::avgpool2d NNAPI converter (#58538)
Summary:
Add support for the aten::avgpool2d op in the NNAPI model converter, with
variable size support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58538

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_avgpool2d

Reviewed By: anshuljain1

Differential Revision: D28531944

fbshipit-source-id: 43ff8c9389365698c282f204042b49c7ec84d824
2021-07-01 14:07:14 -07:00
19b6ee4d4e model_dump working with delegate models (#61043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61043

Trying to make model_dump work with delegate models
ghstack-source-id: 132809755

Test Plan:
N509022.

The data.pkl in the lowered model:
```
bash-3.2$ python -m torch.utils.show_pickle /Users/myuan/models/backend/lowered_model.pt@*/data.pkl
torch.jit.backend_with_compiler_demo.LoweredModule.__torch__.___torch_mangle_5.ModuleAdd()(state=
 (torch.jit._pickle.restore_type_tag({'forward': torch.jit._pickle.restore_type_tag({'input_shapes': '((1, 1, 320, 240), (1, 3))',
                   'some_other_option': 'True'},
                  'Dict[str, str]')},
    'Dict[str, Any]'),
  torch.jit._pickle.restore_type_tag({'forward': 'prim::Constant#1<debug_handle>271,aten::add<debug_handle>272'},
    'Dict[str, str]'),
  True))
```
Comparing to data.pkl in scripted_model.pt:
```
__torch__.___torch_mangle_7.ModuleAdd()(state=
 {'_is_full_backward_hook': None, 'training': True})
```

Reviewed By: Amyh11325

Differential Revision: D29464860

fbshipit-source-id: d738e98ea518339465f8e3375207cf83e3dac532
2021-07-01 13:39:56 -07:00
374278f431 Improved sparse CSR tensor sampling method (#60283)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59379

The improved sparse CSR tensor sampling method is described in https://pearu.github.io/csr_sampling.html that features:
- for specified `nnz`, one gets a CSR sample with the same `nnz`
- variability of the number of specified columns per row is maximized
- `crow_indices` content is randomized
- a given row specific `col_indices` content is sorted and filled with unique values (see also https://github.com/pytorch/pytorch/issues/60277)
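For reference, a minimal CSR tensor showing the invariants the sampler has to respect (using the public `torch.sparse_csr_tensor` constructor):

```python
import torch

# A 3x4 CSR tensor with nnz=4. crow_indices has length nrows+1 and is
# non-decreasing; each row's slice of col_indices is sorted and unique --
# exactly the properties the improved sampler randomizes while keeping nnz fixed.
crow_indices = torch.tensor([0, 1, 3, 4])
col_indices = torch.tensor([2, 0, 3, 1])
values = torch.tensor([1., 2., 3., 4.])
t = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(3, 4))
```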

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60283

Reviewed By: bhosmer

Differential Revision: D29492605

Pulled By: cpuhrsch

fbshipit-source-id: 8d875b7c2b0573a9ab37047c6d8fe8b540295ce1
2021-07-01 13:26:19 -07:00
6ecc1a4c4f Make pytorch clang-tidy clean (#60649)
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.

I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop

# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
  -j \
  -s \
  -k \
  -v \
  --paths torch/csrc/ \
  -g"-torch/csrc/jit/passes/onnx/helper.cpp" \
  -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
  -g"-torch/csrc/jit/serialization/onnx.cpp" \
  -g"-torch/csrc/jit/serialization/export.cpp" \
  -g"-torch/csrc/jit/serialization/import.cpp" \
  -g"-torch/csrc/jit/serialization/import_legacy.cpp" \
  -g"-torch/csrc/onnx/init.cpp" \
  -g"-torch/csrc/cuda/nccl.*" \
  -g"-torch/csrc/cuda/python_nccl.cpp" \
  -g"-torch/csrc/autograd/FunctionsManual.cpp" \
  -g"-torch/csrc/generic/*.cpp" \
  -g"-torch/csrc/jit/codegen/cuda/runtime/*" \
  -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
  -g"-torch/csrc/deploy/interpreter/interpreter.h" \
  -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
  -g"-torch/csrc/deploy/interpreter/test_main.cpp"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649

Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.

Reviewed By: walterddr, janeyx99

Differential Revision: D29504258

Pulled By: 1ntEgr8

fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
2021-07-01 12:21:07 -07:00
a0a9ea6598 Fix documentation preview instructions (#61080)
Summary:
People don't need to self-host these anymore since we do it automatically in PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61080

Reviewed By: VitalyFedyunin, janeyx99

Differential Revision: D29506465

Pulled By: driazati

fbshipit-source-id: 45875cb229f8cc565a9a1405f52cef198ee0e687
2021-07-01 12:17:34 -07:00
60509f8921 Update DDP documentation to mention outputs not used in loss is supported (#60275)
Summary:
We recently landed a change to ensure that when running under ``find_unused_parameters=True``, not all module outputs have to be used in loss computation and DDP will work as expected. Mention this update in the documentation and add some additional clarification.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60275

Reviewed By: SciPioneer

Differential Revision: D29502609

Pulled By: rohan-varma

fbshipit-source-id: ddb3129cff9492018e61813413b30711af212309
2021-07-01 11:56:53 -07:00
0128eb9a85 Fix TSAN issue in distributed tests (#59238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59238

Creating a `multiprocessing.Manager()` launches a new process using the `fork` method (because it's the default one), and then in that subprocess it launches a new thread. TSAN really doesn't like this (and rightly so!) because we already had threads in the parent process, and intermixing threads and forks is dangerous. The proper way to deal with this is to `exec` inside the child process or, in other words, use the `spawn` method.

Note that the method used to launch the Manager is entirely unrelated to the method used to launch our "own" subprocesses, hence we were using `fork` for the Manager even though we were using `spawn` for our own subprocesses.
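A minimal sketch of the fix, using an explicit `spawn` context for the Manager:

```python
import multiprocessing as mp

if __name__ == "__main__":
    # "spawn" makes the Manager's child process exec a fresh interpreter
    # instead of forking an already-multithreaded parent, which is exactly
    # what TSAN was (rightly) complaining about.
    manager = mp.get_context("spawn").Manager()
    shared = manager.dict()
    shared["ok"] = True
```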
ghstack-source-id: 130240724

Test Plan: Reverted the silencing introduced in D28490129, ran the `test_init_rpc_then_pg` test from the TensorPipe suite and saw the original TSAN failure. Then applied my fix, re-ran the test, and the failure was gone.

Reviewed By: zhaojuanmao

Differential Revision: D28794321

fbshipit-source-id: 12242e69be399a7f02a40a0ebb3d92f92e00ce73
2021-07-01 11:53:01 -07:00
5b44d817fb Expose raw saved tensors for codegen functions (#60565)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60565

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29466225

fbshipit-source-id: 77eb4214a1baecc501282413d99d55f8935dc01f
2021-07-01 11:25:21 -07:00
3f0f860a1c Condense JIT/Quantization triage into one workflow (#61130)
Summary:
The `.github/workflows/{jit,quantization}_triage.yml` workflows are nearly identical, so this PR consolidates them into a single GitHub Actions workflow to reduce code duplication. It also renames the workflow so it starts with a capital letter, so that it will show up alongside all our other GitHub Actions workflows on [the HUD](https://hud.pytorch.org/build2/pytorch-master).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61130

Reviewed By: walterddr

Differential Revision: D29520022

Pulled By: samestep

fbshipit-source-id: 673789762e08c2c77d72e7c20eb16d6beec573ba
2021-07-01 10:50:26 -07:00
6f92f10c94 Use a leaky singleton for CublasHandlePool. (#60987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60987

We were seeing deadlocks as follows during shutdown:

```
Thread 1 (LWP 2432101):
#0  0x00007efca470190b in __pause_nocancel () from /lib64/libc.so.6
#1  0x00007efca49de485 in __pthread_mutex_lock_full () from /lib64/libpthread.so.0
#2  0x00007ef91d4c42c6 in __cuda_CallJitEntryPoint () from /lib64/libnvidia-ptxjitcompiler.so.1
#3  0x00007efc651ac8f1 in ?? () from /lib64/libcuda.so
#4  0x00007efc651aee03 in ?? () from /lib64/libcuda.so
#5  0x00007efc64f76b84 in ?? () from /lib64/libcuda.so
#6  0x00007efc64f77f5d in ?? () from /lib64/libcuda.so
#7  0x00007efc64eac858 in ?? () from /lib64/libcuda.so
#8  0x00007efc64eacfbc in ?? () from /lib64/libcuda.so
#9  0x00007efc7810a924 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#10 0x00007efc780fa2be in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#11 0x00007efc78111044 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#12 0x00007efc7811580a in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#13 0x00007efc78115aa4 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#14 0x00007efc781079ec in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#15 0x00007efc780e6a7a in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#16 0x00007efc7811cfa5 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#17 0x00007efc777ea98c in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#18 0x00007efc777ebd80 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#19 0x00007efc777ea2c9 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#20 0x00007efc778c2e2d in cublasDestroy_v2 () from /usr/local/cuda/lib64/libcublas.so.11
#21 0x00007efc51a3fb56 in std::_Sp_counted_ptr_inplace<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle>, std::allocator<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so
#22 0x00007efc51a3fc5f in std::shared_ptr<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >::~shared_ptr() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so
#23 0x00007efca4648b0c in __run_exit_handlers () from /lib64/libc.so.6
#24 0x00007efca4648c40 in exit () from /lib64/libc.so.6
#25 0x0000558c8852e5f9 in Py_Exit (sts=0) at /tmp/build/80754af9/python_1614362349910/work/Python/pylifecycle.c:2292
#26 0x0000558c8852e6a7 in handle_system_exit () at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:636
#27 0x0000558c8852e742 in PyErr_PrintEx (set_sys_last_vars=<optimized out>, set_sys_last_vars=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:646
#28 0x0000558c88540dd6 in PyRun_SimpleStringFlags (command=0x7efca4dc9050 "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=9, pipe_handle=13)\n", flags=0x7ffe3a986110) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:457
#29 0x0000558c88540ead in pymain_run_command (cf=0x7ffe3a986110, command=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:420
#30 pymain_run_python (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:2907
#31 pymain_main (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3460
#32 0x0000558c8854122c in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3495
#33 0x00007efca4632493 in __libc_start_main () from /lib64/libc.so.6
#34 0x0000558c884e5e90 in _start () at ../sysdeps/x86_64/elf/start.S:103
```

This was likely caused by a static singleton that wasn't leaky. Following
the guidance in https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2, we
use a leaky singleton instead.
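A minimal sketch of the construct-on-first-use idiom applied here (illustrative only, with a stand-in pool type):

```cpp
#include <vector>

struct HandlePool {
  std::vector<int> handles;  // stand-in for the real cuBLAS handles
};

// The pool is heap-allocated and intentionally never deleted, so no
// destructor runs in atexit handlers where it could deadlock against
// CUDA driver shutdown. Initialization of the local static is thread-safe.
HandlePool& getHandlePool() {
  static HandlePool* pool = new HandlePool();  // leaked on purpose
  return *pool;
}
```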
ghstack-source-id: 132847448

Test Plan: Verified locally.

Reviewed By: malfet

Differential Revision: D29468866

fbshipit-source-id: 89250594c5cd2643417b1da584c658b742dc5a5c
2021-07-01 10:23:07 -07:00
d2fef350f2 add embedding bag skeleton take 2 (#61126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61126

adding skeleton implementations of quantized embedding tables with zeroes

Test Plan:
compilation, farm test, and ran test_find_dangling_impls and passed

did a manual negative test and verified the message is printed properly
```
======================================================================
FAIL: test_find_dangling_impls (test_dispatch.TestPythonDispatcher)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/users/hyz/fbsource/fbcode/buck-out/opt/gen/caffe2/test/others#binary,link-tree/test_dispatch.py", line 892, in test_find_dangling_impls
    self.assertEqual(
  File "/data/users/hyz/fbsource/fbcode/buck-out/opt/gen/caffe2/test/others#binary,link-tree/torch/testing/_internal/common_utils.py", line 1498, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Scalars failed to compare as equal! 0 != 1
Expect zero dangling impls, but found: ['name: quantized::qembedding_bag_4bit_unpack\nschema: (none)\nCUDA: registered at caffe2/aten/src/ATen/native/quantized/cuda/embedding_bag.cu:394 :: (Tensor _0) -> (Tensor _0) [ boxed unboxed ]\n']
```

Reviewed By: walterddr

Differential Revision: D29518274

fbshipit-source-id: d0cb81c8bf51cdc4b83038758131ccf61e4360f5
2021-07-01 10:11:45 -07:00
e5ae0e652d [jit] Allow instance overrides of ignored methods (#61076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61076

Previously we would always retrieve ignored methods from the
type, which doesn't work when the user has overridden the ignored method
for a specific instance.

This PR changes things up so we retrieve the ignored method as a bound
method from the object being scripted, unwrap it, then re-bind it to the
scriptmodule.

Test Plan: Imported from OSS

Differential Revision: D29504421

Pulled By: suo

fbshipit-source-id: 14649863ea69a8d2180dd2c4341ec9a826039de1
2021-07-01 09:26:30 -07:00
ccfdb30644 Revert D29413019: [torch] Various improvements to torch.distributed.launch and torch.distributed.run
Test Plan: revert-hammer

Differential Revision:
D29413019 (4e181dfc35)

Original commit changeset: 323bfbad9d0e

fbshipit-source-id: 1f8ae4b3d0a23f3eaff28c37e9148efff25fafe2
2021-07-01 08:44:51 -07:00
48bfc0e51c [DataLoader] Add Example Only fork DataPipe (#60679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60679

This is an example-only DataPipe, not intended to be used in production. It is used for tutorials, tests, and documentation.
It has to be replaced by a real `fork` upon the DataLoader update.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461084

Pulled By: VitalyFedyunin

fbshipit-source-id: a7e435f055f040e358f5465092b8daa07f8e29b7
2021-07-01 08:41:26 -07:00
62b2dc2059 [DataLoader] Decorate ZipDataPipe as zip (#60678)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60678

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461085

Pulled By: VitalyFedyunin

fbshipit-source-id: f2037fbc67369aae10b07ef80a19e2a0ea7bf530
2021-07-01 08:41:25 -07:00
8e21ff91e2 [DataLoader] Add simple groupby DataPipe (#60675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60675

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461082

Pulled By: VitalyFedyunin

fbshipit-source-id: ded5a3a1555bfd8457d64b7e61ab6729fff9cb75
2021-07-01 08:40:20 -07:00
cb7d813275 Revert D28836794: SumKernel (BFloat16): use float as accumulation type
Test Plan: revert-hammer

Differential Revision:
D28836794 (4f5c68857f)

Original commit changeset: 46ed3a862c2b

fbshipit-source-id: 3b586eeb752b7cdee909fa97a4c78876a6014770
2021-07-01 08:12:31 -07:00
11dca2e5f3 Fix some integer comparisons (#60894)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60894

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29431512

fbshipit-source-id: b0ef7656806f378ad823e503e7c27cc563d3dc7d
2021-07-01 08:08:39 -07:00
7017dc101f Revert D29313058: add an embedding bag skeleton operators
Test Plan: revert-hammer

Differential Revision:
D29313058 (ae21357ada)

Original commit changeset: b05df6ff9a7c

fbshipit-source-id: ef422aedad71dee6cb2824c58aceb66104376a65
2021-07-01 07:37:02 -07:00
d6521c2249 [pyper][emb][quantization] Support emb trained in FP16 (#60736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60736

Add support of embedding with input data type as float16, utilize new kernel functions added in fbgemm https://github.com/pytorch/FBGEMM/pull/616

Test Plan: `buck test caffe2/test/:quantization -- test_embedding_bag`

Reviewed By: supriyar

Differential Revision: D29392320

fbshipit-source-id: 0a120b3a58b6cf1d84961831097e9581ffd2b591
2021-07-01 07:35:59 -07:00
d42aa176e4 Bump docker image tag for clang-tidy (#61115)
Summary:
The new tag should fix the "Missing <omp.h>" error message on clang-tidy runs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61115

Test Plan:
Ran the clang-tidy job using the diff from https://github.com/pytorch/pytorch/issues/60976.

Expected Output:
There should be no clang diagnostic errors.

Reviewed By: walterddr

Differential Revision: D29516845

Pulled By: 1ntEgr8

fbshipit-source-id: 554229904db67eb7a7b93b3def434b30de6a43b0
2021-07-01 07:30:28 -07:00
46595a9623 [Static Runtime] Add gflag to disable nnc and caffe2 math library (#61090)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61090

Reviewed By: ajyu

Differential Revision: D29479860

fbshipit-source-id: 2b53405f41d319f074c75d8923d97fd6a45fee4b
2021-07-01 00:01:37 -07:00
c1499a9933 Enable jit tracing to parametrization and add jit tests (#60969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60969

This PR fixes the tracing in the parametrizations.
The current resolution is that when tracing is performed while caching is enabled, we throw an error.
Without caching, the tracing should work properly (tests added).

Currently, the parametrizations don't support scripting.
This PR introduces the same logic as with the tracing (throw an error if caching is enabled).
However, scripting itself cannot be enabled due to the use of generator expressions in the parametrizations.
Added a TODO to fix it.
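A minimal sketch of the behavior described above (assumes the `torch.nn.utils.parametrize` API; the exact error type is not shown):

```python
import torch
from torch import nn
from torch.nn.utils import parametrize

class Symmetric(nn.Module):
    def forward(self, X):
        return X + X.t()

m = nn.Linear(3, 3)
parametrize.register_parametrization(m, "weight", Symmetric())

# Without caching, tracing works:
traced = torch.jit.trace(m, torch.randn(2, 3))

# Under the caching context manager, tracing is expected to raise:
# with parametrize.cached():
#     torch.jit.trace(m, torch.randn(2, 3))  # error
```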

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29462887

Pulled By: z-a-f

fbshipit-source-id: 49721d3059be58f36055d1c374080df41a748d66
2021-06-30 23:54:02 -07:00
4e181dfc35 [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#60925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925

* Set `torch.distributed.launch` restarts to 0
* Remove the unnecessary `--use_env` warning and move the remaining `--use_env` warnings to `torch.distributed.launch`
* Make the default log level WARNING
* Add a new doc section around transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error propagation
* Set the default events handler to `null` so that events are not printed to the console
* Add a reference from `torch.distributed.launch` to `torch.distributed.run`
* Set the correct preexec function that sends SIGTERM to child processes when the parent dies

Issues resolved:

https://github.com/pytorch/pytorch/issues/60716
https://github.com/pytorch/pytorch/issues/60754

Test Plan:
sandcastle

    python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts
    python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts

    python -m torch.distributed.launch --nproc_per_node=4  --use_env --no_python  main.py -> produces error
    python -m torch.distributed.launch --nproc_per_node=4  --use_env main.py -> no warning
    python -m torch.distributed.launch --nproc_per_node=4  --no_python  main.py ->warning

Output of running torch.distributed.launch without --use_env:

    $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torch.distributed.run.
    Note that --use_env is set by default in torch.distributed.run.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ('LOCAL_RANK')` instead.

New section:

{F628923078}

{F628974089}

Reviewed By: kiukchung, cbalioglu

Differential Revision: D29413019

fbshipit-source-id: 323bfbad9d0e4aba3b10ddd7a243ca6e48169630
2021-06-30 23:31:02 -07:00
ae21357ada add embedding bag skeleton operators (#60491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60491

Basic reference embedding bag operators; these are not going to be performant, but can be used for functionality enablement.

These operators output the right shape, but the implementation is empty.

Test Plan: tbd

Reviewed By: vkuzo

Differential Revision: D29313058

fbshipit-source-id: b05df6ff9a7c0c6ac46ef64a42464988453bd460
2021-06-30 23:09:11 -07:00
db1dd9e7e0 add support for quantized tensors in torch.testing.assert_close (#58926)
Summary:
This adds support for quantized tensors the same way torch.testing._internal.common_utils.TestCase.assertEqual does:

bf269fdc98/torch/testing/_internal/common_utils.py (L1314-L1341)

- `.qscheme()` is checked for equality
- `.q_scale` and `.q_zero_point` are checked for equality (see comment below) for `.qscheme() == torch.per_tensor_affine`
- `.q_per_channel_scales`, `.q_per_channel_zero_points`, and `.q_per_channel_axis` are checked for equality (see comment below) for `.qscheme() == torch.per_channel_affine`
- values are checked with the default checks after a `.int_repr().to(torch.int32)` call
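
A minimal usage sketch of the new support (comparing a quantized tensor against itself; a mismatched scale is sketched in the comment):

```python
import torch

x = torch.quantize_per_tensor(
    torch.randn(4), scale=0.1, zero_point=0, dtype=torch.qint8)
torch.testing.assert_close(x, x)  # passes: same qscheme, qparams, values

y = torch.quantize_per_tensor(
    x.dequantize(), scale=0.2, zero_point=0, dtype=torch.qint8)
# torch.testing.assert_close(x, y)  # would raise: the scales differ
```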

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58926

Reviewed By: jerryzh168

Differential Revision: D29483532

Pulled By: mruberry

fbshipit-source-id: 003fde7e21cf844778a879c3de0a7c84d13877bd
2021-06-30 21:43:02 -07:00
06fc637b41 Check native_function's outputs' TensorImpl and StorageImpl (#60286)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25927

Performs some of the checks described in https://github.com/pytorch/pytorch/issues/25927#issuecomment-589354373:
If the function does not modify its inputs (non-inplace and has no out arg):
- Check that the TensorImpl has a use_count of 1. (This should make us aware of functions that return self.)
- If the function is a view function, check that the StorageImpl is the same as that of the aliased input; otherwise, check that the StorageImpl's use_count is 1.

Detected a couple of functions that failed the check that the returned TensorImpl should have a use_count of 1: 'native_batch_norm', 'native_batch_norm_backward', '_embedding_bag'. (Filing issues).

Examples of generated code:
We did not update checks for in-place ops (this includes in-place views).

Example of a view:
- Check that outputs StorageImpl of `result` is the same as that of `self`.
- Check TensorImpl has use_count of 1
```cpp
at::Tensor as_strided(c10::DispatchKeySet ks, const at::Tensor & self, at::IntArrayRef size, at::IntArrayRef stride, c10::optional<int64_t> storage_offset) {
  auto& self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );
  (void)_any_requires_grad;
  std::shared_ptr<AsStridedBackward> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<AsStridedBackward>(new AsStridedBackward(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_geometry = TensorGeometry(self);
    grad_fn->size = size.vec();
    grad_fn->stride = stride.vec();
    grad_fn->storage_offset = storage_offset;
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowAutograd guard;
    return at::redispatch::as_strided(ks & c10::after_autograd_keyset, self_, size, stride, storage_offset);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(result.storage())); <<<<<<<<<<<<<<<<<<<<<<<<
  AT_ASSERT(result.use_count() == 1); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  TORCH_CHECK_NOT_IMPLEMENTED(!(isFwGradDefined(self)), "Trying to use forward AD with as_strided that does not support it.");
  return result;
}
```
Example of non-view:
- Check that output's StorageImpl has use_count of 1.
- Check that output's TensorImpl has use_count of 1.
```cpp
at::Tensor asin(c10::DispatchKeySet ks, const at::Tensor & self) {
  auto& self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );
  (void)_any_requires_grad;
  std::shared_ptr<AsinBackward> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<AsinBackward>(new AsinBackward(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_ = SavedVariable(self, false);
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::asin(ks & c10::after_autograd_keyset, self_);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  if (result.has_storage()) AT_ASSERT(result.storage().use_count() == 1); <<<<<<<<<<<<<<<<<<<<<<<<<<
  AT_ASSERT(result.use_count() == 1); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  if (isFwGradDefined(self)) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      auto self_p = toNonOptPrimal(self);
      auto result_new_fw_grad = (self_t.conj() * (-self_p * self_p + 1).rsqrt().conj()).conj();
      if (result_new_fw_grad.defined()) {
        // The hardcoded 0 here will need to be updated once we support multiple levels.
        result._set_fw_grad(result_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
      }
  }
  return result;
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60286

Reviewed By: jbschlosser

Differential Revision: D29402253

Pulled By: soulitzer

fbshipit-source-id: b90f34c455b8767f95a52c329db351dbbb495397
2021-06-30 19:19:01 -07:00
03b5a225a7 Test parametrization for instantiated device-specific tests (#60233)
Summary:
The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In the details, the `ops` decorator tags a test function with the metadata needed (list of ops, `dtypes`) and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`.

This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic.

One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism.

The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability.

Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. The core functionality is in the `_parameterize_test()` method, which takes in a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233

Reviewed By: iramazanli

Differential Revision: D29494995

Pulled By: jbschlosser

fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc
2021-06-30 18:50:22 -07:00
6643df2680 [jit] Use computed loop to dispatch to next instruction in interpreter. (#60211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60211

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29211283

fbshipit-source-id: 2f87b5a78d4fc00ce11ed509fc15db35332690b6
2021-06-30 17:44:26 -07:00
357a21bc92 Fix numerical issue of rowwise normalization in Caffe2 and internal tests. (#60880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60880

Fix numerical issue of rowwise normalization in Caffe2 and internal tests.

Test Plan: buck test mode/opt //dper3/dper3/modules/tests:xdeepint_test -- --exact 'dper3/dper3/modules/tests:xdeepint_test - test_xdeepint_with_full_features_with_interactions_3 (dper3.dper3.modules.tests.xdeepint_test.XdeepInt_Test)'

Reviewed By: esqu1

Differential Revision: D29431597

fbshipit-source-id: 72df52fdcbb29ad3de7b9472f25fde26cf804a76
2021-06-30 17:31:04 -07:00
0824b919ec [BE] move general script out of .circleci/ into tools/ (#60973)
Summary:
Second step in https://github.com/pytorch/pytorch/issues/60373.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60973

Reviewed By: samestep

Differential Revision: D29499385

Pulled By: walterddr

fbshipit-source-id: 22df22f78f6b9af6221917a10188218773245009
2021-06-30 17:20:05 -07:00
4036820506 Add PocketFFT support (#60976)
Summary:
Needed on platforms that do not have MKL, such as aarch64 and M1:
- Add `AT_POCKETFFT_ENABLED()` to Config.h.in
- Introduce torch._C.has_spectral that is true if PyTorch was compiled with either MKL or PocketFFT
- Modify spectral test to use skipCPUIfNoFFT instead of skipCPUIfNoMKL

Shares the implementation of the `_out` functions, as well as `fft_fill_with_conjugate_symmetry_stub`, between the MKL and PocketFFT implementations.
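
A minimal sketch of the new capability flag (the flag name comes from the bullet above):

```python
import torch

# Guard FFT usage on builds that have neither MKL nor PocketFFT.
if torch._C.has_spectral:
    freqs = torch.fft.fft(torch.randn(8))
else:
    print("PyTorch was built without a spectral backend; skipping FFT")
```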

Fixes https://github.com/pytorch/pytorch/issues/41592

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60976

Reviewed By: walterddr, driazati, janeyx99, samestep

Differential Revision: D29466530

Pulled By: malfet

fbshipit-source-id: ac5edb3d40e7c413267825f92a5e8bc4bb249caf
2021-06-30 16:28:20 -07:00
2d0c6e60a7 going back to use packaging.version.parse instead (#61053)
Summary:
I think this may be related to https://app.circleci.com/pipelines/github/pytorch/vision/9352/workflows/9c8afb1c-6157-4c82-a5c8-105c5adac57d/jobs/687003

Apparently `pkg_resources.parse_version` returns a `pkg_resources.extern.packaging.version.Version` instead of a `packaging.version.Version`, and it seems that on some older versions of setuptools it doesn't support the `.major`/`.minor` attributes. Changing it back to using `packaging.version.parse`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61053

Test Plan: CI

Reviewed By: samestep

Differential Revision: D29494322

Pulled By: walterddr

fbshipit-source-id: 294572a10b167677440d7404e5ebe007ab59d299
2021-06-30 16:23:59 -07:00
a2ad84afbb Send test reports to S3 (#61071)
Summary:
This sends the test reports zip to S3 in addition to the GitHub artifact store. This makes it easier to query in the PR HUD since we don't have to deal with the GitHub API's rate limits / download speeds. The impact on S3 storage should be minimal since it's only 500 KB or so per run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61071

Reviewed By: nikithamalgifb

Differential Revision: D29498941

Pulled By: driazati

fbshipit-source-id: 74bfbe7fa7d1d97fd8a6938c98dfe0caff0ab6eb
2021-06-30 16:00:01 -07:00
812ed47caa [Static runtime] Add unit tests to ops bmm and addmm (#61000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61000

Add unit tests to bmm and addmm operators in static runtime.

Test Plan:
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest

{F628935117}

Reviewed By: hlu1

Differential Revision: D29459679

fbshipit-source-id: 5c7fa5c9b0675c1c84f3ae3110204d663255009c
2021-06-30 15:55:58 -07:00
4ff81ab112 escape backward slash in stack trace in Windows to slash (#60842)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60842

Reviewed By: gdankel

Differential Revision: D29498498

Pulled By: malfet

fbshipit-source-id: 78e1b25a2e6bdfd3ba0c988d023c7a7f79a22cf4
2021-06-30 15:32:03 -07:00
6c1c1111de [JIT] Add reference semantics to TorchScript classes (#44324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44324

**Summary**
This commit adds reference semantics to TorchScript class types;
modifications made to them within TorchScript will be visible in Python.

**Test Plan**
This commit adds a unit test to `TestClassType` that checks that
modifications made to a class type instance passed into TorchScript are
visible in Python after executing the scripted function or module.

**Fixes**
This commit closes #41421.
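
A minimal sketch of the new behavior (patterned on the description above; not the exact unit test):

```python
import torch

@torch.jit.script
class Counter(object):
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1

@torch.jit.script
def bump(c: Counter):
    c.increment()

c = Counter()
bump(c)
print(c.count)  # 1 -- the mutation made inside TorchScript is now visible
```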

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24912807

Pulled By: SplitInfinity

fbshipit-source-id: d64ac6211012425b040b987e3358253016e84ca0
2021-06-30 14:27:17 -07:00
aa728dc335 Fix fx patch module name (#61062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61062

Instead of being 'patch', this should be the importable name of the module (it's defined as `_fx` on the `torch._C` module, so the full name should be `torch._C._fx`). This now works correctly:

```python
>>> import torch._C._fx
>>> dir(torch._C._fx)
['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'patch_function']
```

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D29497018

Pulled By: driazati

fbshipit-source-id: 093aa0552b48feb0aabe47bdf72776dddd5a3b8f
2021-06-30 14:23:35 -07:00
dabadd7e20 [quant] Added reset_min_max_vals() function to observers (#60883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60883

As per this [comment](https://github.com/pytorch/pytorch/pull/59964#discussion_r659064270), I created a `reset_min_max_vals()` function inside the observers which will be called during input-weight equalization. This is so that we will not expose the implementation of the observers in the equalization code.
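
A minimal usage sketch of the new method, assuming the default `MinMaxObserver`:

```python
import torch
from torch.quantization.observer import MinMaxObserver

obs = MinMaxObserver()
obs(torch.randn(4, 4))      # observe a batch; updates min_val/max_val
obs.reset_min_max_vals()    # new API: clear the recorded statistics
obs(torch.randn(4, 4))      # stats now reflect only this batch
```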

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29491848

fbshipit-source-id: 00e91959ceb3b4f3688175a1a7ba11823e929b2f
2021-06-30 14:22:08 -07:00
1a0195db49 [quant] Input-Weight Equalization - support for LinearReLU layers (#60653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60653

Special casing was needed to get the weight attribute in the linear layers of fused LinearReLU layers.

Initial Model: `x -> linear1 -> relu`

After fusion: `x -> linearRelu`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> linearRelu -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> linearRelu -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::linearRelu -> dequantize`

More step-throughs here: https://fb.quip.com/A9J3AsBxkykR

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
LinearReluModel(
  (fc): Linear(in_features=5, out_features=5, bias=True)
  (relu): ReLU()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406999

fbshipit-source-id: add38e8e7fb84a241c3b10bfb8451b50103effd4
2021-06-30 14:22:06 -07:00
546102e161 Fix overflow in quantize_val_arm (#60079)
Summary:
By using `__builtin_add_overflow` to detect integer overflows when `zero_point` is added to the rounded integral value.
Also fixes a small typo.

After this PR, `python3 -c "import torch;print(torch.torch.quantize_per_tensor(torch.ones(10) * 2**32, 0.5, 1, torch.quint8))"` returns the same vector of `127`s on both x86_64 and aarch64 platforms.

This change merely mitigates the overflow bug; a more proper (and perhaps performance-impacting) fix would be to add `zero_point` to the floating-point values both in the serial and the vectorized code. Filed https://github.com/pytorch/pytorch/issues/61047 to track this one.
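
A minimal sketch of the detection pattern (hypothetical helper, not the exact PyTorch code):

```cpp
#include <climits>
#include <cstdint>

// Detect overflow when adding zero_point to a rounded value and saturate
// instead of wrapping around (signed overflow is undefined behavior).
int32_t add_zero_point_checked(int32_t rounded, int32_t zero_point) {
  int32_t result;
  if (__builtin_add_overflow(rounded, zero_point, &result)) {
    return rounded > 0 ? INT32_MAX : INT32_MIN;  // saturate on overflow
  }
  return result;
}
```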

Also filed https://github.com/pytorch/pytorch/issues/61046 to clarify intended use of `__ARM_NEON__` define

Fixes https://github.com/pytorch/pytorch/issues/60077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60079

Reviewed By: kimishpatel

Differential Revision: D29157883

Pulled By: malfet

fbshipit-source-id: 6f75d93e6d3d4d0d5a5eab545cb27773086b9768
2021-06-30 14:20:56 -07:00
cef0851223 Make torch.utils.benchmark numpy free (#60564)
Summary:
PyTorch core does not depend on numpy, so the benchmarks should not depend on it either.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60564

Reviewed By: robieta

Differential Revision: D29497375

Pulled By: malfet

fbshipit-source-id: d9566e5b2e48868cef5568cd62f691af19ccf1f1
2021-06-30 14:17:32 -07:00
d1a4c9e682 [ROCm] allow user to override PYTORCH_ROCM_ARCH (#60602)
Summary:
Restores the ability of a user to call .jenkins/pytorch/build.sh while
also setting PYTORCH_ROCM_ARCH. Otherwise, with IN_CI=1 as the new
default, it will forcibly ignore user settings when build.sh is used
outside of CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60602

Reviewed By: samestep

Differential Revision: D29490791

Pulled By: janeyx99

fbshipit-source-id: b5e8a529b8e0b5020b260b4bf027a37e0c1df8d5
2021-06-30 13:35:11 -07:00
14cc234a8a Fix some comparison warnings (#60875)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60875

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406593

fbshipit-source-id: 0eb070ef05c1cd343c9e835786b42014d0553aa5
2021-06-30 13:09:41 -07:00
74692f3ada Loop transformation (#60874)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60874

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406474

fbshipit-source-id: c994361e9fdafb7c4519ce2f1c40288a9ef025be
2021-06-30 13:09:39 -07:00
a8b56ea58b Remove another for-loop in SoftMax (#60873)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60873

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406429

fbshipit-source-id: 3b5710ed9e5d1d14379f64670638ab119d0d78e3
2021-06-30 13:09:38 -07:00
850ff82edc Remove for-loop for getting number of elements in favour of abstraction (#60872)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60872

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406199

fbshipit-source-id: ae49672cf1bb370d574d0c21231477bb17dea0ca
2021-06-30 13:08:25 -07:00
95e77e0af2 [Delegate] A more specific prefix for lowered module name. (#61007)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61007

Test Plan: Imported from OSS

Reviewed By: kimishpatel, raziel

Differential Revision: D29477733

Pulled By: iseeyuan

fbshipit-source-id: 94a7a784d98a41ff7ba255955acf74bd26297c9f
2021-06-30 12:37:09 -07:00
f32f85e6da Implemented torch.corrcoef (#60420)
Summary:
Implements `torch.corrcoef` similar to [`np.corrcoef`](https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html) using `torch.cov` implemented in https://github.com/pytorch/pytorch/pull/58311.

closes https://github.com/pytorch/pytorch/issues/1254
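
A quick usage sketch (two perfectly anti-correlated rows):

```python
import torch

x = torch.tensor([[0., 1., 2.],
                  [2., 1., 0.]])
print(torch.corrcoef(x))
# tensor([[ 1., -1.],
#         [-1.,  1.]])
```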

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60420

Reviewed By: mruberry

Differential Revision: D29474687

Pulled By: heitorschueroff

fbshipit-source-id: f3c7c5610363aebd88274a51fc77e3cf879cb611
2021-06-30 12:36:02 -07:00
d5be67a338 Expose findDanglingImpls to Python (#60827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60827

This diff exposed Dispatcher.findDanglingImpls to Python as _C._dispatch_find_dangling_impls.
ghstack-source-id: 132799970

Test Plan: buck test mode/dev //caffe2/test:others -- test_find_dangling_impls

Reviewed By: ezyang

Differential Revision: D29416330

fbshipit-source-id: d2f26054b6e247be1bb9e818eaa7cb9e68a4a913
2021-06-30 12:31:19 -07:00
3cf267bfa6 Embedding: Remove dispatch in parallel region (#60597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60597

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446191

Pulled By: ngimel

fbshipit-source-id: d6ff010104ae621d5e3d9c269ed2b48407e71d67
2021-06-30 12:30:15 -07:00
4f5c68857f SumKernel (BFloat16): use float as accumulation type (#55217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55217

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28836794

Pulled By: VitalyFedyunin

fbshipit-source-id: 46ed3a862c2bb4c6325c78ecfc5d01761f7a113a
2021-06-30 12:27:42 -07:00
4d5edef8d4 Python composite module execution unit tests on delegation of backend_with_compiler_demo (#60801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60801

Added unit tests for the execution of a simple composite module with a
compiler (`backend_with_compiler_demo`).

Test Plan:
Running `python test/test_jit.py TestBackendsWithCompiler -v` succeeds.

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29409958

fbshipit-source-id: b02e58bdcc25a2997b70ecae41a019b8596323c1
2021-06-30 12:23:32 -07:00
3957ed41a9 [DDP] Disable reducer hooks from running outside of DDP backwards. (#60921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60921

Sometimes local modules can fire hooks (such as when the user calls
backward after using `ddp_module.module` explicitly). This isn't supported
behavior and can cause issues with the various state and gradient reduction logic we run
in DDP, so it's best to disable this entirely.
ghstack-source-id: 132739311

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29435737

fbshipit-source-id: fef76a0dd2955c432131632fb81dde4a4982ad91
2021-06-30 12:19:18 -07:00
5a4282d06b fix typo in binary_build_script (#61016)
Summary:
resolve comments in https://github.com/pytorch/pytorch/issues/60849

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61016

Reviewed By: samestep

Differential Revision: D29487908

Pulled By: janeyx99

fbshipit-source-id: 32feb6c6e1009324201e3d2c6fcd9a7388791401
2021-06-30 11:52:38 -07:00
d44515c418 Fix lint (#61058)
Summary:
https://github.com/pytorch/pytorch/issues/61003 broke Lint / shellcheck because of a race condition with https://github.com/pytorch/pytorch/issues/60221. This PR fixes it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61058

Test Plan: CI.

Reviewed By: walterddr

Differential Revision: D29494727

Pulled By: samestep

fbshipit-source-id: e6c5ea6daa47db13eb6a42cc2b5bf9c938c1839d
2021-06-30 11:45:23 -07:00
a25e6370e5 Add IMethod interface
Summary:
Expose IMethod interface, which provides a unified interface to either script or python methods backed by torchscript or torchdeploy.

IMethod provides a way to depend on a torch method without depending on a particular runtime implementation such as torchscript or python/deploy.

Test Plan: add unit tests.

Reviewed By: suo

Differential Revision: D29463455

fbshipit-source-id: 903391d9af9fbdd8fcdb096c1a136ec6ac153b7c
2021-06-30 11:28:24 -07:00
dace860008 Migrate pytorch-linux-bionic-py3.8-gcc9-coverage to GHA (#61050)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59166

`pytorch-linux-bionic-py3.8-gcc9-coverage` build & tests can be run on `linux.2xlarge` instances on GHA,
which have AVX512 support.

Thanks

cc malfet seemethere samestep zhouzhuojie

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61050

Reviewed By: walterddr, 1ntEgr8

Differential Revision: D29493335

Pulled By: samestep

fbshipit-source-id: de79e61f13c537ef7ff30a1e04d1bbc625a06dd1
2021-06-30 11:02:57 -07:00
b4496df7d3 mkl_scsrmm needs to be disabled when MKL is not used (#60051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60051

Introduction:
We want to minimize the number of dependencies for the SGX port. Therefore we need the ability to disable MKL when it is not used.

Problem:
There is a call to mkl_scsrmm that is enabled when CAFFE2_USE_MKL is not defined. This causes a compile error.

Solution:
Surround the call with preprocessor checks to CAFFE2_USE_MKL

Test Plan: Run the pytorch tests.

Reviewed By: LiJihang

Differential Revision: D29022635

fbshipit-source-id: 94ae9fdfe53399b64d8c2d4089eebe93d1d260e8
2021-06-30 10:40:18 -07:00
5644c31ec0 Move windows periodic jobs to GHA (#61003)
Summary:
Moves periodic 11.3 windows jobs to GHA.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61003

Test Plan:
https://github.com/pytorch/pytorch/pull/61003/checks?check_run_id=2947910829

Does NOT yet move the debuggable CI part yet

Reviewed By: malfet

Differential Revision: D29488761

Pulled By: janeyx99

fbshipit-source-id: b16b23b40fe1f6ae189292c6f2c561e5e70f122b
2021-06-30 10:25:10 -07:00
9b5e1e0734 [DataLoader] Make batch DataPipe sensitive to unbatch_level argument (#60672)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60672

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461086

Pulled By: VitalyFedyunin

fbshipit-source-id: efc6b3b567323defe64d3f1b30a5708107e62dd4
2021-06-30 10:04:32 -07:00
66de50cc11 [DataLoader] Make shuffle DataPipe sensitive to unbatch_level argument (#60671)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60671

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461083

Pulled By: VitalyFedyunin

fbshipit-source-id: 3d371017d5ce948a1e5b8182ae91033190f64da7
2021-06-30 10:03:29 -07:00
a652398465 [DataLoader] Rename transform DataPipe to legacy_transform (#60670)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60670

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461081

Pulled By: VitalyFedyunin

fbshipit-source-id: 57f53a91db9032a6126e86243ddea9149c473060
2021-06-30 09:49:14 -07:00
abb4ed7412 Move clang-format to lint.yml (#60918)
Summary:
Refactor and consolidate the location of lint related workflows

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60918

Reviewed By: mruberry

Differential Revision: D29459605

Pulled By: zhouzhuojie

fbshipit-source-id: c2993cfd037a03b733a414897bd53cf407c7c268
2021-06-30 09:45:35 -07:00
0b8a7daa2a Enable multigpu_test in GHA (#60221)
Summary:
- [x] add to test matrix
- [x] enable on PRs for testing
- [x] modify the scripts so it actually runs the multigpu tests
- [x] put `num_shards` after `shard` number
- [x] use a separate test-reports artifact
- [x] run on `linux.16xlarge.nvidia.gpu`
- [x] validate that it works
- [x] disable on PRs before merging

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60221

Test Plan: CI. Example run: https://github.com/pytorch/pytorch/actions/runs/984347177

Reviewed By: malfet

Differential Revision: D29430567

Pulled By: samestep

fbshipit-source-id: 09f8e208e524579b603611479ca00515c8a1b5aa
2021-06-30 08:52:38 -07:00
5576c7bdd1 ns for fx: initial support for int8 shadows fp32 (#60419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60419

Adds support for NS for FX shadowed activations pass to handle int8
modules shadowing fp32 modules. The difficulty here is that in order
to insert the dtype cast, we need the qparams of the input.

For the current PR, we only handle the easy cases where the previous
node is either a `quantize_per_tensor` or an OSS quantized module.
A future PR can handle more complicated cases such as various functions.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_fp32_simple
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29280050

fbshipit-source-id: 465257c9f82a34fa91b48ae8887355c68e00edc6
2021-06-30 08:08:46 -07:00
a5e2ea4345 Add noop register hook (#60685)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60685

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29466224

fbshipit-source-id: 68c8aa022ccffeefd45062f1443d15c9a6824f3d
2021-06-30 07:46:34 -07:00
1fd65967e5 Revert D29312809: add quantized_resize and dequantize for some cuda backends
Test Plan: revert-hammer

Differential Revision:
D29312809 (c4cc26f26a)

Original commit changeset: c5c5eabb98bc

fbshipit-source-id: 565e215513b68eae0dacdd1660b1a01759215511
2021-06-30 07:37:09 -07:00
bfe03120ee [PyPer] Fix schema of fb::equally_split (#60852)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60852

Reviewed By: ajyu

Differential Revision: D29423425

fbshipit-source-id: 4525db1f268ca65d6851a5ec846a6ae2f710ec6b
2021-06-30 03:18:15 -07:00
af5a0df1d0 Prefer linalg::qr over qr in the C++ API (#60529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60060

Also adds `torch::linalg::qr` to the C++ API, as it was missing.
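
A minimal usage sketch of the added C++ API (assumes the default "reduced" mode and a (Q, R) tuple return):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::Tensor A = torch::randn({4, 3});
  torch::Tensor Q, R;
  std::tie(Q, R) = torch::linalg::qr(A);
  // Q has orthonormal columns and Q @ R reconstructs A.
  std::cout << torch::allclose(torch::matmul(Q, R), A, 1e-4, 1e-6) << "\n";
}
```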

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60529

Reviewed By: ngimel

Differential Revision: D29353133

Pulled By: mruberry

fbshipit-source-id: e18feaffca91c13940ad3d6bd1f40bb57dc101ae
2021-06-30 02:48:04 -07:00
b39770c461 Fix degenerate shape behavior for ord=+/-2 (#60273)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59198

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60273

Reviewed By: jbschlosser

Differential Revision: D29422907

Pulled By: mruberry

fbshipit-source-id: 609cd640b0477f90bebca20865e34cbe182d3909
2021-06-30 02:17:26 -07:00
10fc58620e [PyTorch][NASProfiler] Add moduleHierarchy Python API to print out hierarchical information about a Node (#60384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60384

Currently, inlining the module graph drops the module hierarchy info on the Python side. Here we retrieve the module hierarchy from the C++ side and expose it through a new Python API on Node called `moduleHierarchy()`.

Test Plan:
Usage:
```
torch._C._jit_pass_inline(module.graph)
torch._C._jit_pass_propagate_shapes_on_graph(module.graph)
node = module.graph.findNode("quantized::conv2d_relu")
'top(' + module.original_name + ').' + node.moduleHierarchy() + '.' + node.kind()
```
Output:
```
'top(QuantWrapper).module(FBNetHR).0(Sequential).xif0_0(ConvBNRelu).conv(ConvReLU2d).quantized::conv2d_relu'
```

Reviewed By: kimishpatel

Differential Revision: D29252169

fbshipit-source-id: 74163a87f919e061e5e75dfebc4c5cdbe8489d93
2021-06-30 01:32:31 -07:00
44b3dc4eac resolve conjugate bit in torch.testing.assert_close (#60522)
Summary:
We need to resolve the conjugate bit for complex tensors, because otherwise we may not be able to access the imaginary component:

```python
>>> torch.tensor(complex(1, 1)).conj().imag
RuntimeError: view_as_real doesn't work on unresolved conjugated tensors.  To resolve the conjugate tensor so you can view it as real, use self.resolve_conj(); however, be warned that the resulting tensor will NOT alias the original.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60522

Reviewed By: ngimel

Differential Revision: D29353095

Pulled By: mruberry

fbshipit-source-id: c36eaf883dd55041166f692f7b1d35cd2a34acfb
2021-06-30 01:31:30 -07:00
c4cc26f26a add quantized_resize and dequantize for some cuda backends (#60489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60489

adding entries into native_functions.yaml to enable these functions
since the code is common between cuda and cpu

Test Plan: tested with a full model, unit tests on the way

Reviewed By: ezyang

Differential Revision: D29312809

fbshipit-source-id: c5c5eabb98bc192343ec78980dc4e3fc3f41d3db
2021-06-30 00:33:12 -07:00
4adc5eb6c5 [Caffe2][Testing] Check for equality first in assertTensorEqualsWithType<float> (#61006)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61006

Test Plan: Modified existing unit test to test for eps = 0. It would fail without the equality test first.

Reviewed By: ajyu

Differential Revision: D29423770

fbshipit-source-id: 168e7de00d8522c4b646a8335d0120700915f260
2021-06-29 23:31:37 -07:00
287c0ab170 [FX] Add requires_grad to TensorMetadata (#60972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60972

For PyTorch model memory requirement calculation, requires_grad is needed. Output tensors with requires_grad are saved in the module context and increase memory during the forward pass.

Test Plan: Existing test cases

Reviewed By: jamesr66a

Differential Revision: D29024932

fbshipit-source-id: def990f8c6ff6fa4537bfc377c646b9d44464ebd
2021-06-29 23:07:27 -07:00
ce232e7847 [ROCM] enable fft tests (#60313)
Summary:
This PR enables fft tests on ROCm. It contains a helper function that generates a valid input for fft tests that call hipfftExecC2R or hipfftExecZ2D. With this helper function we are able to fix a number of fft tests. This brings to a close the series of fft PRs enabling fft tests on ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60313

Reviewed By: mruberry

Differential Revision: D29463487

Pulled By: malfet

fbshipit-source-id: d0903fbf12d24ba95a42c8b7589714fdb63353ed
2021-06-29 22:43:29 -07:00
e2b42c6f52 [ROCm] Update the magma build to new commit (#60900)
Summary:
Magma's master branch is updated with all the fixes required for ROCm, so this updates the magma build to the new commit for ROCm PyTorch builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60900

Reviewed By: jbschlosser

Differential Revision: D29440587

Pulled By: malfet

fbshipit-source-id: 2ccdf48441dfff3d19c4a478e03ac11a843f8419
2021-06-29 22:38:58 -07:00
93772792e3 [nnc] Get rid of fuser trigger counters (#57334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334

Here's a possibly controversial PR.  These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value.  While it's true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29471484

Pulled By: bertmaher

fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
2021-06-29 22:22:15 -07:00
c4f718cb72 [nnc] Serialize initialization of LLVM targets (#60996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60996

We've had a bug report of weird LLVM initialization errors, e.g.,
```
Unexpected failure in LLVM JIT: Cannot choose between targets "x86-64" and "x86-64"
```

While I haven't repro'ed that exact message, I did run a stress-test that
compiles on many threads simultaneously, and it deadlocks in
TargetRegistry::lookupTarget.  And in fact I remember debugging this before in
a different system, and finding "Clients are responsible for avoid race
conditions in registration" in
https://llvm.org/doxygen/TargetRegistry_8cpp_source.html.

So yeah, let's lock this thing.
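A minimal sketch of the locking pattern (hypothetical wrapper; the actual LLVM init calls are elided):

```cpp
#include <mutex>

// Funnel the non-thread-safe LLVM target registration through
// std::call_once so concurrent JIT compilations cannot race inside
// TargetRegistry.
void initializeLLVMTargetsOnce() {
  static std::once_flag init_flag;
  std::call_once(init_flag, [] {
    // llvm::InitializeAllTargets();
    // llvm::InitializeAllTargetMCs();
    // llvm::InitializeAllAsmPrinters();
  });
}
```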
ghstack-source-id: 132719018

Test Plan: Heavy multithreaded compilation.  Not sure if it's suitable for landing.

Reviewed By: ZolotukhinM

Differential Revision: D29471343

fbshipit-source-id: b495e468b57e77796a08b627884d3efeca2d1f7c
2021-06-29 22:21:00 -07:00
5bc28c897e fixed launch bounds for gamma_cuda_kernel (#60393)
Summary:
Changed launch bounds for gamma_cuda_kernel from 512 to 256.

Timing data (using Nvidia Titan-V):
![GammaTimingData](https://user-images.githubusercontent.com/22803332/122821464-bc873300-d291-11eb-9be6-2fb690f0d5c7.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60393

Reviewed By: jbschlosser

Differential Revision: D29447926

Pulled By: ngimel

fbshipit-source-id: c2112f9be8ede3bb07cb72f301393f24d17e0c01
2021-06-29 19:22:07 -07:00
b3ec92cf66 BatchNorm: Remove dispatch in parallel region (#60596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60596

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446193

Pulled By: ngimel

fbshipit-source-id: 3ebf44a5f1e001e7dc42cd5963752b7e5b9bcbd9
2021-06-29 18:28:46 -07:00
28dc02fe9f Accumulate 16-bit float sums in 32-bit accumulators (#60387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60387

Fixes gh-59489

Using 32-bit accumulators is a win-win: improved precision and improved
performance since the half precision types needed to be converted back and forth
to 32-bit float to do the arithmetic anyway.

Note that on multi-threaded or discontiguous sums, there can be partial sums
stored in the output, so they are necessarily truncated to 16 bits. Fixing this
would require a rework of TensorIterator reductions.
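
An illustration of why the 32-bit accumulator matters (bfloat16 has only 8 mantissa bits, so a bfloat16 accumulator of ones would stall at 256):

```python
import torch

x = torch.ones(4096, dtype=torch.bfloat16)
print(x.sum())                    # tensor(4096., dtype=torch.bfloat16)
print(x.to(torch.float32).sum())  # tensor(4096.) -- float32 reference
```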

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29447187

Pulled By: ngimel

fbshipit-source-id: d0619e0ca2fe116d101460142b79ca56fd6d0840
2021-06-29 17:52:30 -07:00
f54290fd72 Expose raw saved tensors for custom functions (#60551)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60551

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29466228

fbshipit-source-id: 7565f6cc3f2488c7e444cf81c7eb37a60c75b0e8
2021-06-29 17:21:52 -07:00
a469298707 Free space in windows libtorch build (#60849)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60856
Remove more unneeded pre-installed software in the CI image.

verification links
https://app.circleci.com/pipelines/github/pytorch/pytorch/342992/workflows/3f52cacc-ba1c-4093-804f-d4c1b1c0b806/jobs/14436533
https://app.circleci.com/pipelines/github/pytorch/pytorch/342992/workflows/3f52cacc-ba1c-4093-804f-d4c1b1c0b806/jobs/14437351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60849

Reviewed By: mruberry

Differential Revision: D29473637

Pulled By: seemethere

fbshipit-source-id: f33dd98de32a79ba1195481f1bd9f2d5362fe16e
2021-06-29 16:53:10 -07:00
af66356d47 [skip-ci] Bump docker image tag (#60988)
Summary:
This PR bumps the docker image tag for clang-tidy. The new image runs ubuntu-20.04 (and therefore has python3.8 by default).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60988

Reviewed By: malfet

Differential Revision: D29469941

Pulled By: 1ntEgr8

fbshipit-source-id: 7268bdb23edff0bc26f275689bf4b1f1ca129df7
2021-06-29 15:23:06 -07:00
8780f8fc3c Remove extraneous process group agent test code (#60903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60903

RPC tests using the process group backend were disabled for CI both internally and externally. This removes the code for process-group-only tests. Faulty agent tests, which also use process group, will be handled in a later PR.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, mrshenli

Differential Revision: D29440674

Pulled By: H-Huang

fbshipit-source-id: 4724c189a110ac821c3f4f6f1f8a5c98e057a2a4
2021-06-29 14:21:56 -07:00
d3de37609f Support fused_dropout with XPU backend (#60231)
Summary:
## Motivation
Enable the fused dropout optimization on XPU devices.

## Solution
Add the XPU device to the fused dropout eligibility check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60231

Reviewed By: jbschlosser

Differential Revision: D29437659

Pulled By: ezyang

fbshipit-source-id: b77245bb53d3ac93ab30a2a85994376ae5928c34
2021-06-29 14:20:17 -07:00
b4a4a8434d [1/n]support double for Caffe2 ScatterWeightedSum (#60402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60402

Add float64 data type support for ScatterWeightedSum, for cases where float32 precision (roughly 7 significant digits, ~10^7) is not sufficient.

Test Plan: buck test caffe2/caffe2/python/operator_test:sparse_ops_test -- testScatterWeightedSum

Reviewed By: jianyuh

Differential Revision: D29190324

fbshipit-source-id: 871a60744694e901a2c7685a67350860745d6729
2021-06-29 14:17:04 -07:00
5f51406a51 Modify error message when atol=0 and rtol=0 (#60897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60897

Fixes #56377
Example output: #60898

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461107

Pulled By: 1ntEgr8

fbshipit-source-id: c6e15b299290aab6f8d5a19011c1d39279673f74
2021-06-29 14:17:02 -07:00
6d952dbaf0 [nnc] Fixed checking for loop carried dependence while fusing 2D reduction loops (#60609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60609

Fixes #60310

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29386144

Pulled By: navahgar

fbshipit-source-id: 230df4f59d6196a250ea57ff649b117d096fcdbc
2021-06-29 14:17:01 -07:00
b099f5429c Port argmin kernel to structured kernels. (#60364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60364

Tracking issue: #55070

This PR was opened to solve the CI failures on main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29265855

Pulled By: ezyang

fbshipit-source-id: ccee3810940542f8b370596105826c96b32231ec
2021-06-29 14:16:59 -07:00
3e2233841f Port argmax to structured kernels. (#60363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60363

Tracking issue: #55070

This PR was opened to solve the CI failures on main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265857

Pulled By: ezyang

fbshipit-source-id: 586914d2aa79028c56988896093945755a2b9781
2021-06-29 14:16:57 -07:00
df47fa5bdc Using meta checks for unary torch.all and torch.any. (#60362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60362

This PR makes use of the newly implemented unified `at::meta::check_reduction` for
validating the inputs and configuring its `TensorIterator`.

This PR was opened to solve the CI failures on main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265858

Pulled By: ezyang

fbshipit-source-id: e8961b7da65a31acfed5ac3f5c1f5985ae81ec37
2021-06-29 14:16:56 -07:00
0dd90cceaf [package] track storages across lifetime of PackageExporter (#59735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59735

1. Fixes the ABA storage identity problem during serialization for `torch.package` by keeping references to serialized storages for the lifetime of `PackageExporter`, preventing reuse of a memory address. Achieved by extending the logic used to solve the same issue on mobile.
2. Adds determinism to the naming scheme of serialized storages in export code paths that utilize `tensor_cdata_naming_scheme` (introduced a 2nd mapping in `StorageContext`, which now maps `storage cdata ptr` -> `unique id` and `unique id` -> `c10::Storage`)
3. Additionally uses the presence of a storage in the `StorageContext` instance as a marker for whether a storage has been serialized, removing the need to scan the `PythonStreamWriter` for the presence of the storage's serialization file

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29075276

Pulled By: Lilyjjo

fbshipit-source-id: 15a5c30b1de99c5bd7079388f2db9b6ece2eca12
2021-06-29 14:16:54 -07:00
eb2f535689 c10::Storage python to cpp converter and typecast (#59734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59734

Adds typecast logic to allow for c10::Storages to cross the Python/C++ barrier with pyBind

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29075279

Pulled By: Lilyjjo

fbshipit-source-id: 3e67b8525d308c5bccc64438ebac82b4d17ba462
2021-06-29 14:16:52 -07:00
93eba7471b Remove fetch in clang-tidy setup (#60974)
Summary:
This was necessary previously since we'd have to diff against upstream in order to figure out what to run in clang-tidy, but now we pull this from GitHub (https://github.com/pytorch/pytorch/issues/60045), so we can delete this part of the workflow.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60974

Reviewed By: mruberry

Differential Revision: D29466036

Pulled By: driazati

fbshipit-source-id: a9d619ab731e77bc69ab32b37cfb2c249e22a477
2021-06-29 14:15:34 -07:00
91c076eadc Add TorchVitals for DataLoader (#60959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60959

Add TorchVitals for DataLoader; this indicates that the data loader was enabled.

This is a no-op if TORCH_VITALS environment variable is not set.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex vitals

Reviewed By: VitalyFedyunin

Differential Revision: D29445146

fbshipit-source-id: d5778fff3dafb3c0463fec7a498bff4905597518
2021-06-29 14:08:32 -07:00
652d911f81 add BFloat16 support for LayerNorm CPU (#55210)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55210

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D28836793

Pulled By: VitalyFedyunin

fbshipit-source-id: 998298deedd7a18e45fb761a0a4e0d88b65f2e0c
2021-06-29 14:08:30 -07:00
89d0e31fe5 [torch][repeat_interleave] Remove stream sync when output_size is given for scalar repeats (#60965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60965

Same as title. Simple change to tensor creation.
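
A minimal sketch of the call this change speeds up, assuming a CUDA device is available (example values are illustrative):

```python
import torch

x = torch.arange(4, device='cuda')
# with a scalar repeat count, passing output_size lets the op skip the
# device-to-host sync otherwise needed to compute the output length
y = torch.repeat_interleave(x, 3, output_size=12)
```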

Test Plan: Rely on existing signals and verify manually that sync is not happening.

Reviewed By: ngimel

Differential Revision: D29461773

fbshipit-source-id: 21d6ebfba08449da39fc7f109958f6c6978a4f32
2021-06-29 14:08:28 -07:00
086f6e557e Fix divide by zero error in the ASAN test (#60723)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60722

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60723

Reviewed By: jbschlosser

Differential Revision: D29432147

Pulled By: albanD

fbshipit-source-id: c82cd0df8e4a04ee561ca26ae821a8b61c13a698
2021-06-29 14:07:26 -07:00
ec9c03c234 Implemented torch.cov (#58311)
Summary:
Based from https://github.com/pytorch/pytorch/pull/50466

Adds the initial implementation of `torch.cov`, similar to `numpy.cov`. For simplicity, we removed support for several `numpy.cov` parameters that are either redundant (such as `bias`) or have simple workarounds (such as `y` and `rowvar`).
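
A minimal usage sketch (my own example; the `rowvar` workaround mirrors the reasoning above):

```python
import torch

obs = torch.randn(100, 3)  # 100 observations of 3 variables (columns)
# workaround for the removed rowvar=False: transpose so rows are variables
c = torch.cov(obs.t())     # 3x3 covariance, like np.cov(obs, rowvar=False)
```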

cc PandaBoi

closes https://github.com/pytorch/pytorch/issues/19037

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311

Reviewed By: jbschlosser

Differential Revision: D29431651

Pulled By: heitorschueroff

fbshipit-source-id: 167dea880f534934b145ba94291a9d634c25b01b
2021-06-29 14:02:39 -07:00
8f658d537d Improved JIT support for torch.einsum (#59265)
Summary:
Added JIT support for the vararg version of `torch.einsum`. Note that JIT does not support Python's Ellipsis object (`...`).
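
A sketch of what now scripts, assuming the vararg form refers to passing operands directly rather than in a list (example shapes are illustrative):

```python
import torch

@torch.jit.script
def contract(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # operands passed as varargs; '...' would not script, per the note above
    return torch.einsum('ij,jk->ik', a, b)

print(contract(torch.randn(2, 3), torch.randn(3, 4)).shape)  # torch.Size([2, 4])
```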

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59265

Reviewed By: VitalyFedyunin

Differential Revision: D29328469

Pulled By: heitorschueroff

fbshipit-source-id: 5e4b177fda93255251f45d735b00c08220f0f124
2021-06-29 14:01:21 -07:00
d46eb77b04 Improve CUDA extension building error/warning messages (#59665)
Summary:
See https://github.com/pytorch/pytorch/issues/55267

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59665

Reviewed By: mruberry

Differential Revision: D29462248

Pulled By: ezyang

fbshipit-source-id: 9de13a284a14a7cd24200b9684151ce652e1eb1e
2021-06-29 13:03:30 -07:00
12b63f4046 [DDP] Fix case where new tensors with no grad_fn are returned in DDP forward. (#60882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60882

Fixes https://github.com/pytorch/pytorch/issues/60733, which
identified an issue with a previous PR that resulted in DDP no longer
supporting cases where newly created tensors that don't have a grad_fn are
returned. The result is that the grad_fn gets set to that of the `DDPSink`
custom backward, which causes errors during the backward pass.

This PR fixes the issue by ensuring we don't touch the `grad_fn` of the tensors
if it is `None`. Added relevant tests as well.
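
A sketch of the forward pattern this re-enables (hypothetical module, not taken from the PR):

```python
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        out = self.linear(x)    # has a grad_fn as usual
        flag = torch.zeros(1)   # freshly created, grad_fn is None
        return out, flag        # DDP now leaves flag's grad_fn untouched
```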
ghstack-source-id: 132632515

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29423822

fbshipit-source-id: a9e01046c7be50aa43ffb955f6e0f48fef4bc881
2021-06-29 12:50:48 -07:00
1db2d9b0a8 [ProcessGroupNCCL] change WARNING to INFO (#60901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60901

Short-term fix to address
https://github.com/pytorch/pytorch/issues/60752 . Longer-term fix is tracked here:
https://github.com/pytorch/pytorch/issues/53658 and will involve detecting
whether the user has called `torch.cuda.set_device` in their script and
respecting that device if so, otherwise falling back to our current approach.
ghstack-source-id: 132637336

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29439322

fbshipit-source-id: 92a18fadbb514b1c029332b60fd48075874906ff
2021-06-29 12:46:47 -07:00
150c828803 Add lint rule to keep collect_env.py python2 compliant (#60946)
Summary:
Fixes T94400857

- [x] Add lint rule
- [x] Verify lint rule works
- [x] Fix torch/utils/collect_env.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60946

Reviewed By: malfet, mruberry

Differential Revision: D29457294

Pulled By: rsemenov

fbshipit-source-id: 3c0670408d7aee1479e1de335291deb13a04ace9
2021-06-29 11:57:53 -07:00
808d0e3353 [caffe2] update make_mnist_db and make_image_db to move strings into DB::Put() (#60919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60919

Update make_mnist_db.cc and make_image_db.cc to work with the DB API changes
in D29204425 (00896cb9ed).  This is similar to the changes to make_cifar_db.cc landed in
D29374754 (394f60b0fc).
ghstack-source-id: 132621346

Test Plan: buck build caffe2/binaries/...

Reviewed By: valmikir

Differential Revision: D29447314

fbshipit-source-id: 33aff85c24d8b785211287de23d46704c7eb0726
2021-06-29 11:52:43 -07:00
fab1b6cc70 .github: Increase test shards for linux GPU (#60914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60914

Linux GPU tests are taking almost 4 hours to execute, so let's increase the
test shards for these jobs so they finish in a more timely fashion.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29461968

Pulled By: seemethere

fbshipit-source-id: a1eab08f9cd3abd8ceca48871fe702d0bccd8a3f
2021-06-29 10:44:01 -07:00
5fbca0d281 Use cpu docker image for cpu builds (#60920)
Summary:
This was set to use the [CUDA 10.0 image](https://hub.docker.com/r/pytorch/manylinux-cuda100) which hasn't been updated in quite a while, so fix it to use the up-to-date [cpu image](https://hub.docker.com/r/pytorch/manylinux-cpu) instead

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60920

Reviewed By: janeyx99

Differential Revision: D29447897

Pulled By: driazati

fbshipit-source-id: 6e89091110361d0ddda859bb266e229c6cf83c2d
2021-06-29 10:11:55 -07:00
10b929bbfb Make Jeff and Jithun .circleci/docker code owners (#60958)
Summary:
Following up on https://github.com/pytorch/pytorch/pull/60658#issuecomment-870681027.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60958

Reviewed By: 1ntEgr8

Differential Revision: D29460721

Pulled By: samestep

fbshipit-source-id: 74badff6c4a17b3ff48dc2fc27d1faa9edeae097
2021-06-29 09:47:58 -07:00
53489bc385 fix for #60319 , forcing to use fork as start method in test/test_dat… (#60868)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/60319, forcing use of fork as the start method in test/test_dataloader.py.

Fixes #60319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60868

Reviewed By: mruberry

Differential Revision: D29432876

Pulled By: ejguan

fbshipit-source-id: 5da25f7cfaf8ea0803c0b1aacf2badd656799e16
2021-06-29 09:30:37 -07:00
4310044fec update unsafe flag documentation (#60899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60899

modify documentation for `unsafe` flag in `parametrize.py`
ghstack-source-id: 132591862

Test Plan:
shouldn't modify code behavior but as a double check,
`buck test mode/dev-nosan //caffe2/test:nn -- --exact 'caffe2/test:nn - test_register_and_remove_parametrization (test_nn.TestNN)'`

https://pxl.cl/1L1fw

Reviewed By: albanD

Differential Revision: D29436688

fbshipit-source-id: 85499ad22b49ad992507b9ed5e7def8231cbfeba
2021-06-29 09:25:37 -07:00
5b6818f08a [Model Averaging] Enforce a synchronization before allreduce parameters (#60891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60891

This fix is particularly useful for local SGD when the averaging period is very small, which may cause a conflict between the gradient allreduce within the per-machine subgroup and the global parameter allreduce by the communication world.
ghstack-source-id: 132564252

Test Plan:
f281873295 (#Try1) failed due to the conflict between global process group and subgroup.
```
<Thread(configerator-monitor-singleton, started 139839806633728)>
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/jetter.gson7tr3/configerator/client.py", line 348, in _monitor_loop
    self._parent_thread.join(self._interval_ms / 1000)
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 1015, in join
    self._wait_for_tstate_lock(timeout=max(timeout, 0))
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
```

Fixed after adding an explicit sync: f282044866, f282241800

Reviewed By: rohan-varma

Differential Revision: D29434597

fbshipit-source-id: a4f777fc26f379639f85fda32de425cd3b337b33
2021-06-29 01:39:40 -07:00
fbd4cb1cd7 Fix error logging in common_distributed. (#60917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60917

The second line of the error log didn't handle the f-string properly.

Before fix:
```
exiting process with exit code: {MultiProcessTestCase.TEST_ERROR_EXIT_CODE}
```

After fix:
```
exiting process 3 with exit code: 10
```
ghstack-source-id: 132618199

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D29446574

fbshipit-source-id: f806ef0470cb6aa86fe3c404e1c895514abb6488
2021-06-28 19:32:17 -07:00
d71e7ae740 [PyTorch][vulkan] Unify vtensor_from_vulkan to always return non-const ref (#59996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59996

Just like D28811477 (dce8697aea), there's no reason we can't give it this signature.
ghstack-source-id: 132566618

Test Plan: CI

Reviewed By: AshkanAliabadi

Differential Revision: D29119070

fbshipit-source-id: d049d49c38099eef6c96e8f69909827e64376097
2021-06-28 19:25:13 -07:00
7eef78597e fixed launch bounds for grid sampler 3d (#60385)
Summary:
Changed launch bounds for grid_sampler_3d from 1024 to 512 and grid_sampler_3d_backward from 1024 to 256.

Timing data (using Nvidia Titan-V):
![GridSampler3dTimingData](https://user-images.githubusercontent.com/22803332/122813457-d3c12300-d287-11eb-99c1-6572f539660f.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60385

Reviewed By: jbschlosser

Differential Revision: D29433741

Pulled By: ngimel

fbshipit-source-id: 7f475d0c2e854ae65dd0f1fb0167dfae7e506ec9
2021-06-28 19:01:38 -07:00
d36ce61a5e use explicitly non-returning GPU atomics (#60607)
Summary:
Enables an important performance optimization for ROCm, in light of the discussion in https://github.com/pytorch/pytorch/issues/41028.

CC jithunnair-amd sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60607

Reviewed By: jbschlosser

Differential Revision: D29409894

Pulled By: ngimel

fbshipit-source-id: effca258a0f37eaefa35674a7fd19459ca7dc95b
2021-06-28 18:17:29 -07:00
d62c3ea354 [skip ci] Add GitHub Actions label for g3.16xlarge (#60888)
Summary:
Prerequisite for https://github.com/pytorch/pytorch/issues/60221.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60888

Reviewed By: seemethere

Differential Revision: D29436592

Pulled By: samestep

fbshipit-source-id: b3254139ec9c46c533f8f951a9ede3b372a65536
2021-06-28 15:49:52 -07:00
d5a44f9f12 Use expecttest from PyPI (#60658)
Summary:
This PR removes `torch/testing/_internal/expecttest.py` in favor of https://github.com/ezyang/expecttest. See also https://github.com/ezyang/ghstack/pull/71.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60658

Test Plan: CI.

Reviewed By: ezyang

Differential Revision: D29430763

Pulled By: samestep

fbshipit-source-id: b7cdc7ba37330176149fd465312118e2254ae92e
2021-06-28 15:43:34 -07:00
ddb1f293b6 Fix the NNC-disabled path in static runtime for perf comparisons
Summary:
The path which has NNC/LLVM disabled still constructs a tensor
expression, even though `supports()` will always return false, so a
`KernelScope` is necessary to manage those memory allocations.

I guess we could avoid building the TEs at all in this case, but it's pretty
clean this way.

Test Plan:
```
scripts/bertrand/static_runtime/run.sh
```

Reviewed By: hlu1

Differential Revision: D29415909

fbshipit-source-id: dde43de8516b9a2cf9f5f7f3699962bf9ccd8c30
2021-06-28 15:39:07 -07:00
9b94aa5356 [quant][fx][fix] Fused modules with object_type in qconfig (#60779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60779

When we do fusion, we replace certain modules (such as Linear + ReLU) with fused versions (such as LinearReLU) by calling `_fuse_fx` in prepare_fx. However, when we try to look up the fused module type in qconfig_dict, we cannot find a match anymore, since the qconfig dict contains the original module types. An example is here [N882873](https://fburl.com/anp/azenjx3v).

So we now update the qconfig_dict so that fused modules map to the qconfigs used for the modules that make them up. If those modules are not mapped to the same qconfig, we raise an error.
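
An illustrative qconfig_dict for this situation (a sketch assuming the fbgemm default qconfig; not taken from the PR):

```python
import torch
from torch.quantization import get_default_qconfig

qconfig = get_default_qconfig("fbgemm")
# Linear + ReLU fuse into LinearReLU; both entries must map to the same
# qconfig, otherwise an error is now raised
qconfig_dict = {"object_type": [
    (torch.nn.Linear, qconfig),
    (torch.nn.ReLU, qconfig),
]}
```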

Test Plan:
`python test/test_quantization.py TestFuseFx.test_qconfig_fused_module`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406941

fbshipit-source-id: 74b5db89f4998aeb02b2bf7c37bf97326580c654
2021-06-28 15:22:22 -07:00
cyy
cadce14e02 don't return in __init__ functions (#60830)
Summary:
Fix some warnings from a code analyzer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60830

Reviewed By: jbschlosser

Differential Revision: D29433638

Pulled By: albanD

fbshipit-source-id: 148df1d8a0a79778f18e8b6abffbddef36c5031c
2021-06-28 14:56:13 -07:00
9af8aecd00 [caffe2/libtorch] Remove already-owned source
Summary:
This source is already owned by a more fine-grained rule, so avoid a
package boundary violation by having it also be owned by an outer
rule.

Test Plan: CI

Reviewed By: aniketmathur

Differential Revision: D29422794

fbshipit-source-id: 432accc969abcb4d56bd97341a07029926939ea0
2021-06-28 14:45:34 -07:00
eeea696c02 [caffe2] Fix include of corresponding header
Summary:
AFAICT, this include was a typo, and meant to be the corresponding
header for this .cpp, but instead pulled in an unrelated header.

Test Plan: CI

Reviewed By: igorsugak

Differential Revision: D29422993

fbshipit-source-id: cc9bb29ee1f1007b68c6666ea8e389f6f39928af
2021-06-28 14:45:32 -07:00
c3977bf3da [caffe2/utils] Add some fine-grained rules to avoid package boundary violations
Test Plan: CI

Reviewed By: igorsugak

Differential Revision: D29401295

fbshipit-source-id: e921e5578c1fcc8df6bd670ae9f95722b8e32d85
2021-06-28 14:45:30 -07:00
03de807d81 [caffe2/utils] Add explicit rule to avoid package boundary violation (#60677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60677

Add a rule to wrap conversions.h and depend on that, rather than
relying on a glob which violates package boundaries.

Test Plan: `buck2 build fbcode//caffe2/caffe2:caffe2_core`

Reviewed By: mzlee

Differential Revision: D29370841

fbshipit-source-id: d4dd383eb8457d4f5118574e34e6f17c32fde647
2021-06-28 14:43:30 -07:00
41c380e649 Enable bionic-cuda10.2-cudnn7-py3.9-gcc7 in GHA (#60204)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60204

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29430679

Pulled By: samestep

fbshipit-source-id: 9380f5535cd370ec7aabf609a6170c8cb4df505d
2021-06-28 13:08:36 -07:00
971cdafd15 Upgrade benchmark to v1.5.5 (#60750)
Summary:
This fixes the build for gcc 11.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60750

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D29394541

Pulled By: dreiss

fbshipit-source-id: 61557431b52a3e898ffcc32f97133b3ea94a838f
2021-06-28 13:03:03 -07:00
007ba37c9a [pruning] Speedup activation reconstruction (#60683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60683

Vectorized reconstruction without for loops

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KSJQ

Reviewed By: z-a-f

Differential Revision: D29370805

fbshipit-source-id: 75402437654a0b6f6391c8590bbe3f6fe3f43d8f
2021-06-28 12:58:21 -07:00
f302e0c781 [pruning] Additional pruning tests (#60681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60681

Adding additional pruning tests for more complex models and more pruned rows

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KQ2Z

Reviewed By: z-a-f

Differential Revision: D29347546

fbshipit-source-id: cb65e564dd46d24f4aca1b00dd915ee8d64f8318
2021-06-28 12:58:20 -07:00
8d4a6ef962 [pruning] Activation reconstruction (#60292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60292

Added activation reconstruction in the `reconstruct` method

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KLl1

Reviewed By: z-a-f

Differential Revision: D29236569

fbshipit-source-id: 1ad085f4143eb9fa3efca51e00d810e0fdb7e9b1
2021-06-28 12:58:18 -07:00
965dad25a5 Allow resizing of parametrized tensors (#60418)
Summary:
Modify `parametrize.py` to allow resizing of parametrized tensors
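
For context, a minimal sketch of the parametrization API whose tensors can now be resized (the `Symmetric` module is my own example):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Symmetric(nn.Module):
    def forward(self, X):
        return X.triu() + X.triu(1).transpose(-1, -2)

layer = nn.Linear(4, 4)
parametrize.register_parametrization(layer, "weight", Symmetric())
print(layer.weight)  # recomputed through the parametrization on access
```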

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60418

Test Plan:
`buck test mode/dev-nosan //caffe2/test:nn -- --exact 'caffe2/test:nn - test_register_and_remove_parametrization (test_nn.TestNN)'`

https://pxl.cl/1L0wh

Reviewed By: z-a-f

Differential Revision: D29279442

Pulled By: kazhou

fbshipit-source-id: 4d94915748f896e7761a40ad18f4c6444f505c3a
2021-06-28 12:57:11 -07:00
956faea585 [fix] cauchy sampling inf on cuda (#60186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59144

As pointed out by ngimel, the issue is indeed with calling `tan`.

However, the C++ `std::tan` [documentation](https://en.cppreference.com/w/cpp/numeric/math/tan) states that

```
The function has mathematical poles at π(1/2 + n); however no common floating-point representation
is able to represent π/2 exactly, thus there is no value of the argument for which a pole error occurs.
```

All of `torch.tan`, `numpy.tan` and `math.tan` are compliant with the above statement.

<details>

```python
import torch
import math
import numpy as np

# Single Precision
print(torch.tan(torch.tensor(math.pi, device='cuda', dtype=torch.float32) * 0.5))
print(np.tan(np.array(np.pi, dtype=np.float32) * 0.5))

# Double Precision
print(math.tan(math.pi * 0.5))
print(torch.tan(torch.tensor(math.pi, device='cuda', dtype=torch.double) * 0.5))
print(np.tan(np.array(np.pi, dtype=np.float64) * 0.5))
```

Output
```
tensor(-22877334., device='cuda:0')
-22877332.42885646
1.633123935319537e+16
tensor(1.6331e+16, device='cuda:0', dtype=torch.float64)
1.633123935319537e+16
```

</details>

So this issue stems from the use of `__tanf`, a faster approximation of tan from the CUDA library (for float16, bfloat16 and float).

8a839c5478/aten/src/ATen/NumericUtils.h (L91-L100)

The fix in the PR is to use the **slower** but more correct version.

Benchmark:
```
[ cauchy : input dtype torch.float16 device cuda ]
                             |  Before  |  After
1 threads: -------------------------------------
      (128,)                 |    3.8   |    4.3
      (256, 128)             |    3.8   |    4.2
      (2, 512, 256)          |    3.8   |    4.2
      (2, 64, 256, 128)      |   22.8   |   29.6
      (4, 2, 512, 256, 128)  |  649.6   |  869.3

Times are in microseconds (us).

[ cauchy : input dtype torch.bfloat16 device cuda ]
                             |  Before  |  After
1 threads: -------------------------------------
      (128,)                 |    3.8   |    4.3
      (256, 128)             |    3.8   |    4.3
      (2, 512, 256)          |    3.8   |    4.3
      (2, 64, 256, 128)      |   23.8   |   30.8
      (4, 2, 512, 256, 128)  |  682.5   |  904.2

Times are in microseconds (us).

[ cauchy : input dtype torch.float32 device cuda ]
                             |  Before  |  After
1 threads: --------------------------------------
      (128,)                 |     3.8  |     4.2
      (256, 128)             |     3.7  |     4.2
      (2, 512, 256)          |     3.7  |     4.2
      (2, 64, 256, 128)      |    35.3  |    37.1
      (4, 2, 512, 256, 128)  |  1020.0  |  1058.3

Times are in microseconds (us).

[ cauchy : input dtype torch.float64 device cuda ]
                             |   Before  |   After
1 threads: ----------------------------------------
      (128,)                 |      3.8  |      4.2
      (256, 128)             |      8.0  |      8.0
      (2, 512, 256)          |     46.0  |     46.0
      (2, 64, 256, 128)      |    669.2  |    669.4
      (4, 2, 512, 256, 128)  |  21255.0  |  21262.1

Times are in microseconds (us).
```

<details>

Benchmark Script:
```python
import torch
import itertools
import time
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare
import sys
import pickle

print('Using pytorch %s' % (torch.__version__))

cuda_shapes = [(128,), (256, 128), (2, 512, 256), (2, 64, 256, 128), (4, 2, 512, 256, 128)]
cuda_dtypes = [torch.half, torch.bfloat16, torch.float, torch.double]
results = []
repeats = 10

for device in ['cuda']:
    dtypes = cuda_dtypes
    shapes = cuda_shapes

    for dtype in dtypes:
        for shape in shapes:
            t = torch.randn(shape, device=device, dtype=dtype) * 10

            tasks = [("t.cauchy_()", "After", "")]
            timers = [Timer(stmt=stmt, label=f"cauchy : input dtype {dtype} device {device}", sub_label=f"{(shape)}", description=desc, globals=globals()) for stmt, desc, label in tasks]

            for i, timer in enumerate(timers * repeats):
                results.append(
                    timer.blocked_autorange()
                )
                print(f"\r{i + 1} / {len(timers) * repeats}", end="")
                sys.stdout.flush()

with open('after-pr.pkl', 'wb') as f:
    pickle.dump(results, f)

comparison = Compare(results)
comparison.print()
```

Compare Script:
```
import torch
import itertools
import time
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare
import sys
import pickle

with open('before-pr.pkl', 'rb') as f:
    before_results = pickle.load(f)

with open('after-pr.pkl', 'rb') as f:
    after_results = pickle.load(f)

comparison = Compare(after_results + before_results)
comparison.print()
```

</details>

TODO:
* [x] Add comment

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60186

Reviewed By: jbschlosser

Differential Revision: D29433897

Pulled By: ngimel

fbshipit-source-id: 9c5f14b83e3372bed72369f70eed9256c04385c6
2021-06-28 12:49:30 -07:00
70e205a2ab Use the new URL for docs preview link (#60893)
Summary:
This is all set up on CloudFront now with a custom domain, so we don't need the long default cloudfront domain anymore

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60893

Reviewed By: malfet

Differential Revision: D29437300

Pulled By: driazati

fbshipit-source-id: 6f5ffd1b10c5167b0022b7e64b2164508624ca91
2021-06-28 12:45:04 -07:00
f5e5ced202 Enable parallel clang-tidy on ec2 runner (#60870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60870

This PR makes `clang-tidy` run on our self-hosted runner in a parallel fashion.

Fixes #60867

Test Plan: #60871

Reviewed By: jbschlosser

Differential Revision: D29434240

Pulled By: 1ntEgr8

fbshipit-source-id: cead30ed718ddf5e14b13afe70cb209aa16b44a0
2021-06-28 11:45:44 -07:00
c8fb785857 Print stdout and stderr to console on parallel runs (#60869)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60869

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29434155

Pulled By: 1ntEgr8

fbshipit-source-id: 925c9d832775dbb710af9367c07962f3367fda38
2021-06-28 11:29:12 -07:00
a8057e7ef1 docs: add permute in torch docs (#60821)
Summary:
fix https://github.com/pytorch/pytorch/issues/60181

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60821

Reviewed By: VitalyFedyunin

Differential Revision: D29431949

Pulled By: jbschlosser

fbshipit-source-id: 2353afceaa188315cde1f0c955897c4750809c8e
2021-06-28 11:20:35 -07:00
d7c58e5a04 [vulkan] Implement tanh activation function (#60695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60695

As title. Implement tanh in Vulkan.

Test Plan:
Build Pytorch repository with the build command in P425131222.

Run test command `pytorch/build/bin/vulkan_api_test`

Output:

{F627752306}

Reviewed By: SS-JIA

Differential Revision: D29375071

fbshipit-source-id: 2d613a9542774719dd78524757a677e3b2450c74
2021-06-28 10:58:44 -07:00
da70dd199d [quant] Input-Weight Equalization - tests (#60378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60378

Created the following unit tests to check that our equalization algorithm behaves as expected:
- Check that the equalization scales calculated and stored in the graph are as expected
- Check that the scaled weights and biases are as expected
- Check that the min/max values in the quantization observers are as expected
- Check that the graphs with equalization are structured in the same way as graphs without equalization (except that equalized graphs have additional equalization scale and mul nodes) before and after quantization

Test Plan:
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_equalization_scales`
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_weights_bias`
`python test/test_quantization TestEqualizeFx.test_input_activation_values`
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_graphs`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406942

fbshipit-source-id: 518208546ae5835c1ebb2af217507e90af66fbe4
2021-06-28 10:44:29 -07:00
dfb9c0bae8 [quant] Input-Weight Equalization - support for connected F.linear layer (#60272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60272

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
FunctionalLinear2Module(
  (linear1): Linear()
  (linear2): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0_equalization_process_0](args = (%linear1_w_activation_post_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0_equalization_process_0, %linear1_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0_equalization_process_0](args = (%linear2_w_activation_post_process_0,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Graph after equalization steps:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %linear2_packed_weight_0 : [#users=1] = get_attr[target=linear2_packed_weight_0]
    %linear2_scale_0 : [#users=1] = get_attr[target=linear2_scale_0]
    %linear2_zero_point_0 : [#users=1] = get_attr[target=linear2_zero_point_0]
    %linear_1 : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%linear, %linear2_packed_weight_0, %linear2_scale_0, %linear2_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear_1,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29267218

fbshipit-source-id: 6b97bed1a307f1d0b1f5efcbecf41f35418242f7
2021-06-28 10:44:27 -07:00
ddf2ce03bb [quant] Input-Weight Equalization - support for connected linear layers (#60034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60034

Added support for equalizing models with connected linear
layers. To account for connected linear layers, we additionally
multiply the previous weight values (row-wise) by the next equalization
scale, and remove the input equalization observer between the two linear
layers. We also scale the bias by the next equalization scale.
The math is shown here: https://fb.quip.com/fK8rA9aRM4ca
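
A numeric sketch of the rescaling described above (simplified; the per-output-channel shape of the scale is my assumption):

```python
import torch

W1 = torch.randn(4, 8)    # linear1 weight (out_features x in_features)
b1 = torch.randn(4)       # linear1 bias
s = torch.rand(4) + 0.5   # next layer's equalization scale, one per channel

W1_scaled = W1 * s.unsqueeze(1)  # multiply previous weights row-wise
b1_scaled = b1 * s               # scale the bias by the same factor
```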

Original Model: `x -> linear1 -> linear2`
After `prepare_fx`: `x -> InpEqObs -> InpQuantObs -> linear1 ->
OutQuantObs -> InpEqObs -> linear2`
After equalization: `x -> mul -> InpQuantObs -> linear1 -> OutQuantObs
-> linear2`

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
Linear2Module(
  (linear1): Linear(in_features=2, out_features=2, bias=True)
  (linear2): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear1_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0_equalization_process_0](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%quantize_per_tensor,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear2,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29204347

fbshipit-source-id: 6bb9e25e2468f50df523885ded2edc731f002ac1
2021-06-28 10:44:25 -07:00
7917318917 [quant] Input-Weight Equalization - support for F.linear layers (#59964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59964

Input-Weight Equalization support for functional layers

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original model:
```
FunctionalLinearModule(
  (linear1): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135459

fbshipit-source-id: 1e69bfbb82a0c89538e55b64968effd0b11b2fde
2021-06-28 10:44:24 -07:00
387289d4a5 support non-contiguous tensor in bilinear (#38409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38409

Reviewed By: anjali411

Differential Revision: D29361043

Pulled By: albanD

fbshipit-source-id: 05147a9b0f7a47204bcd5ff70e281a464e8de1e6
2021-06-28 10:43:21 -07:00
f118d20bea Make requires grad check run only when grad mode is enabled (#60740)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60740

Reviewed By: ngimel

Differential Revision: D29405934

Pulled By: albanD

fbshipit-source-id: 35c537939a3871f5a0d2146543506e4d07465724
2021-06-28 10:40:30 -07:00
3ad3f20bff Add an optional Device parameter to pin_memory/is_pinned that does nothing (#60201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60201

This is to flush out BC/FC problems with adding this parameter. A later
PR will actually add the desired functionality.
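
A sketch of the new, currently no-op argument (assuming a CUDA build, since pinning requires one):

```python
import torch

t = torch.randn(4)
p1 = t.pin_memory()                      # existing call
p2 = t.pin_memory(torch.device('cuda'))  # new optional arg, ignored for now
print(p2.is_pinned(torch.device('cuda')))  # same result as p2.is_pinned()
```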

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29331880

Pulled By: ezyang

fbshipit-source-id: 6036716d6ae55e6ea7ef2348b6c34a39613c8dd5
2021-06-28 10:38:52 -07:00
85af24f52b Remove some unnecessary functions from CUDAHooks (#59655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59655

CUDAHooks is to be used solely when you need to call into CUDA
functionality from a context where you cannot directly link to
CUDA libraries. Neither hasPrimaryContext nor
getDevceIndexWithPrimaryContext (sic) needs to be used in such
contexts. By moving them out of CUDAHooks and calling them
directly, a dynamic dispatch can be skipped.

I also fixed the typo in getDev(i)ceIndexWithPrimaryContext

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28972946

Pulled By: ezyang

fbshipit-source-id: edcd7a7b62aec97928f07fbf3bf413b9fb027517
2021-06-28 10:38:51 -07:00
b52849b589 Port silu_backward to structured (#58661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58661

I removed dispatch: CompositeImplicitAutograd: math_silu_backward.
This is definitely not right, but I don't know how it works with structured kernels;
keeping it would trigger an assertion failure:

```
assert dispatch.keys() != {DispatchKey.CompositeImplicitAutograd}, \
    f"unexpected name for singleton CompositeImplicitAutograd dispatch entry: expected {cpp.name(func)} " \
    f"but got {dispatch[DispatchKey.CompositeImplicitAutograd]}.  Rename your implementation to the expected " \
    "name, then delete the dispatch table"
```

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D28572530

Pulled By: ezyang

fbshipit-source-id: 410f03bddf79cda7c9f0fd66f697383ee2925d32
2021-06-28 10:37:45 -07:00
66f01db36c Make some comparisons explicit (#60505)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60505

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29313240

fbshipit-source-id: 3f558e68cbb0328326d7540e2b3bd0c2e12ba3e2
2021-06-28 10:33:59 -07:00
f5341bd5e6 Enhance ProcessGroupWrapper with additional checks + refactor (#60237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60237

Closes https://github.com/pytorch/pytorch/issues/58711

This diff refactors the collective consistency checking in `ProcessGroupWrapper` as described in the above issue. In particular, we no longer run separate verification checks (`all_gather`s) for shapes, op type, etc. Instead, we implement a function `serialize_fingerprint` to serialize all this data into a single tensor and only verify that.

This has the benefit of being a lot more extensible: the developer does not need to add separate `all_gather` calls in order to verify additional data in the future. We could also provide a mechanism where data that needs to be verified is "registered" in the `CollectiveFingerPrint` struct, making it even easier to add additional data; we can consider doing this if there are significant additions to `ProcessGroupWrapper`.

We now also begin to check tensor `dtypes` and device types for consistency as well. Tests are refactored/added accordingly.
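
An illustrative Python sketch of the idea; the real implementation is C++ inside `ProcessGroupWrapper`, and the names and encoding here are made up:

```python
import torch

# Sketch only: assume op_type, dtypes and device_types are integer codes.
def serialize_fingerprint(op_type, dtypes, device_types, shapes):
    data = [op_type]
    data += list(dtypes) + list(device_types)
    for shape in shapes:
        data += [len(shape), *shape]
    # one tensor -> one all_gather suffices to compare collectives across ranks
    return torch.tensor(data, dtype=torch.long)
```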
ghstack-source-id: 132520261

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D28597287

fbshipit-source-id: b09f14f628df9e2457623ba81fc13fd4e214f3c9
2021-06-28 10:24:11 -07:00
aaea81e3fb [torch/distributed] remove outdated FutureWarning in distributed/elastic/util/store.py (#60807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60807

Addresses: https://github.com/pytorch/pytorch/issues/60717

This warning should have been removed since this code is no longer in "experimental" mode.

Test Plan: N/A - just removing experimental warning that should've been removed.

Reviewed By: H-Huang, aivanou

Differential Revision: D29412972

fbshipit-source-id: 16a8a98abde70a4ae0c1ac1b14bda339cb44863a
2021-06-28 10:22:16 -07:00
94cdbbf48d Paren-matching kernel launch check without external deps (#60778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60778

Matches parens and the opening `<<<` to make a more accurate kernel launch check.

Test Plan:
```
buck test //caffe2/test:kernel_launch_checks
```

Reviewed By: ngimel

Differential Revision: D29401624

fbshipit-source-id: 8649af7c33e67dbb24044af0134b1cea6f2e5dc3
2021-06-28 10:18:04 -07:00
88b0518a83 Python error unit tests on delegation of backend_with_compiler_demo (#60689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60689

Added a test for errors that occur with a compiler, specifically when an
operator is not supported by the backend.
ghstack-source-id: 132485207

Test Plan:
Running python test/test_jit.py TestBackendsWithCompiler -v returns a
success.

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29374513

fbshipit-source-id: ac52b315a01719eaa4985680939239ae058d277b
2021-06-28 09:33:03 -07:00
e63db3ae46 ENH Adds byte support for nll_loss (CUDA) (#60650)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59765

This PR adds byte support for nll_loss on CUDA.
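
A minimal sketch of the newly supported call (assuming a CUDA device; shapes are illustrative):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 5, device='cuda')
target = torch.randint(0, 5, (8,), dtype=torch.uint8, device='cuda')  # byte targets
loss = F.nll_loss(F.log_softmax(logits, dim=1), target)
```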

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60650

Reviewed By: albanD

Differential Revision: D29429456

Pulled By: jbschlosser

fbshipit-source-id: 894c969ed6bfc6117dee8e844a7cb5b99977247c
2021-06-28 08:20:13 -07:00
7f6b2bc2d0 Add -I<directory> option to tools/linter/clang_tidy.py (#60745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60745

Fixes #60739

Test Plan:
Run this command:
```
python3 tools/linter/clang_tidy.py --paths torch/csrc/fx -I/usr/include/path -I/usr/include/another/path --print-include-paths
```

Output:

If the paths don't exist, you should see this:
```
ignoring nonexistent directory "/usr/include/path"
ignoring nonexistent directory "/usr/include/another/path"
```

If the paths exist, you should see them listed.

Reviewed By: ngimel

Differential Revision: D29395227

Pulled By: 1ntEgr8

fbshipit-source-id: c89650546d45887cd39e574da07f08bcfec686e0
2021-06-28 06:56:02 -07:00
5b118a7f23 Don't reference reflection_pad3d in functional.py (#60837)
Summary:
To work around an FC issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60837

Reviewed By: jbschlosser

Differential Revision: D29421142

Pulled By: ngimel

fbshipit-source-id: f5c1d9c324173b628e286f9005edf7109162066f
2021-06-27 20:54:32 -07:00
f0e972a481 To add Nesterov Adam algorithm for multi-tensor optimizers API (#59165)
Summary:
Previously, in PR https://github.com/pytorch/pytorch/issues/59009, we added NAdam to the optimizers. In this PR we propose a multi-tensor version of NAdam for PyTorch.

NAdam was proposed in the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ and the report http://cs229.stanford.edu/proj2015/054_report.pdf by Timothy Dozat.

It has become one of the most widely used algorithms in the deep learning community.

It is worth noting that the implementation of NAdam is inspired by the Keras implementation:
f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py
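
A minimal usage sketch of the optimizer interface (my own example; the multi-tensor variant is expected to expose the same interface):

```python
import torch

model = torch.nn.Linear(10, 2)
opt = torch.optim.NAdam(model.parameters(), lr=2e-3)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
opt.step()
opt.zero_grad()
```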

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59165

Reviewed By: vincentqb

Differential Revision: D29360577

Pulled By: iramazanli

fbshipit-source-id: 0fe14016303b2df2cb8cc31912a2674acf63d1e5
2021-06-27 17:00:41 -07:00
3bfe15085d [TensorExpr] Add a mechanism to register custom TS->NNC lowerings in TensorExprKernel. (#60804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60804

The lowerings are stored as a map c10::Symbol -> std::function, and the
signature of those functions matches the signature of
`computeOperandValue`. Custom lowerings take priority over the
standard ones, i.e. we can redefine how a given op is lowered.

In general this feature is aimed at unblocking users whose models
contain ops that are not yet supported by NNC - it allows quickly adding
a custom lowering for a given op.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29409580

Pulled By: ZolotukhinM

fbshipit-source-id: e8e8dc9d3cb9155cfbf5c08a4216ba1b5b791a60
2021-06-27 15:27:22 -07:00
5563f4bda0 To add Rectified Adam algorithm for multi-tensor optimizers API (#59161)
Summary:
Previously, in PR https://github.com/pytorch/pytorch/issues/58968, we added RAdam to the optimizers. Here in this PR we propose a multi-tensor version of RAdam for PyTorch.

RAdam was proposed in the paper https://arxiv.org/pdf/1908.03265.pdf by Liyuan Liu et al.

It has been one of the most widely used algorithms in the deep learning community.

Differing from the paper, we selected a variance tractability cutoff of 5 instead of 4, as is common practice.

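For reference, a minimal sketch of the variance-rectification gate with the cutoff of 5 mentioned above (formula from the paper; an illustration, not this PR's exact code):

```python
# RAdam's variance-rectification term; `beta2` and step `t` as in Adam
def rectification(beta2: float, t: int):
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2**t / (1.0 - beta2**t)
    if rho_t <= 5.0:
        return None  # variance not yet tractable: take an unadapted step
    return (((rho_t - 4) * (rho_t - 2) * rho_inf)
            / ((rho_inf - 4) * (rho_inf - 2) * rho_t)) ** 0.5
```
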
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59161

Reviewed By: vincentqb

Differential Revision: D29360576

Pulled By: iramazanli

fbshipit-source-id: 7ccdbf12b1ee7f12e66f7d7992123a70cc818b6b
2021-06-27 13:01:20 -07:00
0fbc471d10 Support default values on NamedTuple fields (#54682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54682

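A minimal sketch of the feature (hypothetical field names):

```python
import torch
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float = 0.0  # default value on a NamedTuple field

@torch.jit.script
def make_point(x: float) -> Point:
    return Point(x)  # y falls back to its default inside TorchScript

print(make_point(1.0))  # Point(x=1.0, y=0.0)
```
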
Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D27327241

Pulled By: ansley

fbshipit-source-id: 76546f1770d50ebc3435bba3b74540e3c6be8a1c
2021-06-26 15:18:21 -07:00
6b53792f18 fix cuda mem leak check not properly run on master_builds (#60742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60742

Improved the CI_MASTER flag check logic, since the flag can be unset, true, or false.

Test Plan:
search for `PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK` in logs below:

- Before adding ci/master:
  - build workflow (`PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=1`): https://circleci.com/api/v1.1/project/github/pytorch/pytorch/14394913/output/107/0?file=true&allocation-id=60d5fd2fa55ae50282aec997-0-build%2F10295B30
- After adding ci/master label:
  - build workflow (`PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0`): https://circleci.com/api/v1.1/project/github/pytorch/pytorch/14398213/output/107/0?file=true&allocation-id=60d61cf8bb9d097afc7a11aa-0-build%2F400138F1
  - master build workflow (`PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0`): https://circleci.com/api/v1.1/project/github/pytorch/pytorch/14398198/output/107/0?file=true&allocation-id=60d61ca3467438480c963290-0-build%2F2999C909

Reviewed By: ngimel

Differential Revision: D29405732

Pulled By: walterddr

fbshipit-source-id: 09dd653cbb47ca61b1f8872851bda6db8db671b9
2021-06-26 07:05:32 -07:00
e3abccec8a [Static Runtime] Remove output type constraints (#60669)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60669

Test Plan: Added unit test to check for nested outputs.

Reviewed By: ajyu

Differential Revision: D29322025

fbshipit-source-id: a3c8d3c5f0bb7cf7fda4bc5f579adb8fa7bc3724
2021-06-26 02:36:27 -07:00
dae25c2002 Fix missing spaces in error of constant_pad_nd (#60729)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60729

Reviewed By: ZolotukhinM

Differential Revision: D29404422

Pulled By: ngimel

fbshipit-source-id: c40458c7a6ae33f61c680bff8de778a80658c250
2021-06-25 19:20:03 -07:00
9a08e87d8b Modernize for-loops in aten (#59598)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59598

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D28946826

fbshipit-source-id: 9f3f7e38833c2bc33d27243cef16ab0118c65f3a
2021-06-25 19:02:00 -07:00
7e3a694b23 supports non-leaf inputs for autograd.backward() function (#60521)
Summary:
Close https://github.com/pytorch/pytorch/issues/60268

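A minimal sketch of the behavior this enables (illustrative values):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2            # y is a non-leaf tensor
z = (y * 3).sum()

# non-leaf tensors may now be passed as `inputs`,
# and their .grad fields are populated directly
torch.autograd.backward(z, inputs=[y])
print(y.grad)        # tensor([3., 3., 3.])
```
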
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60521

Reviewed By: ngimel

Differential Revision: D29393586

Pulled By: albanD

fbshipit-source-id: 2dd2de427ecfecca8d544237bacf690e0b7c918c
2021-06-25 18:57:26 -07:00
056a8e0d5c Remove un-used parameter in _trilinear backward (#60673)
Summary:
This argument is only important for speed and memory usage, so it is OK to ignore it during the backward.
As discussed, we might want to change this to speed up backward in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60673

Reviewed By: soulitzer

Differential Revision: D29370125

Pulled By: albanD

fbshipit-source-id: ad50b3ea530aeb194f5a51845523b517a50f2c71
2021-06-25 17:47:10 -07:00
f262217101 [Model Averaging] Move step out of model averaging API (#60632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60632

Address the comment https://github.com/pytorch/pytorch/pull/60320#discussion_r654845062
ghstack-source-id: 132340278

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29355609

fbshipit-source-id: 50a6f13ed70b5a5b5b92ead2f3d7082c11277af5
2021-06-25 17:20:52 -07:00
c5f0692b6e Sparse CSR: increase dtype test coverage (#60656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60656

This PR uses `torch.testing.get_all_dtypes()` for dtype parametrisation
of tests in `test_sparse_csr.py`. It adds bool, half, bfloat16, and
complex dtypes that were previously excluded from tests. `torch.complex32` is omitted due
to lack of coverage and lack of a specialized `AT_DISPATCH...`.
The process of adding more dtypes to tests revealed that `.to_dense()`
doesn't work for all dtypes.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29408058

Pulled By: cpuhrsch

fbshipit-source-id: 319b6f51b9786d6957d508f51657657a6d00267a
2021-06-25 17:11:21 -07:00
dd045ab540 add channels last for AdaptiveMaxPool2d (#48920)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48920

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D25399467

Pulled By: VitalyFedyunin

fbshipit-source-id: d9d2cc728cc7a18a26983e96d3c3e81a23659e89
2021-06-25 16:36:20 -07:00
367aff91d8 Fix missing #pragma once in jit/method.h
Summary: it seems to be accidentally missing

Test Plan: run CI

Reviewed By: suo

Differential Revision: D29335990

fbshipit-source-id: 2790bc10d141f9484a0807ff7800024a02fd9cfa
2021-06-25 16:32:54 -07:00
8b6487c650 Add CUDA Vital (#58059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58059

Add a CUDA.used vital sign, which is true only if CUDA was "used", which technically means the CUDA context was created.

Also adds the following features:
- Force vitals to be written even if vitals are disabled, to enable testing when the env variable is not set from the start of execution
- Add a read_vitals call for python to read existing vital signs.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex basic_vitals

Reviewed By: xuzhao9

Differential Revision: D28357615

fbshipit-source-id: 681bf9ef63cb1458df9f1c241d301a3ddf1e5252
2021-06-25 16:31:11 -07:00
9134b0e42f add a boxed CPU fallback kernel (#58065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58065

This PR replaces the existing code-generated CPU fallback kernels that XLA uses with a single boxed CPU fallback.

Current state: there are a couple of different design ideas that I want to point out, but the logic for the actual kernel is mostly done and passing tests.

### Design

To preface, I'm not 100% tied to the current design; I'm putting the PR up now for opinions and am totally open to alternatives, some of which I listed below. Actually, after writing this description, I'm leaning toward the following changes:
* Confirm whether or not we can remove all C++ logging info directly in the yaml.

**Current Design**

All of the CPU fallback codegen is deleted. In its place, XLA (and other external backends, later) can choose to opt into a CPU fallback by adding the following code in a C++ file. I have a corresponding [xla-side PR with the xla changes](https://github.com/pytorch/xla/pull/2945/files#diff-1a005c10039f0cb11130a3b740f5de716d2f10acaea121017016025861886798R1).

There's no actual requirement to split up the code into a .h and .cpp file, but that's necessary in the XLA case because they sometimes need to call the fallback directly from their handcrafted kernels.

```
// xla_cpu_fallback.h
#include <ATen/native/CPUFallback.h>
...
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack);
...
```
```
// xla_cpu_fallback.cpp
#include "torch_xla/csrc/aten_cpu_fallback.h"
...
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
  // Do custom logging here
  ...
  // Call the actual boxed CPU fallback.
  at::native::cpu_fallback(op, stack);
}

TORCH_LIBRARY_IMPL(_, XLA, m) {
  m.fallback(torch::CppFunction::makeFromBoxedFunction<&xla_cpu_fallback>());
}
```

Now that the fallback is exposed in the backend, they can call it directly. Doing so requires converting from an unboxed to a boxed context, for which we provide a utility function. E.g.:
```
#include <ATen/native/CPUFallback.h>

at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
  ....
  if (...call_fallback...) {
    return at::native::call_fallback_fn<&xla_cpu_fallback, decltype(at::addmm)>::call("aten::addmm", self, mat1, mat2, beta, alpha);
  }
  ...
}
```

That `decltype(at::addmm)` logic isn't actually used everywhere in the xla-side PR yet, since you hit issues with overloads. I could use it everywhere once #58092 lands.

**Alternatives: The API for calling the CPU fallback directly is ugly, can we make it nicer?**
We could change the api to use `at::redispatch`, which would make it look something like this:
```
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
  ....
  if (...call_fallback...) {
    return at::redispatch::addmm(c10::DispatchKeySet(c10::DispatchKey::CPUFallback), self, mat1, mat2, beta, alpha);
  }
  ...
}
```
Which definitely feels cleaner, but also requires adding a new DispatchKey just for this use case. Conditionally calling the CPU fallback doesn't sound like a hugely important use case, so I don't know if giving up one of our 64 dispatch key slots is worth the API improvement. Totally open to other opinions though!

Another more mild improvement that would avoid having to pass operator string names (including overloads) around would be to codegen (yet another) namespaced API. Something like this:
```
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
  ....
  if (...call_fallback...) {
    return at::fallback::addmm<&xla_cpu_fallback>(self, mat1, mat2, beta, alpha);
  }
  ...
}
```

Writing that out, I actually like it more (I think it'll let us get rid of `decltype(...)`). Maybe that is nice enough to warrant a new codegen API - I haven't tried adding that yet, but if people like it I'm happy to try it out.

**More alternatives**
The current design also involves the backend manually writing and registering the boxed fallback themselves, but an alternative would be for us to do it in codegen too: they would just need to pass in all of the C++ logging that they want done in the fallback, directly through the yaml. The main downsides:
* Backend code that wants to call the fallback needs to abide by whatever convention our codegen uses to name the generated boxed fallback.
* Passing custom C++ logging through yaml is just more fragile: right now xla uses an `iostream` to log each tensor arg in the operator, so we'd have to either force other backends into the same convention or figure something else out later.

To be fair, we actually already do that: XLA has custom per-tensor-arg logging for all of the generated `out` wrappers in the codegen, which we do by passing their C++ logging info through the yaml. This seems unnecessary though, since `out` wrappers just call into a functional kernel, which is hand written with its own custom logging. So my take is: try to remove custom C++ logging from the yaml, and if it turns out to be really necessary, then we may as well take advantage of that to codegen the fallback.

### Performance impact

While ops that fall back to CPU aren't exactly hot path, we probably don't want to use a boxed fallback if it turns out to be an absolute perf killer.

I ran my benchmarks using callgrind, benchmarking both `at::add` and `at::add_out` run on XLA. My callgrind benchmark for `at::add` can be found here (the add_out benchmark looks basically the same): https://www.internalfb.com/phabricator/paste/view/P415418587. I created the benchmark by hacking the existing xla C++ test build scripts and throwing in a reference to callgrind.

I also attached the full callgrind output for each benchmark; the full output is actually pretty noisy and hard to parse, but I focused on everything underneath the `at::add()` call in the output, which was much more stable. My guess is that the noise is due to some heavyweight async startup processing that xla does.

`at::add`:
before: 88,505,130 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421001
after: 102,185,654 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421273
delta: ~15.5% increase

`at::add_out`:
before: 63,897,395 instructions. Full output: https://www.internalfb.com/intern/everpaste/?handle=GBrrKwtAPlix9wUEAOZtrFXpdO5UbsIXAAAz
after: 73,170,346 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415423227
delta: ~14.5% increase

High level takeaway: A framework overhead increase of 10-20% doesn't seem too horrible for the CPU fallback use case.

For structured, functional ops that require a CPU fallback, we're actually in an unfortunate situation: we're doing even more work than necessary. Our codegen automatically creates a `CompositeExplicitAutograd` kernel which calls into the `out` operator. So the extra work that we end up doing is:
* An extra dispatcher hop: (at::add -> CompositeExplicitAutograd -> CPUFallback -> at::native::add) instead of (at::add -> CPUFallback -> at::native::add)
* An unnecessary tensor allocation (the CompositeExplicitAutograd kernel uses at::empty() to create an output tensor, which is immediately overwritten by the CPU fallback)
* An unnecessary meta() call (the CompositeExplicitAutograd kernel calls it to create the output tensor, but we call it again in the CPU kernel).
* unboxing->boxing->unboxing logic (this is the only strictly required piece)

There are definitely ways to avoid the unnecessary work explained above: one would be to give the boxed fallback higher priority than composite keys (there's [an issue for it here](https://github.com/pytorch/pytorch/issues/55104)), and codegen fallthroughs for all composite ops. It'll require more infra to set up, so I see it as more of a perf knob that we can apply if we need it later.

Unfortunately I couldn't dig much deeper into the differences aside from the aggregate change in instructions, since it looks like callgrind fudged some of the instruction attribution (`at::to_cpu` takes up a ton of instructions, but I don't see any attribution for the `at::native::add` kernel anywhere).

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28833085

Pulled By: bdhirsh

fbshipit-source-id: 537ebd5d7fb5858f1158764ff47132d503c3b92b
2021-06-25 16:26:50 -07:00
ad69e2fd11 [torch] Module fix on the support of LazyModule on bug #60132 (#60517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60517

This fixes the module support for LazyModuleMixin, per bug issue #60132.
Check the link: https://github.com/pytorch/pytorch/issues/60132

We will have to update lazy_extension, given the dependency on module.py, and update the unit test as well.

Test Plan:
Unit test passes

torchrec test passes

Reviewed By: albanD

Differential Revision: D29274068

fbshipit-source-id: 1c20f7f0556e08dc1941457ed20c290868346980
2021-06-25 16:20:19 -07:00
cab926b2c0 faster generate_square_subsequent_mask in nn.Transformer (#60631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60631

Per #48360, speed up `Transformer.generate_square_subsequent_mask`. The new impl is informally ~5x faster, though the absolute difference is probably small.

PR includes Python and C++ versions as well as a couple of places where the previous impl had been copied around.

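A plausible single-call formulation of the mask (a sketch, not necessarily the exact code landed here):

```python
import torch

def generate_square_subsequent_mask(sz: int) -> torch.Tensor:
    # -inf above the diagonal (future positions masked), 0.0 on and below it
    return torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)

print(generate_square_subsequent_mask(3))
# tensor([[0., -inf, -inf],
#         [0., 0., -inf],
#         [0., 0., 0.]])
```
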
Test Plan: Imported from OSS

Reviewed By: jbschlosser, albanD

Differential Revision: D29356673

Pulled By: bhosmer

fbshipit-source-id: 4c062ba0ead61a445aeef451c78777bf0b3a631e
2021-06-25 16:07:01 -07:00
7585783b8d Remove Optional[None] annotations (#60704)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60704

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29380281

Pulled By: ansley

fbshipit-source-id: 055c17329a35375de4ebd058ee6d127475aad373
2021-06-25 15:53:58 -07:00
5ed7400b75 Fix doc preview source directory (#60792)
Summary:
`merge` is the directory with the actual changes, not `master`. Verified by downloading artifacts from https://github.com/pytorch/pytorch/pull/60777/checks and searching through the result.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60792

Reviewed By: walterddr

Differential Revision: D29405288

Pulled By: driazati

fbshipit-source-id: 419c943727c00429945c1f116645bfa22fb12456
2021-06-25 15:46:30 -07:00
7b933cd9ea configurable pre/post LayerNorm in nn.Transformer (#60593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60593

Per #55270, this PR makes it configurable whether to run LayerNorm before or after other operations in Transformer layers.

However, it leaves for a separate PR the removal of the LayerNorm performed after the final encoder/decoder layer has run, which is redundant when LayerNorm has already been run after the other in-layer operations (problem described in #24930 #50086 #51447).

Note: this means that transformers built with `nn.Transformer()` are now configurable, but will still contain a redundant LayerNorm when configured as before. However, callers of the `TransformerEncoder` and `TransformerDecoder` classes have always been able to avoid this redundancy.

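A usage sketch, assuming the switch is exposed as a `norm_first` constructor flag (flag name assumed):

```python
import torch.nn as nn

# pre-LN: LayerNorm runs before attention/feedforward inside each layer
pre_ln = nn.TransformerEncoderLayer(d_model=512, nhead=8, norm_first=True)

# post-LN: the previous (and still default) behavior
post_ln = nn.TransformerEncoderLayer(d_model=512, nhead=8, norm_first=False)
```
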
Reviewer notes:
1. Ran across this during other work, don't know if anybody's working on it already (most recent conversation in issues seems to be from early April). Happy to abandon if so.
2. Was looking for a quick way to add tests but it looks like the existing ones in test_nn just compare against snapshots. I could add something similar, but curious if there's any prepackaged way to add a test that LayerNorm-first (the new option) yields model that trains properly, etc.
3. New code in the `forward`s was written to minimize diff churn rather than maximize beauty :P happy to pretty it up if desired.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29356590

Pulled By: bhosmer

fbshipit-source-id: 308669326990b8923aab5fcd96e03b582fb21f24
2021-06-25 15:43:35 -07:00
e13a9587b4 Revert "Revert D29135358: [quant] Input-Weight Equaliaztion - convert modifications" (#60646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60646

This reverts commit e60f9cfc58fb2fe3e2e7f65fcdbbf350e5b55a75.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D29361191

Pulled By: angelayi

fbshipit-source-id: 275d8691d8e47da4ab80bb21b51d77ec25a0f714
2021-06-25 15:37:05 -07:00
7188d84ccf [Tools] Update path in clang_format_utils after #60473 (#60782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60782

PR #60473 introduced a new folder nesting level; this change updates
clang_format_utils.py to adjust the way it sets up the root path
accordingly.

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D29403622

Pulled By: ZolotukhinM

fbshipit-source-id: 6404271615c2d263834cf538ab0153c4d41cc5c3
2021-06-25 14:30:45 -07:00
394f60b0fc [caffe2] update make_cifar_db to move the string into DB::Put() (#60692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60692

Update make_cifar_db.cc to work with the DB API changes in D29204425 (00896cb9ed).

Test Plan: buck build caffe2/binaries:make_cifar_db

Differential Revision: D29374754

fbshipit-source-id: 23d2acd24031d11071791e398433b537215ffd38
2021-06-25 14:02:24 -07:00
e1bd4963e2 To introduce Functional API for multi-tensor (#60735)
Summary:
In this PR we change the multi-tensor optimizers to a functional API.

In the file https://github.com/pytorch/pytorch/blob/master/torch/optim/_functional.py, a functional API is defined for most optimizers. However, we do not have a similar file / functionality for the multi-tensor optimizers:
https://github.com/pytorch/pytorch/tree/master/torch/optim/_multi_tensor

Therefore we are adding it in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60735

Reviewed By: vincentqb

Differential Revision: D29392253

Pulled By: iramazanli

fbshipit-source-id: cebc8e7b07ab11156370f5297cfb419cd9f20b46
2021-06-25 13:09:26 -07:00
8f16a38067 Add missing kernel checks (#60635)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60635

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29355747

fbshipit-source-id: 20bae292703a54b2895a33c11e6f1b8b9a9d8195
2021-06-25 12:54:40 -07:00
dfc8247d33 Faster cumsum and cumprod backwards (#60642)
Summary:
Piggybacking on https://github.com/pytorch/pytorch/pull/58747, now we can implement the backwards of `cumsum` and `cumprod` without tricks. This minimises the number of kernels that are launched on the GPU, so we see a reasonable speed-up there. We should also get better stability for ill-conditioned inputs, as we do not perform any numerical tricks to get the result.

Note that the benchmarks test forward + backward, so the true speed-up of the backward should be even larger. Even more so for `cumsum`, as it requires fewer operations than the backward of `cumprod`.

<details>
<summary>
Test Script
</summary>

```python
from itertools import product

import torch
from torch.utils.benchmark import Compare, Timer

def get_timer(ndims, prod_dim, dim, num_threads, device):
    size = [500]*ndims
    size[dim] = prod_dim

    x = torch.rand(*size, device=device, requires_grad=True)
    # Make sure there are no zeros as the formula for the backward
    # that we are testing is for when the backward has no zeros
    with torch.no_grad():
        x.add_(1e-3)
    grad = torch.ones_like(x)

    timer = Timer(
        "torch.autograd.grad([x.cumprod(dim)], [x], grad_outputs=[grad])",
        globals={"x": x, "dim": dim, "grad": grad},
        label=f"Cumprod + Backwards {device}",
        description=f"dim: {dim}",
        sub_label=f"prod_dim: {prod_dim}",
        num_threads=num_threads,
    )

    return timer.blocked_autorange(min_run_time=5)

def get_params():
    ndims = 3
    dims = range(ndims)
    prod_dims = [10, 100, 500]
    for dim, prod_dim, device in product(dims, prod_dims, ("cpu", "cuda")):
        threads = (1, 2, 4) if device == "cpu" else (1,)
        for num_threads in threads:
            yield ndims, prod_dim, dim, num_threads, device

compare = Compare([get_timer(*params) for params in get_params()])
compare.trim_significant_figures()
compare.print()
```

</details>

<details>
<summary>
Benchmark PR
</summary>

```
[------------ Cumprod + Backwards cpu -------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |     11   |     14   |     12
      prod_dim: 100  |    260   |    270   |    260
      prod_dim: 500  |   1400   |   1550   |   1360
2 threads: -----------------------------------------
      prod_dim: 10   |      6   |      6   |      6
      prod_dim: 100  |    170   |    166   |    167
      prod_dim: 500  |    902   |    950   |    858
4 threads: -----------------------------------------
      prod_dim: 10   |      4   |      3   |      3
      prod_dim: 100  |    110   |    108   |    106
      prod_dim: 500  |    576   |    590   |    547

Times are in milliseconds (ms).

[------------ Cumprod + Backwards cuda ------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |    562   |    566   |   1075
      prod_dim: 100  |   5388   |   5394   |   6697
      prod_dim: 500  |  28170   |  27580   |  30740

Times are in microseconds (us).
```

</details>

<details>
<summary>
Benchmark master
</summary>

```
[------------ Cumprod + Backwards cpu -------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |     11   |     13   |     12
      prod_dim: 100  |    270   |    270   |    256
      prod_dim: 500  |   1500   |   1590   |   1300
2 threads: -----------------------------------------
      prod_dim: 10   |      6   |      6   |      6
      prod_dim: 100  |    170   |    170   |    164
      prod_dim: 500  |    911   |    940   |    840
4 threads: -----------------------------------------
      prod_dim: 10   |      4   |      4   |      4
      prod_dim: 100  |    111   |    109   |    105
      prod_dim: 500  |    570   |    590   |    536

Times are in milliseconds (ms).

[------------ Cumprod + Backwards cuda ------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |    616   |    597   |   1109
      prod_dim: 100  |   5976   |   5723   |   7017
      prod_dim: 500  |  31110   |  29160   |  32320

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60642

Reviewed By: ngimel

Differential Revision: D29366368

Pulled By: albanD

fbshipit-source-id: b0d692ce030352965c2f152e0f92fbb61fc5ebde
2021-06-25 12:44:12 -07:00
d3bec9f4d2 Use S3 for documentation previews (#60711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60711

We already build the docs on each PR; this adds a step to push the relevant folder of the docs. (We build the entire website for pytorch.github.io, which clocks in at around 500 MB, but we really only need the "master" docs, not every version; the master docs by themselves are around 50 MB, which is more reasonable.) It uses the same S3 bucket as the artifacts but places the items at the `pytorch/pytorch/pr-previews/<pr number>` prefix. The bucket has a rule to expire resources in that prefix after 1 month.

On the AWS side the bucket has static hosting enabled with CloudFront directing to the docs preview prefix, so you can see the output at `https://d28slxzaq48q8t.cloudfront.net/<pr number>/`, e.g. https://d28slxzaq48q8t.cloudfront.net/60711/. For advertising we could link this on the HUD PR page as well as in the Dr. CI comment. We could add a CNAME on CloudFront to make this be `pr-preview.pytorch.org/<pr number>` or something, but having random PRs be able to host content on the pytorch.org domain seems sketchy.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29398818

Pulled By: driazati

fbshipit-source-id: 24032854d83815853b3650d8e54f60b684707f76
2021-06-25 12:12:26 -07:00
aacc722aec Dispatch to Python via __torch_dispatch__ (#59760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59760

See https://github.com/pytorch/pytorch/issues/59049

There are some moving parts to this PR, I'll structure this explanation so the straightforward parts go first, and then the less straightforward parts.

**The actual dispatch to Python.** The core logic of dispatch to Python lives in `concrete_dispatch_fn` in `torch/csrc/autograd/python_variable.cpp`. It takes the input IValue stack, scans all the arguments for Tensor arguments, and defers most of the heavy lifting to `handle_torch_function_no_python_arg_parser` which actually does all of the logic for calling out to torch dispatch (in particular, this function handles multiple dispatch situations for you). Because we have a different function name than regular `__torch_function__` handling, `handle_torch_function_no_python_arg_parser` is generalized to accept a magic method name to look for when testing if Tensors have custom handling or not. Unlike `__torch_function__`, by default there is no `__torch_dispatch__` on Tensor classes.

**Maintaining the Python dispatch key.** In order to get to the dispatch to Python logic, we must tag Tensors with the `__torch_dispatch__` magic method with the newly added Python dispatch key (separated from PythonFuncTorch to allow for a transitional period while they migrate to this mechanism). We expose a new private property `_is_python_dispatch` that assists in debugging if a Tensor is participating in Python dispatch or not. We apply the Python dispatch key the first time a PyObject for a Tensor is constructed (THPVariable_NewWithVar), testing if `__torch_dispatch__` exists with the newly added `check_has_torch_dispatch`.

**Shallow copy and detach.** For the simple examples tested in this PR, most creations of Tensor route through the dispatcher. The exception to this is `shallow_copy_and_detach`, which bypasses the dispatcher and is used when saving tensors for backwards. When a Tensor participates in Python dispatch, we override the behavior of `shallow_copy_and_detach` to instead directly call into `__torch_dispatch__` to perform a `detach` operation (in the same way it would be invoked if you called `detach` directly). Because this Python call is triggered directly from c10::TensorImpl, it must be indirected through `PyInterpreter::detach`, which is the general mechanism for dynamic dispatching to the Python interpreter associated with a TensorImpl.

**torchdeploy compatibility.** The dispatch to Python logic cannot be directly registered to the dispatcher as it is compiled in the Python library, which will get loaded multiple times per torchdeploy interpreter. Thus, we must employ a two phase process. First, we register a fallback inside a non-Python library (aten/src/ATen/core/PythonFallbackKernel.cpp). Its job is to determine the appropriate PyInterpreter to handle the Python dispatch by going through all of the arguments and finding the first argument that has a PyObject/PyInterpreter. With this PyInterpreter, it makes another dynamic dispatch via "dispatch" which will go to the correct torchdeploy interpreter to handle dispatching to actual Python.

**Testing.** We provide a simple example of a LoggingTensor for testing, which can be used to generate TorchScript-like traces to observe what operations are being called when a Tensor is invoked. Although a LoggingTensor would be better implemented via an is-a relationship rather than a has-a relationship (as is done in the test), we've done it this way to show that arbitrarily complex compositions of tensors inside a tensor work properly.

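For orientation, a minimal has-a sketch in the spirit of that LoggingTensor test; the helpers used here (`_make_wrapper_subclass`, `torch.utils._pytree.tree_map`) are assumptions based on later versions of this API, not necessarily what this PR landed:

```python
import torch
from torch.utils._pytree import tree_map

class LoggingTensor(torch.Tensor):
    # per the known limitations below, default __torch_function__ handling is disabled
    __torch_function__ = torch._C._disabled_torch_function_impl

    @staticmethod
    def __new__(cls, elem):
        # has-a relationship: the real data lives in `elem`
        r = torch.Tensor._make_wrapper_subclass(cls, elem.size(), dtype=elem.dtype)
        r.elem = elem
        return r

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        unwrap = lambda t: t.elem if isinstance(t, LoggingTensor) else t
        wrap = lambda t: LoggingTensor(t) if isinstance(t, torch.Tensor) else t
        print(f"op: {func}")  # log every aten op that reaches dispatch
        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))
        return tree_map(wrap, out)

x = LoggingTensor(torch.randn(2))
y = x + x  # prints the aten op(s) invoked
```
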
**Known limitations.**

* We haven't adjusted any operator code, so some patterns may not work (as they lose the Python subclass in an unrecoverable way)
* `__torch_function__` must be explicitly disabled with `_disabled_torch_function_impl` otherwise things don't work quite correctly (in particular, what is being disabled is default subclass preservation behavior.)
* We don't ever populate kwargs, even when an argument is kwarg-only

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision:
D29017912
D29017912

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Pulled By: ezyang

fbshipit-source-id: a67714d9e541d09203a8cfc85345b8967db86238
2021-06-25 11:50:32 -07:00
a53d7f8f7c Remove test linalg test skips from MAGMA integration (#58232)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55552; majority of cases in https://github.com/pytorch/pytorch/issues/51303

Tests in torch/testing/_internal/common_methods_invocations.py (tested through test_ops) cannot be fully removed, since the machines seem to be running out of GPU memory during the test; this needs further analysis

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58232

Reviewed By: ngimel

Differential Revision: D29394021

Pulled By: malfet

fbshipit-source-id: f108a70af33beec908ac1c0b58467f8744e6fe87
2021-06-25 11:44:49 -07:00
8216da1f23 Use python3.6 compatible APIs in clang_tidy.py (#60659)
Summary:
This PR makes `tools/clang_tidy.py` use Python 3.6 APIs for `asyncio` and `shlex`.

I ran into some issues when running this script with the `-j` flag inside of the clang-tidy docker image (which uses Python 3.6). Specifically, the functions `asyncio.run` and `shlex.join` are not available in Python 3.6.

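A sketch of the kind of substitution involved (illustrative coroutine and argument list):

```python
import asyncio
import shlex

async def main() -> None:
    await asyncio.sleep(0)

# instead of asyncio.run(main()), which is unavailable on Python 3.6:
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

# instead of shlex.join(cmd), also unavailable on Python 3.6:
cmd = ["clang-tidy", "-p", "build dir"]
print(" ".join(shlex.quote(c) for c in cmd))
```
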
This change does not affect CI because we do not run the clang-tidy job in parallel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60659

Reviewed By: albanD

Differential Revision: D29377851

Pulled By: 1ntEgr8

fbshipit-source-id: 92ab7ee6782b78d40ffccd03f1718ede4204d948
2021-06-25 10:35:03 -07:00
6322f66878 Add python version and cuda-specific folder to store extensions (#60592)
Summary:
See https://github.com/pytorch/pytorch/issues/55267

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60592

Reviewed By: albanD

Differential Revision: D29353368

Pulled By: ezyang

fbshipit-source-id: 1fbcd021f1030132c0f950f33ce4a3a2fef351e0
2021-06-25 10:27:04 -07:00
a404cc9a7b CUDA addcmul and addcdiv do math in float for 16 bits I/O (#60715)
Summary:
Currently, foreach `addcmul` and `addcdiv` cast the scalar to float so that the actual math is done in FP32 when the tensor dtype is Float16/BFloat16, while the regular `addcmul` and `addcdiv` do not.

### Reproducible steps to see the behavioral difference
```ipython
In [1]: import torch; torch.__version__
Out[1]: '1.9.0'

In [2]: a, b, c = torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([-1.0], device='cuda', dtype=torch.half)

In [4]: torch.addcmul(a, b, c, value=2)
Out[4]: tensor([-inf], device='cuda:0', dtype=torch.float16)

In [5]: torch._foreach_addcmul([a], [b], [c], value=2)[0]
Out[5]: tensor([-60000.], device='cuda:0', dtype=torch.float16)
```

### How does foreach cast?
Foreach addcmul and addcdiv cast scalar to `opmath_t` (almost equivalent to acc_type) here: 42c8439b6e/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu (L30) and cast inputs and results here:
42c8439b6e/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L133-L135)

Related to https://github.com/pytorch/pytorch/issues/58833 #60227 https://github.com/pytorch/pytorch/issues/60454
cc ptrblck mcarilli ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60715

Reviewed By: albanD

Differential Revision: D29385715

Pulled By: ngimel

fbshipit-source-id: 8bb2db19ab66fc99d686de056a6ee60f9f71d603
2021-06-25 10:21:35 -07:00
0be65cd52a [c10d] Fix test_collective_hang flakiness (#60662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60662

Fixes this flaky test. Basically, sometimes a rank can exit the test
early before rank 0 calls into allreduce. In this case Gloo will throw
connection reset error on all other ranks.
ghstack-source-id: 132363151

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29364806

fbshipit-source-id: ce0c292a2166edad57ea0dbb76df12cfd560a10d
2021-06-25 10:15:18 -07:00
474bdaf54d Add --print-include-paths option to tools/linter/clang_tidy.py (#60744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60744

Fixes #60739

Test Plan:
Run this command:
```
python3 tools/linter/clang_tidy.py --paths torch/csrc/fx --print-include-paths
```

Output (varies from machine to machine):
```
(clang-tidy output)
.
.
.

clang -cc1 version 11.0.0 based upon LLVM 11.0.0 default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "nccl/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory ".."
ignoring duplicate directory "../aten/src"
ignoring duplicate directory "../third_party/onnx"
ignoring duplicate directory ".."
ignoring duplicate directory ".."
ignoring duplicate directory "../torch/lib"
ignoring duplicate directory "../torch/../third_party/gloo"
  as it is a non-system directory that duplicates a system directory
ignoring duplicate directory "../third_party/ideep/mkl-dnn/src/../include"
  as it is a non-system directory that duplicates a system directory
#include "..." search starts here:
#include <...> search starts here:
 aten/src
 ../aten/src
 .
 ..
 ../cmake/../third_party/benchmark/include
 caffe2/contrib/aten
 ../third_party/onnx
 third_party/onnx
 ../third_party/foxi
 third_party/foxi
 ../torch/../aten/src/TH
 caffe2/aten/src
 third_party
 ../torch/../third_party/valgrind-headers
 ../torch/csrc
 ../torch/csrc/api/include
 ../torch/lib
 ../torch/lib/libshm
 ../torch/csrc/api
 third_party/ideep/mkl-dnn/include
 ../third_party/fmt/include
 third_party/gloo
 ../torch/../third_party/gloo
 ../cmake/../third_party/googletest/googlemock/include
 ../cmake/../third_party/googletest/googletest/include
 ../third_party/protobuf/src
 /data/users/eltonpinto/miniconda3/envs/pytorch/include
 ../third_party/gemmlowp
 ../third_party/neon2sse
 ../third_party/XNNPACK/include
 ../third_party
 ../cmake/../third_party/eigen
 /home/eltonpinto/local/miniconda3/envs/pytorch/include/python3.8
 /home/eltonpinto/local/miniconda3/envs/pytorch/lib/python3.8/site-packages/numpy/core/include
 ../cmake/../third_party/pybind11/include
 /usr/local/cuda-11.3/include
 ../third_party/ideep/mkl-dnn/src/../include
 ../third_party/ideep/include
 /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8
 /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux
 /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward
 /usr/local/include
 /usr/lib64/clang/11.0.0/include
 /usr/include

.
.
.
(more clang-tidy output)
```

Imported from OSS

Reviewed By: ngimel

Differential Revision: D29395398

fbshipit-source-id: e92077a9c4e9dee7f9d7e05df180d552e3763540
2021-06-25 10:12:15 -07:00
608f12b818 Fix --dry-run option in tools/linter/clang_tidy.py (#60744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60744

Fixes #60741

Test Plan:
Run this command:
```
python3 tools/linter/clang_tidy.py --paths torch/csrc/fx --dry-run
```
Output:
```
clang-tidy -p build -config '{"InheritParentConfig": true, "Checks": " bugprone-*, -bugprone-forward-declaration-namespace, -bugprone-macro-parentheses, -bugprone-lambda-function-name, -bugprone-reserved-identifier, cppcoreguidelines-*, -cppcoreguidelines-avoid-magic-numbers, -cppcoreguidelines-interfaces-global-init, -cppcoreguidelines-macro-usage, -cppcoreguidelines-owning-memory, -cppcoreguidelines-pro-bounds-array-to-pointer-decay, -cppcoreguidelines-pro-bounds-constant-array-index, -cppcoreguidelines-pro-bounds-pointer-arithmetic, -cppcoreguidelines-pro-type-cstyle-cast, -cppcoreguidelines-pro-type-reinterpret-cast, -cppcoreguidelines-pro-type-static-cast-downcast, -cppcoreguidelines-pro-type-union-access, -cppcoreguidelines-pro-type-vararg, -cppcoreguidelines-special-member-functions, -facebook-hte-RelativeInclude, hicpp-exception-baseclass, hicpp-avoid-goto, modernize-*, -modernize-concat-nested-namespaces, -modernize-return-braced-init-list, -modernize-use-auto, -modernize-use-default-member-init, -modernize-use-using, -modernize-use-trailing-return-type, performance-*, -performance-noexcept-move-constructor, -performance-unnecessary-value-param, ", "HeaderFilterRegex": "torch/csrc/.*", "AnalyzeTemporaryDtors": false, "CheckOptions": null}' torch/csrc/fx/fx_init.cpp
```

Reviewed By: ngimel

Differential Revision: D29394538

Pulled By: 1ntEgr8

fbshipit-source-id: b824bc2aa63631f074e9ad17092e4e063d347395
2021-06-25 09:53:29 -07:00
3a838e4ce3 Parametrizations depending on several inputs (#60530)
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/58488

There was a line that had been changed in `test_nn.py` as caught in https://github.com/pytorch/pytorch/pull/58488#discussion_r651267668

I reverted that line, which should never have been changed. I reckon that should solve the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60530

Reviewed By: ngimel

Differential Revision: D29329865

Pulled By: albanD

fbshipit-source-id: 8dfd0cd968fe26a3924dae7ca366af2c8a8639b3
2021-06-25 09:16:57 -07:00
8cba365378 Fix incorrect doc about the dtype for torch.randint described in issue #56347 (#60507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60507

Fix incorrect documentation about the dtype for `torch.randint` described in issue #56347

Test Plan: Review documentation to make sure formatting is right

Reviewed By: bdhirsh

Differential Revision: D29321181

fbshipit-source-id: caae69a9bbb30052da518a3f5d22a7ed3504cdd2
2021-06-25 07:51:36 -07:00
d8c3d555e4 [Delegate] Support composite of lowered sub modules of the same backend (#59921)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59921

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D29091143

Pulled By: iseeyuan

fbshipit-source-id: 9ffcd18681917ece8ec73a34866c53701bdee1bc
2021-06-25 07:18:32 -07:00
7c2938bf67 To refactor Sparse Adam algorithm for functional form (#59171)
Summary:
Adds Functional Interface for Sparse Adam Optimizer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59171

Reviewed By: vincentqb

Differential Revision: D29360582

Pulled By: iramazanli

fbshipit-source-id: 5ceffd7f4b7abd1e0b758a5b8445abdf5555eba0
2021-06-25 06:35:39 -07:00
963c983366 Improve numerical stability of LayerNorm (#59987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59987

Similar to GroupNorm, improve the numerical stability of LayerNorm via Welford's algorithm and pairwise summation.

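For reference, a minimal scalar sketch of Welford's online algorithm (illustration only; the actual kernel operates on tensors):

```python
# Welford's online mean/variance: numerically stabler than
# accumulating sum and sum-of-squares separately
def welford(xs):
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return mean, m2 / n  # mean and (biased) variance

print(welford([1.0, 2.0, 3.0, 4.0]))  # (2.5, 1.25)
```
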
Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"

Reviewed By: ngimel

Differential Revision: D29115235

fbshipit-source-id: 5183346c3c535f809ec7d98b8bdf6d8914bfe790
2021-06-25 02:22:42 -07:00
5b1f5c8f17 When creating a single partition, skip the output nodes, but process possible nodes after them. (#60370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60370

When creating a single partition, skip the output nodes, but process possible nodes after them.

Test Plan: Run all CI tests.

Reviewed By: jfix71

Differential Revision: D29265278

fbshipit-source-id: 2242009973a54498d8027cce5a294558a1206fdf
2021-06-24 23:50:30 -07:00
2b51a8a935 [BackwardCompatibility] Remove aten::to from allow_list (#60147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60147

Remove aten::to from allow_list now that the aten::to schema change has landed (D29121620 (eda2ddb5b0)).

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D29187314

fbshipit-source-id: abdb5a560287a861f3858732f7b3da342ee4aa55
2021-06-24 22:57:57 -07:00
3ca28656fa [special] erfcx cuda support (#60519)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60519

Reviewed By: ngimel

Differential Revision: D29353105

Pulled By: mruberry

fbshipit-source-id: 2f525a347a22f96411739a16e354c7291e863f95
2021-06-24 21:50:37 -07:00
46d27a53fe cuda rpc backward sparse tensor fix (#59609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59609

quick fix for https://github.com/pytorch/pytorch/issues/58755

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D29335722

Pulled By: gcramer23

fbshipit-source-id: 0de7e0399b30f0934320f1e9abb1b92a45bcf929
2021-06-24 21:40:43 -07:00
561132f902 Revert D29330585: [pytorch][PR] add BFloat16 support for arange on CPU
Test Plan: revert-hammer

Differential Revision:
D29330585 (375d201086)

Original commit changeset: b8a04cee0c3f

fbshipit-source-id: dc138f9613becd083848e82d15c138d3883493c8
2021-06-24 20:57:43 -07:00
d63c236fb3 Introduce quantized convolution serialization format 3 (#60241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60241

We're going to make a forward-incompatible change to this serialization
format soon, so I'm taking the opportunity to do a little cleanup.

- Use int for version.  This was apparently not possible when V2
  was introduced, but it works fine now as long as we use int64_t.
  (Note that the 64-bits are only used in memory.  The serializer will
  use 1 byte for small non-negative ints.)
- Remove the "packed params" tensor and replace it with a list of ints.
- Replace the "transpose" field with "flags" to allow more binary flags
  to be packed in.
- Unify required and optional tensors.  I just made them all optional
  and added an explicit assertion for the one we require.

A bit of a hack: I added an always-absent tensor to the front of the
tensor list.  Without this, when passing unpacked params from Python to
the ONNX JIT pass, the type would be inferred as `List[Tensor]` if all
tensors were present, making it impossible to cast to
`std::vector<c10::optional<at::Tensor>>` without jumping through hoops.

The plan is to ship this, along with another diff that adds a flag to
indicate numerical requirements, wait a few weeks for an FC grace
period, then flip the serialization version.

Test Plan: CI.  BC tests.

Reviewed By: vkuzo, dhruvbird

Differential Revision: D29349782

Pulled By: dreiss

fbshipit-source-id: cfef5d006e940ac1b8e09dc5b4c5ecf906de8716
2021-06-24 20:52:43 -07:00
42c8439b6e TH: Clean up dead code (#60655)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60655

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29371717

Pulled By: ngimel

fbshipit-source-id: faa71b1d4a15450c78e12aa917daec853057bce9
2021-06-24 19:42:16 -07:00
4a7d281119 Migrate THAllocator to ATen (#60325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60325

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29371715

Pulled By: ngimel

fbshipit-source-id: 78ec8368a48e1a4690d0664a0b02d2a235af98ff
2021-06-24 19:42:14 -07:00
d586248544 Migrate THStorage_resizeBytes to ATen (CPU) (#60324)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60324

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29371716

Pulled By: ngimel

fbshipit-source-id: 056aee0ec87722090c133777b6948c28b03b37e4
2021-06-24 19:41:02 -07:00
ddec2e0ef4 tentative fix for adaptiveavgpool gradient computation (#60630)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60524

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60630

Reviewed By: jbschlosser

Differential Revision: D29374257

Pulled By: ngimel

fbshipit-source-id: be05f0ceb53e6f0f0a59a83b710dafde469d4e8a
2021-06-24 19:02:32 -07:00
40a7c317bc Run BLAS F2C checks on host architecture (#60703)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60703

Reviewed By: driazati

Differential Revision: D29379727

Pulled By: malfet

fbshipit-source-id: dadbb1d39373887f07d59d0a05e093a5d070b016
2021-06-24 18:44:41 -07:00
7bc86458e1 Revert "Revert D28833086: beef up at::_ops API" (#60214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60214

Relanding this PR, but with a fix for windows cuda builds (example failure in master here: https://github.com/pytorch/pytorch/runs/2852662871)

This is identical to the original PR except for one change in `tools/codegen/gen.py`: `static constexpr` -> `static CONSTEXPR_EXCEPT_WIN_CUDA`

This actually took a while to figure out, until I tracked down a previous pytorch PR that encountered a similar issue: https://github.com/pytorch/pytorch/pull/40675

This reverts commit 6d0fb85a623f5ef3f3f1a2afc3660cb71fa70511.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29213932

Pulled By: bdhirsh

fbshipit-source-id: b90c7c10e5a51f8d6173ddca673b418e5774c248
2021-06-24 18:08:54 -07:00
9c4eec2a2d Adjust path to distributed cpp tests (#60705)
Summary:
After https://github.com/pytorch/pytorch/issues/60543, they are installed in the same folder as the rest of the tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60705

Reviewed By: driazati

Differential Revision: D29380670

Pulled By: malfet

fbshipit-source-id: a432d26c731e9220e00d8c800b1429b37d51655b
2021-06-24 17:42:36 -07:00
8395fdde46 Increase tolerance for some distributed tests to 5e-5 (#60462)
Summary:
On A100 GPUs 10 tests fail due to slightly higher deviations.
This fixes those.

Note that rtol is still the default and atol was increased by a factor of 5 (from 1e-5)

The failing tests were:

- test_accumulate_gradients_module
- test_accumulate_gradients_module_with_grad_is_view
- test_ddp_checkpointing_once
- test_ddp_checkpointing_twice
- test_ddp_checkpointing_unused_params
- test_ddp_checkpointing_weight_sharing
- test_nccl_backend_1gpu_module_device_ids_integer_list
- test_nccl_backend_1gpu_module_device_ids_torch_device_list
- test_nccl_backend_single_device_module_device_ids_None
- test_nccl_backend_single_device_module_empty_device_id

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60462

Reviewed By: albanD

Differential Revision: D29366145

Pulled By: zhaojuanmao

fbshipit-source-id: c3e34c007363dfebf75ccb82004a67e4d2e6f3cd
2021-06-24 17:38:54 -07:00
2fa6c7627e [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
mruberry ngimel and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementationwise, this meant backward() should sync its calling thread's current stream, not default stream, with the leaf streams.

After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).

** first paragraph has a formatting error which this PR should also fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: albanD

Differential Revision: D29370344

Pulled By: ngimel

fbshipit-source-id: 3248bc5fb92fc517db0c15c897e5d7250f67d7fe
2021-06-24 17:34:02 -07:00
d90aefe380 Improve error message for non-differentiable inputs (#60610)
Summary:
Improve the error message for inputs that should not have requires_grad=True.

For example, we now get
```
RuntimeError: The function 'binary_cross_entropy' is not differentiable with respect to argument 'weight'. This input cannot have requires_grad True.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60610

Reviewed By: anjali411

Differential Revision: D29361424

Pulled By: albanD

fbshipit-source-id: 38163ce11ae1b8df326424e95ca20e55fea2a99a
2021-06-24 17:29:16 -07:00
4ed2d5d9bb ps sparse rpc (#58003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58003

adds trainer class DdpTrainer
adds trainer class DdpSparseRpcTrainer
adds server class ParameterServerBase
adds server class AverageParameterServer
adds experiment ddp_cpu_sparse_rpc_nccl_allreduce
adds experiment ddp_cuda_sparse_rpc_nccl_allreduce

quip document https://fb.quip.com/iQUtAeKIxWpF

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29379696

Pulled By: gcramer23

fbshipit-source-id: 9cf5fb7398ba2fa3eb694afbddc4ed00d97f205f
2021-06-24 17:21:49 -07:00
fadaa52f64 [caffe2] add an EstimateAllBlobSizes operator (#59775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59775

This operator is similar to `GetAllBlobNames` but also returns the estimated
size required to serialize each node.

One goal of this operator is to allow checkpoint saving logic to estimate the
amount of space/bandwidth required to save a checkpoint when first starting
training, without actually serializing any blobs yet.  Currently the
checkpointing logic uses `GetAllBlobNames` to determine the blobs to
checkpoint.  It can instead be updated to use `EstimateAllBlobSizes` to also
get an estimate for how much space will be required for the checkpoint.
ghstack-source-id: 132275153

Test Plan: Included a new unit test.

Reviewed By: mraway

Differential Revision: D29020227

fbshipit-source-id: 811e5d86c4b59183e84e6424c48c97739be09043
2021-06-24 16:55:22 -07:00
fe4ded01f7 [package] typing.io/re edge case hack (#60666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60666

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29367847

Pulled By: Lilyjjo

fbshipit-source-id: 2c38140fbb3eab61ae3de60ab475243f0338c547
2021-06-24 14:53:46 -07:00
375d201086 add BFloat16 support for arange on CPU (#60444)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60444

Reviewed By: VitalyFedyunin

Differential Revision: D29330585

Pulled By: ezyang

fbshipit-source-id: b8a04cee0c3f2ff5544e2b821324ce8fc4e9d0f2
2021-06-24 14:38:47 -07:00
7fc4e67771 ns for fx: fix shadow logger error for resnet18 (#60559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60559

Adds `resnet18` to integration test, and fixes the error to
make creating the shadow model work.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_resnet18
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29336236

fbshipit-source-id: 9425aa096162d80ef3a7c98144b2301cfbccc1ea
2021-06-24 13:42:18 -07:00
4ddb2b43b7 ns for fx: expose function to add comparisons between logged values (#60311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60311

Adds a user-facing utility function to the FX Numeric Suite Core APIs
for comparing the values extracted by the loggers to each other.
This is needed for any kind of analysis, so it would be great to
provide an example implementation.

Example:

```
// code

m = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1)).eval()
qconfig_dict = {'': torch.quantization.default_qconfig}
mp = torch.quantization.quantize_fx.prepare_fx(m, qconfig_dict)
mq = torch.quantization.quantize_fx.convert_fx(copy.deepcopy(mp))
results = extract_weights('fp32', mp, 'int8', mq)
extend_logger_results_with_comparison(
    results, 'fp32', 'int8', compute_sqnr, 'sqnr_int8_vs_fp32')

print(results)

// results

{
  '_1': {'weight': {
    'fp32': [
      {'type': 'weight', 'values': [tensor([[[[-0.3284]]]])], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0}
    ],
    'int8': [
      {'type': 'weight', 'values': [tensor([[[[-0.3297]]]], size=(1, 1, 1, 1), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.002575645223259926,
       zero_point=0)], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1308)]}
    ]
  }},
  '_0': {'weight': {
    'fp32': [{'type': 'weight', 'values': [tensor([[[[0.5205]]]])], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0}],
    'int8': [{'type': 'weight', 'values': [tensor([[[[0.5184]]]], size=(1, 1, 1, 1), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.004082232713699341,
       zero_point=0)], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1309)]}]
  }}
}

```

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extend_logger_results_with_comparison
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29244715

fbshipit-source-id: a5547b449ea54e046c752119559be49bd738beea
2021-06-24 13:42:16 -07:00
31fe1c1323 ns for fx: rekey results by model node names (#60305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60305

Adjusts the NS for FX weight and activation extraction APIs
to require a model name, and rekeys the results of these APIs
to use the node names of the specified model as layer keys.

For example, before

```
# API call
results = ns.extract_logger_info(
  model_a, model_b, ns.OutputLogger)

# results
{'base_op_1_0': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```

and after

```
# API call
results = ns.extract_logger_info(
  model_a, model_b, ns.OutputLogger, 'model_b_name')

# results
# note: instead of `base_op_1_0`, the layer is named `linear1`
{'linear1': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```

Note: we cannot use these names while collecting data because
node names are not guaranteed to be consistent across graphs.
This is why we only rekey as the very last step.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_layer_names
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29243045

fbshipit-source-id: d39ecdfdd18b07291e3ecefed2ede287b100b7d0
2021-06-24 13:41:01 -07:00
0ba4044b9d Increase some tolerances for tf32 for Conv3d tests (#60451)
Summary:
Allow those tests to pass on A100 GPUs which support tf32

This is basically a follow-up to https://github.com/pytorch/pytorch/pull/52871, which also increased some precisions to 0.05

For reference, these are the failures I see (the only ones in test_nn with 1.9.0):
```
FAIL: test_Conv3d_pad_same_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "test_nn.py", line 11296, in with_tf32_on
    test.test_cuda(self, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
    test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
    return self.assertEqual(*args, exact_dtype=False, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 161 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan compariso
ns). The greatest difference was 0.032408137116391345 (-33.45570601919647 vs. -33.42329788208008), which occurred at index (2, 0, 0, 1, 0).

======================================================================
FAIL: test_Conv3d_pad_same_dilated_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "test_nn.py", line 11296, in with_tf32_on
    test.test_cuda(self, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
    test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
    return self.assertEqual(*args, exact_dtype=False, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 111 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan compariso
ns). The greatest difference was 0.024654212557543076 (35.104286017977465 vs. 35.07963180541992), which occurred at index (3, 0, 0, 0, 2).

======================================================================
FAIL: test_Conv3d_pad_valid_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "test_nn.py", line 11296, in with_tf32_on
    test.test_cuda(self, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
    test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
    return self.assertEqual(*args, exact_dtype=False, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 41 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.010903167642320355 (8.074376869119371 vs. 8.06347370147705), which occurred at index (0, 0, 1, 0, 0).

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60451

Reviewed By: albanD

Differential Revision: D29353255

Pulled By: ngimel

fbshipit-source-id: 155a02242be5a11dcbd9dd40ab63f15c6757ae1b
2021-06-24 13:36:27 -07:00
a3ebc40bab Update intro doc for derivatives.yaml (#60614)
Summary:
Clarify some phrasing and document the findings on the different non-differentiable states.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60614

Reviewed By: anjali411

Differential Revision: D29362740

Pulled By: albanD

fbshipit-source-id: 5bc2e8b8dde57ba5a9247d7c28b83c793703e35f
2021-06-24 13:20:40 -07:00
48509b1a9b Add exclusion list to _check_kernel_launches.py (#60562)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60562

Test Plan:
```
buck test //caffe2/test:kernel_launch_checks
```

Reviewed By: ngimel

Differential Revision: D29336561

fbshipit-source-id: 0cc101143d24e887e852bd6a9ab34ac43155eb63
2021-06-24 13:18:07 -07:00
a016150163 Move torch/lib/c10d to torch/csrc/distributed/c10d (#60543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543

Now that c10d is part of libtorch, it would also be nice if the sources all lived in one place.
ghstack-source-id: 132306292

Test Plan: It builds

Reviewed By: cbalioglu

Differential Revision: D29062002

fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6
2021-06-24 12:38:51 -07:00
b8d7db3b31 Turn default kernels into Meyer singletons (#60568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60568

https://github.com/pytorch/pytorch/pull/58661 induced a static
initialization order fiasco as flagged by ASAN strict_init_order=true.
On further inspection, it became clear that it was not necessary for
these to actually be globals initialized at module load time, so
I converted them into Meyer singletons, which ensures they are
initialized on first use, when another compilation unit requests them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29338019

Pulled By: ezyang

fbshipit-source-id: 282846118df6867277404a1830d0ce39fccaa769
2021-06-24 12:30:26 -07:00
4c00df12ec Include full Python version in collect_env.py output (#59632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59632

Before:

```
Python version: 3.7 (64-bit runtime)
```

After:

```
Python version: 3.7.7 (default, Mar 23 2020, 17:31:31)  [Clang 4.0.1 (tags/RELEASE_401/final)] (64-bit runtime)
```
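
For reference, a minimal sketch of where the richer string comes from, assuming the script reports `sys.version` (an illustration, not the exact collect_env.py code):

```
import sys
import platform

# Only the dotted release, e.g. "3.7.7"
print(platform.python_version())

# The full build string with compiler info, matching the new output above
print(sys.version.replace("\n", " "))
```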

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28961500

Pulled By: ezyang

fbshipit-source-id: 0f95a49abf6977941f09a64243916576a820679f
2021-06-24 12:11:01 -07:00
d52ef2497a Python basic module execution unit test on delegation of backend_with_compiler_demo (#60468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60468

Added a unit test for the execution of a basic module with a compiler
ghstack-source-id: 132307488

Test Plan:
Running `python test/test_jit.py TestBackendsWithCompiler -v` passes

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29306225

fbshipit-source-id: bf1ff075ebc63acbbe46d6ea030086405e29d7d3
2021-06-24 11:43:45 -07:00
b7298f499d Annotate NoneType as Optional[type] (#60383)
Summary:
------------
Infer NoneType as Optional[torch.Tensor] for monkeytype type inference
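
As a minimal illustration of the target annotation (a hypothetical example, not the inference machinery itself):

```
from typing import Optional

import torch

# A parameter defaulting to None is inferred as Optional[torch.Tensor]
# rather than NoneType, so callers may pass a Tensor or nothing at all.
def scale(x: torch.Tensor, bias: Optional[torch.Tensor] = None) -> torch.Tensor:
    if bias is None:
        return x
    return x + bias

print(scale(torch.ones(2)))
print(scale(torch.ones(2), torch.full((2,), 0.5)))
```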

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60383

Test Plan:
------
python test/test_jit.py -k TestPDT.test_nonetype_as_optional_of_type

Reviewed By: gmagogsfm

Differential Revision: D29341513

Pulled By: nikithamalgifb

fbshipit-source-id: 9a96670cb5cf2560cd4e19962faef5fecea8b24a
2021-06-24 11:00:26 -07:00
5a077bb10b Optimize some redunction operators on CPU BFloat16 (#55202)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55202

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D28836790

Pulled By: VitalyFedyunin

fbshipit-source-id: f3a29633d85eb5a614652e568140e9b19509f959
2021-06-24 10:50:24 -07:00
4aff267072 Fix Windows error in distributed (#60167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60167

We were getting errors such as this on Windows in our c10d ProcessGroup test suite:
```
  test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_distributed.py", line 471, in _event_listener
    if pipe.poll(None):
  File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 257, in poll
    return self._poll(timeout)
  File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 330, in _poll
    return bool(wait([self], timeout))
  File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 883, in wait
    ov.cancel()
OSError: [WinError 6] The handle is invalid
Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=000001EFDF228CE0)

Thread 0x00001f68 (most recent call first):
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 1202 in invoke_excepthook
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 934 in _bootstrap_inner
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 890 in _bootstrap

Current thread 0x00000f94 (most recent call first):
<no Python frame>
FAIL (5.009s)
```
And the process would then exit with error code 3221226505.
See: https://app.circleci.com/pipelines/github/pytorch/pytorch/337351/workflows/ad919a3e-fe9a-4566-8ad6-8b0a252f730c/jobs/14170191/steps

By looking at [the code of `_event_listener` in `common_distributed.py`](eb36f67dcc/torch/testing/_internal/common_distributed.py (L467-L489)) I think that the first exception (the one about the handle being invalid) is "expected" as it results from another thread purposely closing the pipe while that thread is polling it.

The relevant part of the problem seems to be the "could not acquire lock" one. I think this stems from the event listener thread being launched as a daemon thread, which means the interpreter will not wait for that thread to complete before shutting down. When the interpreter shuts down it instantly aborts all other threads. If the event listener thread was aborted _while_ it was logging to stdout then that thread was holding the lock but never got to release it. This is probably what the error is complaining about. This seems to be intended/expected behavior for CPython: https://bugs.python.org/issue42717.

The solution thus is simple: don't make that thread a daemon thread and explicitly wait for it to terminate before shutting down.
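
A minimal sketch of the fix's pattern with a simplified listener loop (the names here are illustrative, not the actual test code):

```
import threading
import time

stop = threading.Event()

def event_listener() -> None:
    # Stand-in for the pipe-polling loop in the real listener.
    while not stop.is_set():
        time.sleep(0.01)
    print("listener exiting cleanly")

# daemon=False: interpreter shutdown waits for this thread instead of
# aborting it mid-logging.
t = threading.Thread(target=event_listener, daemon=False)
t.start()
stop.set()   # signal the listener to stop...
t.join()     # ...and wait for it before shutting down
```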
ghstack-source-id: 132293710

Test Plan: Will see...

Reviewed By: pritamdamania87

Differential Revision: D29193014

fbshipit-source-id: 4aabe1fc74bf9c54ca605e7a702ac99655489780
2021-06-24 10:35:38 -07:00
f2f2f5bf20 .github: Zip test reports before uploading (#60475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60475

Uploading many artifacts can cause issues with GHA backend leading to
errors on our side. To be safe let's zip our artifacts into one archive
so that we avoid uploading too many files at once.

See: https://github.com/actions/upload-artifact#too-many-uploads-resulting-in-429-responses

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29307205

Pulled By: seemethere

fbshipit-source-id: da8c9957f88bdcc758969157ee696205db5d4dff
2021-06-24 10:30:51 -07:00
7e619b9588 First step to rearrange files in tools folder (#60473)
Summary:
Changes including:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- move appropriate scripts into these folders
- change grepped references in the pytorch/pytorch repo

Next step
- introduce `build/` folder for build scripts

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473

Test Plan:
- CI (this is important b/c pytorch/test-infra also rely on some script reference.
- tools/tests/

Reviewed By: albanD

Differential Revision: D29352716

Pulled By: walterddr

fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd
2021-06-24 10:13:58 -07:00
40d2fe1053 correct filename issue for test_cpp_extensions_aot (#60604)
Summary:
Uses a file copy to create the ninja vs. no_ninja suffixed Python test files.
This tricks xmlrunner into reporting test cases in the correct folder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60604

Test Plan:
- CI reports correctly into the corresponding folders
- If download the test statistics, calculate shards now doesn't need custom logic to handle `test_cpp_extensions_aot`

CI result shown it is working properly:
https://github.com/pytorch/pytorch/pull/60604/checks?check_run_id=2900038654 vs
https://github.com/pytorch/pytorch/pull/60604/checks?check_run_id=2900038673

Reviewed By: albanD

Differential Revision: D29349562

Pulled By: walterddr

fbshipit-source-id: e86e6bc0db288a2a57bea3c5f8edf03be1773944
2021-06-24 09:20:19 -07:00
9cab894367 Fix build_only for libtorch (#60615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60605

We have `build_only` defined, but config.yml doesn't have the parameter; this PR fixes that. As a result, the docker image push will be skipped

```
// in config.yml

if [ -z "${BUILD_ONLY}" ]; then
```

```
            ("11.1", [
                ("3.8", [
                    ("shard_test", [XImportant(True)]),
                    ("libtorch", [
                        (True, [
                            ('build_only', [X(True)]),
                        ]),
                    ]),
                ]),
            ]),
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60615

Reviewed By: albanD

Differential Revision: D29351567

Pulled By: zhouzhuojie

fbshipit-source-id: dab78bb91f62e8bed47739377987167fea1602cb
2021-06-24 09:11:54 -07:00
eddc5f40f9 Added GLU and FeatureAlphaDropout to nn docs (#60590)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60563 and https://github.com/pytorch/pytorch/issues/60570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60590

Reviewed By: albanD

Differential Revision: D29352372

Pulled By: jbschlosser

fbshipit-source-id: f81dd65deab1848a68dc202df252c416ce5214d0
2021-06-24 08:00:18 -07:00
204da12592 Reduce number of CEX when passing Tensors to Python (#60546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60546

Before, we assume conservatively that any Tensor passed to
THPVariable_Wrap could be aliased in another thread and therefore race.
However, THPVariable_Wrap takes in Variable by value; and so if
use_count() <= 1, it is impossible for another thread to have a
reference to it.  So we can conclude that it is definitely uninitialized
if the quick test fails!

Thanks bdhirsh for pointing out the optimization opportunity here.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29331718

Pulled By: ezyang

fbshipit-source-id: e100796fbc55a0af2c6565c6fbc9ddc8ae7ceb42
2021-06-24 07:40:39 -07:00
bdb964f89f Support RRefs that contain threading.Locks (#57943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57943

This is a common scenario (our own tutorials propose it), hence we should ensure it works.

A more generic solution is desirable, but this should fix the immediate concern.
ghstack-source-id: 132289683

Test Plan: Added a test

Reviewed By: mrshenli

Differential Revision: D28316076

fbshipit-source-id: 64e9766189f40474298876227ea247ce5b699d97
2021-06-24 06:36:09 -07:00
4e347f1242 [docs] Fix backticks in docs (#60474)
Summary:
There is a very common error when writing docs: One forgets to write a matching `` ` ``, and something like ``:attr:`x`` is rendered in the docs. This PR fixes most (all?) of these errors (and a few others).

I found these running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex finds an HTML tag that does not start with `#` (as python comments in example code may contain backticks) and that contains a backtick in the rendered HTML.

This regex has not given any false positive in the current codebase, so I am inclined to suggest that we should add this check to the CI. Would this be possible / reasonable / easy to do malfet ?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474

Reviewed By: mrshenli

Differential Revision: D29309633

Pulled By: albanD

fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
2021-06-24 06:27:41 -07:00
bb9e1150ea Revert D29342234: [pytorch][PR] [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream
Test Plan: revert-hammer

Differential Revision:
D29342234 (675cea1adb)

Original commit changeset: 98e6be7fdd85

fbshipit-source-id: 84022973248b2254210eee57402df2c4f4bc43c6
2021-06-24 04:49:28 -07:00
2b72068a68 Make Future store Storages instead of references to DataPtrs (#60470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60470

A Future needs to know what DataPtrs are used by its value, but it isn't always able to extract them (and even when it is, that's expensive), so they're cached. DataPtrs are kind of like unique_ptrs (movable only, cannot be copied), hence the Future can only hold _references_ to them. The Future's value, however, is unfortunately mutable (we wish that weren't the case, but we don't think we can prevent that), which means the tensor/storage that owned that DataPtr might be deleted and thus the DataPtr could be freed. This means our cached reference becomes stale, which leads to all kinds of disasters, like reading garbage data or segfaulting.

Luckily all the DataPtrs we were dealing with were held inside Storages, which have shared_ptr semantics, allowing us to hold a strong pointer to them that ensures they're kept alive.

ghstack-source-id: 132177396

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29303570

fbshipit-source-id: d814754806fa58b24e45269e97d768485ef972ba
2021-06-24 03:56:04 -07:00
06e6d63187 Use a no-warning registry for TensorPipe backends (#60457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60457

The "without warning" variants of the registry were introduced in https://github.com/pytorch/pytorch/pull/31126 to be used in Gloo for the exact same reason: we use a registry precisely so that backends can be overridden, no need to scare users with a warning.
ghstack-source-id: 132051268

Test Plan: Rebuilt and re-run

Reviewed By: mrshenli

Differential Revision: D29293840

fbshipit-source-id: 3450e547056b2c534166972e8266dab5479d5e43
2021-06-24 03:27:04 -07:00
d3a8505ee1 [jit] Added a pass to transform aten::cat ops to prim::Concat op with variable number of inputs (#59881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59881

This pass is not included in the JIT flow or anywhere else at this point. The idea is, once this lands, everyone can use this to test their workflow with this transformation and once we are convinced this is useful and/or improves performance, we can include it in the appropriate workflow.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29277876

Pulled By: navahgar

fbshipit-source-id: b5be7bdcc98dced59295bd7b8f6627619cb58d41
2021-06-24 01:27:41 -07:00
c35a3dd6f2 [jit] Added a new operator for concat that takes in variadic parameters (#59880)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59880

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29277877

Pulled By: navahgar

fbshipit-source-id: 6db24e7432f683a1d1466f9778201e0aa5d3b1ad
2021-06-24 01:26:22 -07:00
dfd2edc025 [special] add zeta (#59623)
Summary:
Reference https://github.com/pytorch/pytorch/issues/50345

`zeta` was already present in the codebase to support computation of `polygamma`.

However, `zeta` only had a `double(double, double)` signature **for CPU** before this PR (which meant that `polygamma` computations were always upcast to `double` for the zeta part).

With this PR, float computations will take place in float and double in double.

Have also refactored the code and moved the duplicate code from `Math.cuh` to `Math.h`

**Note**: For scipy, q is optional, and if it is `None`, it defaults to `1`, which corresponds to the Riemann zeta function. However, for `torch.special.zeta`, I made it mandatory because it feels odd that without `q` this is the Riemann zeta and with `q` it is the general Hurwitz zeta. I think sticking to just the general form makes more sense, as passing `1` for q is trivial.

Verify:
* [x] Docs https://14234587-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.zeta
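
A short usage sketch (`q=1` recovers the Riemann zeta function):

```
import torch

x = torch.tensor([2.0, 4.0])
q = torch.tensor(1.0)

# Hurwitz zeta: zeta(x, q) = sum over k >= 0 of 1 / (k + q)**x
print(torch.special.zeta(x, q))
# tensor([1.6449, 1.0823])  # i.e. pi**2 / 6 and pi**4 / 90
```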

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59623

Reviewed By: ngimel

Differential Revision: D29348269

Pulled By: mruberry

fbshipit-source-id: a3f9ebe1f7724dbe66de2b391afb9da1cfc3e4bb
2021-06-24 00:00:12 -07:00
26cdec6ce4 Support torch.bitwise_{left/right}_shift and __rlshift__, __rrshift__ (#59544)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58121

This PR implements `torch.bitwise_left_shift`, `torch.bitwise_right_shift`, and `torch.Tensor.{__rlshift__, __rrshift__}` for compatibility with the Python array API standard.
(cc: mruberry, rgommers, emcastillo, kmaehashi)
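
A short usage sketch:

```
import torch

a = torch.tensor([1, 2, 8], dtype=torch.int32)

print(torch.bitwise_left_shift(a, 2))    # tensor([ 4,  8, 32], dtype=torch.int32)
print(torch.bitwise_right_shift(a, 1))   # tensor([0, 1, 4], dtype=torch.int32)

# __rlshift__ / __rrshift__ make the reflected operator forms work as well:
print(2 << torch.tensor([1, 2, 3]))      # tensor([ 4,  8, 16])
```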

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59544

Reviewed By: ngimel

Differential Revision: D29348869

Pulled By: mruberry

fbshipit-source-id: 329aee296cf890735e8a9f858bccfe87c03d06ca
2021-06-23 23:57:16 -07:00
b82453cbd4 Run dist_autograd backward RPCs on appropriate CUDA streams. (#60606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60606

TensorPipe receives tensors over the wire on custom streams and these
streams are passed to some RPC callbacks but not to `BACKWARD_AUTOGRAD_REQ`. As a
result, `BACKWARD_AUTOGRAD_REQ` ran on the default stream while still using
tensors from the custom stream. This resulted in downstream autograd operations
running on the incorrect stream.

To fix this, I've passed the streams to `BACKWARD_AUTOGRAD_REQ` as well and
added an appropriate guard.

#Closes: https://github.com/pytorch/pytorch/issues/59793
ghstack-source-id: 132252069

Test Plan: Test https://github.com/pytorch/pytorch/issues/59793

Reviewed By: mrshenli

Differential Revision: D29347244

fbshipit-source-id: 8ff8b150763c970ab15c2cac8dccf56e66e9ef5d
2021-06-23 23:52:22 -07:00
675cea1adb [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
mruberry ngimel and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementation-wise, this meant backward() should sync its calling thread's current stream, not the default stream, with the leaf streams.

After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).

** first paragraph has a formatting error which this PR should also fix.
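
A sketch of the now-safe benign-looking pattern under the new semantics (requires a CUDA device; illustrative only):

```
import torch

if torch.cuda.is_available():
    s = torch.cuda.Stream()
    x = torch.randn(8, device="cuda", requires_grad=True)
    with torch.cuda.stream(s):
        loss = (x * x).sum()
        loss.backward()
        # backward() now syncs the *current* stream (s) with the leaf
        # streams, so consuming grads here, inside the same stream
        # context, is safe.
        update = x.grad * 0.1
```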

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: VitalyFedyunin, albanD

Differential Revision: D29342234

Pulled By: ngimel

fbshipit-source-id: 98e6be7fdd8550872f0a78f9a66cb8dfe75abf63
2021-06-23 23:35:24 -07:00
00896cb9ed [caffe2] update db::Transaction::Put() to accept the value by rvalue reference (#60208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60208

Update the DB APIs so that `db::Transaction::Put()` accepts the value by
rvalue reference.  This allows DB implementations to write data asynchronously
without being forced to make an additional copy of the data in memory.
`Put()` implementations can now use the string move constructor or assignment
operator to get the string data and continue performing the write
asynchronously after returning from `Put()`.

Note that I chose to entirely replace the existing `Put()`, removing the
ability for callers to call `Put()` with a `const std::string&` argument for
the value, rather than simply adding another overloaded version of `Put()`.

This was done because in practice there were no call sites using `Put()` that
cannot move their data in.  Eliminating the `const std::string&` API entirely
simplifies the DB implementations: DBs that wish to support move semantics do
not have to implement both the move and the copy versions of `Put()`.

Test Plan:
Searched through fbcode to try and make sure I found all `db::Transaction`
subclasses, and will check sandcastle results to help confirm.

Ran the modelstore checkpointing unit tests.

Differential Revision: D29204425

fbshipit-source-id: 28be6646e92e5df71954d4bb3dc0c8add30ed041
2021-06-23 22:12:53 -07:00
b09c0b6550 [caffe2] update the BlobSerializer acceptor to allow moving in the data (#60207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60207

Update the `BlobSerializerBase` API so that the serialized blob data is
passed as a `std::string&&` rather than `const std::string&`.  This allows the
acceptor to take ownership of the string data and to do
things like queue it for storing asynchronously, rather than having to make a
copy of the data if it needs to remain valid after returning.

All existing `BlobSerializerBase` implementations already pass in a valid
rvalue reference to the data, so this change did not require updating any of
the existing serializer implementations.
ghstack-source-id: 132216750

Test Plan:
Examined all ~46 `BlobSerializerBase` subclasses in fbsource to confirm they
already pass in an rvalue reference for this argument.  Also searched for
`BlobSerializerBase` on google and did not find any external references to
this class in other open source projects that might be affected.

Differential Revision: D29204426

fbshipit-source-id: b1d567e52a5c17a01d651c70bbfa2fddbaea6cd9
2021-06-23 22:11:42 -07:00
6ea22672c4 add support for sparse tensors in torch.testing.assert_close (#58844)
Summary:
This adds support for sparse tensors the same way `torch.testing._internal.common_utils.TestCase.assertEqual` does:

5c7dace309/torch/testing/_internal/common_utils.py (L1287-L1313)

- Tensors are coalesced before comparison.
- Indices and values are compared individually.
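
A short sketch of what this enables:

```
import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([1.0, 2.0])
a = torch.sparse_coo_tensor(i, v, (2, 2))
# Same logical tensor with the entries stored in the opposite order:
b = torch.sparse_coo_tensor(i.flip(1), v.flip(0), (2, 2))

# Both inputs are coalesced first, then indices and values are compared.
torch.testing.assert_close(a, b)
```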

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58844

Reviewed By: zou3519

Differential Revision: D29160250

Pulled By: mruberry

fbshipit-source-id: b0955656c2c7ff3db37a1367427ca54ca14f2e87
2021-06-23 21:59:01 -07:00
80f40b172f [Model Averaging] Periodic model averager (#60320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60320

This averager can be used for post-local SGD.
ghstack-source-id: 131908011

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29249850

fbshipit-source-id: 09675d6bb1edfb8ffbeb94510d91962532d8ca3e
2021-06-23 20:23:04 -07:00
4e51503b1f DOC Improves input and target docstring for loss functions (#60553)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56581

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60553

Reviewed By: VitalyFedyunin

Differential Revision: D29343797

Pulled By: jbschlosser

fbshipit-source-id: cafc29d60a204a21deff56dd4900157d2adbd91e
2021-06-23 20:20:29 -07:00
6d1b4642f0 DOC Describes parameters/buffers registered as None in load_state_dict (#60549)
Summary:
Related to https://github.com/pytorch/pytorch/issues/8104

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60549

Reviewed By: VitalyFedyunin

Differential Revision: D29343732

Pulled By: jbschlosser

fbshipit-source-id: ef5ba3094c8eaf2f9c8efeba6a9d9ab52ebf8b2c
2021-06-23 20:15:22 -07:00
1e31d26b1d [Static Runtime] Fix bugs in static_runtime::to_copy (#60503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60503

Fixed a few issues in the static_runtime::to_copy impl:
- fixed a bug with memory_format
- copy strides when appropriate. This is necessary to make sure that the fbgemm path in the copy kernel gets hit.
- fix the schema in the `ReplaceWithCopy` pass
- add registration of `static_runtime::to_copy.other`

Add more unit tests:
- test dynamic shapes
- test strided input tensor to `aten::to`
- test alias case (same input/output)
- test `to.other`

Reviewed By: ajyu

Differential Revision: D26838933

fbshipit-source-id: ec0d1a2deebe998fcfe8858e772e1ef429cb4522
2021-06-23 19:57:17 -07:00
d200e9de26 [Static Runtime] Test for dynamic shapes in SR unit tests (#60579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60579

- Modify testStaticRuntime to take two sets of inputs so if the second set of inputs have bigger shapes, it would trigger memory allocations in resize_ calls.
- Modify test scripts so that the output of the test op is managed by the memory planner, as explained in comments.

Reviewed By: ajyu

Differential Revision: D29221452

fbshipit-source-id: 09f0f7eb384dc8ca67594f1fa76e1e31392ee6ca
2021-06-23 19:56:05 -07:00
99b641169b Migrates nll_loss_forward from TH to Aten (CUDA) (#60097)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24610
Aten Umbrella issue https://github.com/pytorch/pytorch/issues/24507
Related to https://github.com/pytorch/pytorch/issues/59765

The performance does not change between this PR and master with the following benchmark script:

<details>
 <summary>Benchmark script</summary>

```python
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    torch.cuda.synchronize()
    MS_PER_SECOND = 1000
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
softmax = nn.LogSoftmax(dim=1)
n_runs = 250

for reduction in ["none", "mean", "sum"]:
    for N in [100_000, 500_000, 1_000_000]:
        fwd_t = 0
        bwd_t = 0
        data = torch.randn(N, C, device=device)
        target = torch.empty(N, dtype=torch.long, device=device).random_(0, C)
        loss = nn.NLLLoss(reduction=reduction)
        input = softmax(data)

        for i in range(n_runs):
            t1 = _time()
            result = loss(input, target)
            t2 = _time()
            fwd_t = fwd_t + (t2 - t1)
        fwd_avg = fwd_t / n_runs
        print(
            f"input size({N}, {C}), reduction: {reduction} "
            f"forward time is {fwd_avg:.2f} (ms)"
        )
    print()
```

</details>

## master

```
input size(100000, 30), reduction: none forward time is 0.02 (ms)
input size(500000, 30), reduction: none forward time is 0.08 (ms)
input size(1000000, 30), reduction: none forward time is 0.15 (ms)

input size(100000, 30), reduction: mean forward time is 1.81 (ms)
input size(500000, 30), reduction: mean forward time is 8.24 (ms)
input size(1000000, 30), reduction: mean forward time is 16.46 (ms)

input size(100000, 30), reduction: sum forward time is 1.66 (ms)
input size(500000, 30), reduction: sum forward time is 8.24 (ms)
input size(1000000, 30), reduction: sum forward time is 16.46 (ms)
```

## this PR

```
input size(100000, 30), reduction: none forward time is 0.02 (ms)
input size(500000, 30), reduction: none forward time is 0.08 (ms)
input size(1000000, 30), reduction: none forward time is 0.15 (ms)

input size(100000, 30), reduction: mean forward time is 1.80 (ms)
input size(500000, 30), reduction: mean forward time is 8.24 (ms)
input size(1000000, 30), reduction: mean forward time is 16.46 (ms)

input size(100000, 30), reduction: sum forward time is 1.66 (ms)
input size(500000, 30), reduction: sum forward time is 8.24 (ms)
input size(1000000, 30), reduction: sum forward time is 16.46 (ms)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60097

Reviewed By: mrshenli

Differential Revision: D29303099

Pulled By: ngimel

fbshipit-source-id: fc0d636543a79ea81158d286dcfb84043bec079a
2021-06-23 19:47:01 -07:00
ef84bcfee6 Convert floating-point constants to T in Bessel functions (#59416)
Summary:
If T is float, many of the computations are more expensive than
expected. Compilers may be reluctant to optimize because such transformations
often lead to a different outcome. Converting the constants to T before using
them clears any doubt.

Benchmark: (Debian 11, no turbo, Release build, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz, gcc 10.2.1)

```python
import timeit
for dtype in ('torch.float',):
    for func in ('i0', 'i0e', 'i1', 'i1e'):
        for n, t in [(10_000, 10000),
                    (100_000, 1000)]:
            print(f'torch.special.{func}(torch.arange(n, dtype=torch.float32)), n = {n} for {t} times, dtype={dtype}')
            print(timeit.timeit(f'torch.special.{func}(a)', setup=f'import torch; a = torch.arange({n}, dtype=torch.float32)', number=t))
```

Before:

```
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.539132010017056
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.9613071230123751
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
4.32450835997588
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
1.5751779029960744
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.0810036820184905
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.5314770240220241
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.41711462699458934
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.1759720179834403
```

After:

```
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.337154256994836
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.8640981369826477
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
4.308618158014724
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
1.5217605629877653
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.9398589830088895
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.4667845010117162
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.3658539849857334
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.15680673700990155
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59416

Reviewed By: anjali411

Differential Revision: D29249897

Pulled By: mruberry

fbshipit-source-id: c170e78f2ab47176ea95b8442c6279d7ec1d75c2
2021-06-23 19:43:27 -07:00
08020220f3 [Testing] Adding reference tests to OpInfo class (#59369)
Summary:
This PR adds a `ref` argument to the `OpInfo` base class. The idea is to add reference checks for all the _eligible_ ops. For more discussion, please check https://github.com/pytorch/pytorch/issues/58294

* [x] Migrate (but not remove yet) and modify helper functions from the `UnaryUfuncOpInfo` class to the `OpInfo` base class.
* [x] Test the reference checks for multiple ops. (also decide a list of different and eligible ops for this)
* [x] Handle possible edge cases (for example: `uint64` isn't implemented in PyTorch but is there in NumPy, and this needs to be handled -- more on this later) -- _Update_: We decided that these reference tests should only test for values and not types.
* [x] Create a sample PR for a single (of all different categories?) on adding reference functions to the eligible ops. -- _Update_: This is being done in this PR only.
* [x] ~Remove reference tests from `test_unary_ufuncs.py` and test to make sure that nothing breaks.~ (*Update*: We won't be touching Unary Ufunc reference tests in this PR)
* [x] Add comments, remove unnecessary prints/comments (added for debugging).

Note: To keep the PR description short, examples of edge cases encountered have been mentioned in the comments below.

cc: mruberry pmeier kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59369

Reviewed By: ngimel

Differential Revision: D29347252

Pulled By: mruberry

fbshipit-source-id: 69719deddb1d23c53db45287a7e66c1bfe7e65bb
2021-06-23 19:26:08 -07:00
236d3afd82 manual revert of 57575 (#60572)
Summary:
Manually reverting 57575 while keeping 57574, since the latter fixes a bug: https://github.com/pytorch/pytorch/issues/55609.
Sandcastle couldn't do it automatically.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60572

Reviewed By: driazati

Differential Revision: D29342473

Pulled By: Krovatkin

fbshipit-source-id: 66ad7d316984a13d203158ceba9706a5f451f9b2
2021-06-23 19:21:48 -07:00
9e773ea7d5 Use accscalar_t for CUDA add/sub with Tensor and Scalar (#60454)
Summary:
Follow up of https://github.com/pytorch/pytorch/issues/60227, related to https://github.com/pytorch/pytorch/issues/59907 & https://github.com/pytorch/pytorch/issues/58833

With this pull request, `torch.add` & `torch.sub` use `acc_type` for the `Scalar` if either of the two arguments is a `Scalar`.
This mimics the behavior of [`torch.mul`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu#L18), `torch._foreach_(add|sub).Scalar` and `torch._foreach_(add|sub).ScalarList`.
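
A small illustration of the user-visible effect (requires CUDA; exact numerics are device-dependent):

```
import torch

if torch.cuda.is_available():
    t = torch.full((4,), 2.0, dtype=torch.half, device="cuda")
    # The Python float 0.1 is now handled in the accumulate type (float)
    # before the result is cast back to half, matching torch.mul and the
    # _foreach_ variants.
    print(torch.add(t, 0.1))
    print(torch.sub(t, 0.1))
```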

 ---

**reference**
- torch.mul CUDA kernel: b0c9762e2d/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu (L17-L25)
- `torch._foreach_(add|sub).Scalar`: cast scalar b0c9762e2d/aten/src/ATen/native/cuda/ForeachBinaryOpScalar.cu (L27)
- `torch._foreach_(add|sub).ScalarList`: `BinaryOpScalarListFunctor` b0c9762e2d/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L180-L182) and multi_tensor_apply handles `scalar_t` and computes `opmath_t` (almost equivalent `accscalar_t`)  b0c9762e2d/aten/src/ATen/native/cuda/MultiTensorApply.cuh (L60-L68). BinaryOpScalarListFunctor
is used b0c9762e2d/aten/src/ATen/native/cuda/ForeachBinaryOpScalarList.cu (L24)

cc ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60454

Reviewed By: VitalyFedyunin

Differential Revision: D29345035

Pulled By: ngimel

fbshipit-source-id: 5dbafbdfe029a9544ec2e58f17d547928e017a04
2021-06-23 18:59:22 -07:00
af66824c1f [torch][segment_reduce] Add support for sum and min reductions (#60379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60379

This concludes the support for all reduction types initially planned (min, max, mean, sum).

Next Steps:
- Cleanups
  - update default values when length is 0 and `initial` is not given
  - templatize the code to avoid branching on every item (and other known improvements)
- more unit tests, verification
- benchmarking

Test Plan: updated unit tests.

Reviewed By: ngimel

Differential Revision: D29268218

fbshipit-source-id: c77d91671e01dcf96c18c758fa3ea522b2e13db9
2021-06-23 18:51:44 -07:00
63219f1f9f To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. proposed a new optimization algorithm that is similar in essence to the Adam algorithm.

The paper discusses how, without a warmup heuristic, the early stages of adaptive optimization algorithms can exhibit an undesirably large variance in the adaptive learning rate, which can slow the overall convergence process.

The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.

Differing from the paper, we set the variance tractability cut-off to 5 instead of 4. This adjustment is common practice and can be found in the reference code repository as well as in the TensorFlow Swift optimizer library:

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)
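
A short usage sketch, assuming the optimizer lands as `torch.optim.RAdam`:

```
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()   # variance rectification is applied automatically in early steps
opt.zero_grad()
```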

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: vincentqb

Differential Revision: D29310601

Pulled By: iramazanli

fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
2021-06-23 18:27:57 -07:00
5a2f41a2db [torch/distributed.elastic] Fix utils.distributed_test.test_create_store_timeout_on_server to be dual-stack ip compatible (#60558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60558

Fixes 1/2 flaky tests as described in: https://github.com/pytorch/pytorch/issues/60260

`test_create_store_timeout_on_server` tests whether trying to create a `c10d::TCPStore` server on an already taken port actually fails with an `IOError`. Prior to this change the `utils.get_socket_with_port()` util method was used to synthetically reserve a port, then try creating the `TCPStore` on that port to validate the `IOError`. The issue with this is that on a dual stack ip setup, `get_socket_with_port()` (since it uses `socket.AF_UNSPEC`) reserves an ipv6 port, while `TCPStore` will try binding to an ipv4 port, so an `IOError` is not observed.

Changing the logic of the test to create two `TCPStore` servers. The first chooses a free port (by passing `server_port=0`) while the second tries to create a `TCPStore` server on the port that the first store is already running on. This would induce an `IOError` on the second store's constructor.

NOTE: this change does not solve another broader issue with `TCPStore` where the server and workers can listen and connect on ipv4 vs ipv6 when they are running on dual-stack ip hosts without an ipv4 DNS entry and/or a `/etc/gai.conf` specifying the preferred bind ordering. See: https://github.com/pytorch/pytorch/pull/49124

Test Plan:
```
buck test //caffe2/test/distributed/elastic/utils:distributed_test
```

Reviewed By: cbalioglu

Differential Revision: D29334947

fbshipit-source-id: 76b998c59082cb04c0e86b7a1f3b509367fa0136
2021-06-23 17:12:18 -07:00
1a0058f593 [nnc] Merge inconsistent profiling information (#60510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60510

We encountered a situation where loop unrolling caused us to duplicate
profiled tensor types in a manner that wasn't logically consistent (see the
attached test case).  When applying this profiling information, we need to
merge the profiled types so that we use a conservative (unspecialized) type.
ghstack-source-id: 132160002

Test Plan: new unit test, plus local predictor using P424983338

Reviewed By: Krovatkin

Differential Revision: D29322487

fbshipit-source-id: 4c18ee69c71bb0622c2e6f6aa361ab5613cbaca4
2021-06-23 17:05:32 -07:00
b5b42d4ce2 [iOS GPU] Add tests for RoIAlign (#60595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60595

ghstack-source-id: 132245331

Test Plan: CI

Reviewed By: husthyc

Differential Revision: D29345400

fbshipit-source-id: 7406edee232a0ab7b40a4820e3ff9ac07871cdd4
2021-06-23 16:26:53 -07:00
1120a1b92e [quant][fx][fix] QAT with object_type in qconfig (#60555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60555

When we do QAT, we swap the FP32 modules with their corresponding QAT module counterparts by calling `qat_swap_modules` in prepare.
However, when we try to look up the swapped module type in qconfig_dict, we can no longer find a match, since the qconfig dict contains the original
module type.

In this PR we update the qconfig_dict to include the modules swapped for QAT.
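
For context, a minimal sketch of the qconfig_dict shape this fixes (the module and backend choices are illustrative):

```
import torch
from torch.quantization import get_default_qat_qconfig
from torch.quantization.quantize_fx import prepare_qat_fx

qconfig_dict = {
    # object_type maps a module class to a qconfig; the lookup now also
    # succeeds after the FP32 module is swapped for its QAT counterpart.
    "object_type": [(torch.nn.Conv2d, get_default_qat_qconfig("fbgemm"))],
}
model = torch.nn.Sequential(torch.nn.Conv2d(1, 1, 1)).train()
prepared = prepare_qat_fx(model, qconfig_dict)
```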

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29337036

fbshipit-source-id: 60212eec3ee252a2445c1b58874cb36048c9f7dd
2021-06-23 15:55:25 -07:00
d867340c7b [nnc] Add LoopNest::getLoopAt to retrieve a specified inner For-stmt (#60569)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60569

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29337767

Pulled By: huiguoo

fbshipit-source-id: e3ae23c1b290739c03d1fa5d7da25de878eb1d4c
2021-06-23 15:53:29 -07:00
c0d08dc10f [NNC] Add tile transformation in loopnest (fixed #52785) (#57758)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57758

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28260744

Pulled By: huiguoo

fbshipit-source-id: 6b5591850aaf46455bf3c2d776fa930654839a63
2021-06-23 15:52:19 -07:00
aeea5bf4a1 [Model Averaging] Provide a util function for model averaging (#60303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60303

The util function can be used for averaging parameters.

More optimizations can be done in the future.
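
A minimal sketch of what parameter averaging amounts to, assuming an already-initialized process group (not the actual util's code):

```
import torch
import torch.distributed as dist

def average_parameters(params) -> None:
    # All-reduce each parameter, then divide by the world size so every
    # rank ends up with the element-wise mean across ranks.
    world_size = dist.get_world_size()
    for p in params:
        dist.all_reduce(p.data, op=dist.ReduceOp.SUM)
        p.data.div_(world_size)
```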
ghstack-source-id: 132214212

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_average_parameters
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_average_parameters

Reviewed By: rohan-varma

Differential Revision: D29242806

fbshipit-source-id: 76fb5a92adb4bdc6151a9f411e366a0ed2a31f47
2021-06-23 15:41:15 -07:00
b770c4b61a Fix ZeRO sort to be by numel (#60556)
Summary:
**Overview:**
This is a follow-up to [this PR](https://github.com/pytorch/pytorch/pull/59586) and corrects the ZeRO partitioning algorithm to sort by the number of elements in the tensor rather than the size of the first dimension. As context, that PR was meant to migrate from using a _naive greedy_ algorithm to a _sorted-greedy_ algorithm when partitioning parameters in ZeRO.
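
A minimal sketch of the sorted-greedy assignment, independent of the actual `ZeroRedundancyOptimizer` code:

```
from typing import List

def partition_by_numel(numels: List[int], world_size: int) -> List[List[int]]:
    # Sorted-greedy: visit params by numel, largest first, and give each
    # one to the rank with the smallest running total.
    totals = [0] * world_size
    buckets: List[List[int]] = [[] for _ in range(world_size)]
    for idx in sorted(range(len(numels)), key=lambda i: numels[i], reverse=True):
        rank = totals.index(min(totals))
        buckets[rank].append(idx)
        totals[rank] += numels[idx]
    return buckets

print(partition_by_numel([100, 90, 40, 40, 30], 2))  # [[0, 3], [1, 2, 4]]
```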

**Updated Results:**
The updated table for the partitions can be found [here](https://github.com/pytorch/pytorch/pull/59410#issuecomment-865203219). There, I also considered a third algorithm (sometimes known as multifit), which is more computationally expensive than the greedy and sorted-greedy algorithms but cannot perform worse. However, because of its increased complexity and lack of improved results, I chose to settle with the simpler sorted-greedy algorithm.

The `step()` latencies show slight improvements, but the improvements may be in the noise. The values below are in seconds and were generated using NCCL backend (unlike in the previous PR which used Gloo):

Two processes:
| Model | Max `optimizer.step()` Time - Greedy (Std.) | Max `optimizer.step()` Time - Sorted-Greedy (Std.) |
| --- | --- | --- |
| ResNet-50 | 0.047 (0.00142) | **0.044 (0.00025)** |
| ResNet-152 | 0.057 (0.00034) | **0.054 (0.00022)** |
| BERT | 0.021 (0.00008) | **0.020 (0.00008)** |

Four processes:
| Model | Max `optimizer.step()` Time - Greedy | Max `optimizer.step()` Time - Sorted-Greedy (Std.) |
| --- | --- | --- |
| ResNet-50 | 0.019 (0.00065) | **0.013 (0.00040)** |
| ResNet-152 | 0.045 (0.00024) | 0.045 (0.00025) |
| BERT | 0.019 (0.00022) | **0.018 (0.00016)** |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60556

Test Plan:
I verified that the ZeRO tests pass (via the AI AWS cluster):
```
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```

Reviewed By: VitalyFedyunin

Differential Revision: D29335260

Pulled By: andwgu

fbshipit-source-id: 469d1c6e029b77c1b300a94cd1fd94b633cd28dd
2021-06-23 15:22:36 -07:00
1054ad5af3 Add back smoke tests for windows shard 1 for CircleCI (#60571)
Summary:
The reason I removed the smoke tests here was that we didn't have gflags on our GHA runners and we wanted to get sharding done sooner rather than later.

However, we shouldn't remove these tests for Windows, as they are important for debugging linker issues with torch. Thus, this is step 1 in adding the tests back.

Next step:
- add gflags to base ami
- remove the exist check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60571

Test Plan: CI shouldn't break

Reviewed By: walterddr

Differential Revision: D29341850

Pulled By: janeyx99

fbshipit-source-id: 7e0c98887534d096f867e28a5482b32aa493b132
2021-06-23 14:52:14 -07:00
555c154df5 Use asyncio in tools/clang_tidy.py (#60495)
Summary:
This replaces Ninja-based parallel execution with asyncio, which is more idiomatic Python and easier to debug when things go wrong, since the data never leaves Python.
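
A minimal sketch of the asyncio pattern, assuming a bounded-concurrency runner (the commands are illustrative, not the clang-tidy invocation):

```
import asyncio

async def run_one(sem: asyncio.Semaphore, cmd: list) -> int:
    async with sem:  # cap concurrency, like a parallel build would
        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        out, _ = await proc.communicate()
        print(out.decode().strip())
        return proc.returncode

async def main() -> None:
    sem = asyncio.Semaphore(4)
    cmds = [["echo", f"job {i}"] for i in range(8)]
    codes = await asyncio.gather(*(run_one(sem, c) for c in cmds))
    assert all(rc == 0 for rc in codes)

asyncio.run(main())
```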

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60495

Reviewed By: bhosmer

Differential Revision: D29315526

Pulled By: driazati

fbshipit-source-id: 196b1807fe4ee6db432d5fef146e52f96939b44d
2021-06-23 14:18:03 -07:00
2dedd96dd2 cmake: Prefer CMAKE_CURRENT_SOURCE_DIR to TORCH_SRC_DIR (#60493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60493

TORCH_SRC_DIR appears to be a bit bugged when it comes to identifying
include directories, so let's try using CMAKE_CURRENT_SOURCE_DIR
instead

<details>
<summary>Logs for builds with torchaudio</summary>

```
-- Building version 0.10.0a0+9e36281
running bdist_wheel
running build
running build_py
copying torchaudio/version.py -> build/lib.linux-x86_64-3.6/torchaudio
running build_ext
-- Configuring done
-- Generating done
-- Build files have been written to: /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6
[1/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-error.cc
[2/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-math.cc
[3/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/feature-functions.cc
[4/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-matrix.cc
[5/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -c ../../third_party/kaldi/submodule/src/feat/resample.cc
[6/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-vector.cc
[7/11] /usr/lib64/ccache/c++ -DINCLUDE_KALDI -DTORCH_API_INCLUDE_EXTENSION_H -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_torchaudio_EXPORTS -I../../ -I/tmp/tmp.GKeM3KKcFi/include/python3.6m -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -MF torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o.d -o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -c ../../torchaudio/csrc/kaldi.cpp
[8/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/pitch-functions.cc
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlinePitchFeatureImpl::UpdateRemainder(const kaldi::VectorBase<float>&)’:
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:814:11: warning: unused variable ‘full_frame_length’ [-Wunused-variable]
  814 |     int32 full_frame_length = opts_.NccfWindowSize() + nccf_last_lag_;
      |           ^~~~~~~~~~~~~~~~~
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlineProcessPitch::UpdateNormalizationStats(kaldi::int32)’:
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:1504:35: warning: comparison of integer expressions of different signedness: ‘std::vector<kaldi::OnlineProcessPitch::NormalizationStats>::size_type’ {aka ‘long unsigned int’} and ‘kaldi::int32’ {aka ‘int’} [-Wsign-compare]
 1504 |   if (normalization_stats_.size() <= frame)
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
[9/11] : && /usr/bin/cmake -E rm -f third_party/kaldi/libkaldi.a && /usr/bin/ar qc third_party/kaldi/libkaldi.a  third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o && /usr/bin/ranlib third_party/kaldi/libkaldi.a && :
[10/11] : && /usr/lib64/ccache/c++ -fPIC -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG   -shared -Wl,-soname,_torchaudio.so -o torchaudio/csrc/_torchaudio.so torchaudio/csrc/CMakeFiles/_torchaudio.dir/pybind.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/lfilter.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/overdrive.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/utils.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o  -Wl,-rpath,/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib:  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_python.so  third_party/kaldi/libkaldi.a  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so  -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed  /usr/local/lib/libbreakpad_client.a  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so  -lpthread  -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so" -Wl,--as-needed  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so && :
[10/11] cd /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6 && /usr/bin/cmake -P cmake_install.cmake
-- Install configuration: "Release"
-- Installing: /home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so
-- Set runtime path of "/home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so" to ""
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/kaldi_io.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/transforms.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio
creating build/bdist.linux-x86_64/wheel/torchaudio/compliance
copying build/lib.linux-x86_64-3.6/torchaudio/compliance/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance
copying build/lib.linux-x86_64-3.6/torchaudio/compliance/kaldi.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance
creating build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/cmuarctic.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/librispeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/libritts.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/vctk.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/commonvoice.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/gtzan.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/ljspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/speechcommands.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/tedlium.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/yesno.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
creating build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/fft.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/module_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
creating build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/common.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/no_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/soundfile_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/sox_io_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
creating build/bdist.linux-x86_64/wheel/torchaudio/extension
copying build/lib.linux-x86_64-3.6/torchaudio/extension/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension
copying build/lib.linux-x86_64-3.6/torchaudio/extension/extension.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension
creating build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/conv_tasnet.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/deepspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2letter.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/wavernn.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/components.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/model.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_fairseq.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_huggingface.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
creating build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/sox_effects.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
creating build/bdist.linux-x86_64/wheel/torchaudio/utils
copying build/lib.linux-x86_64-3.6/torchaudio/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils
copying build/lib.linux-x86_64-3.6/torchaudio/utils/sox_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils
creating build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/filtering.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/functional.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
creating build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/prototype/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/prototype/rnnt_loss.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/version.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/_torchaudio.so -> build/bdist.linux-x86_64/wheel/torchaudio
running install_egg_info
running egg_info
writing torchaudio.egg-info/PKG-INFO
writing dependency_links to torchaudio.egg-info/dependency_links.txt
writing requirements to torchaudio.egg-info/requires.txt
writing top-level names to torchaudio.egg-info/top_level.txt
reading manifest file 'torchaudio.egg-info/SOURCES.txt'
writing manifest file 'torchaudio.egg-info/SOURCES.txt'
Copying torchaudio.egg-info to build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281-py3.6.egg-info
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281.dist-info/WHEEL
creating 'dist/torchaudio-0.10.0a0+9e36281-cp36-cp36m-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'torchaudio/__init__.py'
adding 'torchaudio/_torchaudio.so'
adding 'torchaudio/kaldi_io.py'
adding 'torchaudio/transforms.py'
adding 'torchaudio/version.py'
adding 'torchaudio/_internal/__init__.py'
adding 'torchaudio/_internal/fft.py'
adding 'torchaudio/_internal/module_utils.py'
adding 'torchaudio/backend/__init__.py'
adding 'torchaudio/backend/common.py'
adding 'torchaudio/backend/no_backend.py'
adding 'torchaudio/backend/soundfile_backend.py'
adding 'torchaudio/backend/sox_io_backend.py'
adding 'torchaudio/backend/utils.py'
adding 'torchaudio/compliance/__init__.py'
adding 'torchaudio/compliance/kaldi.py'
adding 'torchaudio/datasets/__init__.py'
adding 'torchaudio/datasets/cmuarctic.py'
adding 'torchaudio/datasets/commonvoice.py'
adding 'torchaudio/datasets/gtzan.py'
adding 'torchaudio/datasets/librispeech.py'
adding 'torchaudio/datasets/libritts.py'
adding 'torchaudio/datasets/ljspeech.py'
adding 'torchaudio/datasets/speechcommands.py'
adding 'torchaudio/datasets/tedlium.py'
adding 'torchaudio/datasets/utils.py'
adding 'torchaudio/datasets/vctk.py'
adding 'torchaudio/datasets/yesno.py'
adding 'torchaudio/extension/__init__.py'
adding 'torchaudio/extension/extension.py'
adding 'torchaudio/functional/__init__.py'
adding 'torchaudio/functional/filtering.py'
adding 'torchaudio/functional/functional.py'
adding 'torchaudio/models/__init__.py'
adding 'torchaudio/models/conv_tasnet.py'
adding 'torchaudio/models/deepspeech.py'
adding 'torchaudio/models/wav2letter.py'
adding 'torchaudio/models/wavernn.py'
adding 'torchaudio/models/wav2vec2/__init__.py'
adding 'torchaudio/models/wav2vec2/components.py'
adding 'torchaudio/models/wav2vec2/model.py'
adding 'torchaudio/models/wav2vec2/utils/__init__.py'
adding 'torchaudio/models/wav2vec2/utils/import_fairseq.py'
adding 'torchaudio/models/wav2vec2/utils/import_huggingface.py'
adding 'torchaudio/prototype/__init__.py'
adding 'torchaudio/prototype/rnnt_loss.py'
adding 'torchaudio/sox_effects/__init__.py'
adding 'torchaudio/sox_effects/sox_effects.py'
adding 'torchaudio/utils/__init__.py'
adding 'torchaudio/utils/sox_utils.py'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/LICENSE'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/METADATA'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/WHEEL'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/top_level.txt'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel

```

</details>

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29316372

Pulled By: seemethere

fbshipit-source-id: 02be64df6197c0d4bad5a5bfb3cef336c11f53ed
2021-06-23 14:08:19 -07:00
ad1041576a Fix loop types (#60504)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60504

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29313197

fbshipit-source-id: bc86622b587e4fdb73431c2ff27300404c9693ae
2021-06-23 13:26:22 -07:00
da030c59e7 ENH Adds Byte support for nll_loss (CPU) (#60308)
Summary:
Addresses a part of https://github.com/pytorch/pytorch/issues/59765

This PR adds byte support for nll_loss on the CPU for `input.dim() == 2`.

CUDA support will be implemented once the `nll_loss` migration to CUDA is completed in https://github.com/pytorch/pytorch/pull/60299 and https://github.com/pytorch/pytorch/pull/60097.
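A minimal sketch of what this enables, assuming "byte support" refers to `torch.uint8` target tensors:

```
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)                    # input.dim() == 2 (batch x classes)
log_probs = F.log_softmax(logits, dim=1)
target = torch.tensor([0, 2, 1, 0], dtype=torch.uint8)  # byte-typed class indices
loss = F.nll_loss(log_probs, target)          # previously required an int64 target
print(loss)
```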

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60308

Reviewed By: VitalyFedyunin

Differential Revision: D29329458

Pulled By: jbschlosser

fbshipit-source-id: d3585c4966030bc61e451f8aa817406a8a3acf47
2021-06-23 12:16:45 -07:00
7bf195f360 fix kernel launch check in cross kernel
Summary: per title

Test Plan: buck test mode/opt //caffe2/test:kernel_launch_checks -- --exact 'caffe2/test:kernel_launch_checks - test_check_cuda_launches (test_kernel_launch_checks.AlwaysCheckCudaLaunchTest)' --run-disabled

Reviewed By: r-barnes

Differential Revision: D29335739

fbshipit-source-id: 385c66b1806886deba35f7fd83e29e0885999119
2021-06-23 11:47:50 -07:00
308d238377 add SequenceMask op (#60235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60235

This diff
- added SequenceMask op in Dper3 (caffe2 & pytorch)
- added shape inference functions for SequenceMask op

Test Plan:
```
buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test -- test_sequence_mask
```

Differential Revision: D29210097

fbshipit-source-id: cab3460e0fd6c49bec6d0c5c624bd4652de7604b
2021-06-23 11:33:00 -07:00
e60f9cfc58 Revert D29135358: [quant] Input-Weight Equalization - convert modifications
Test Plan: revert-hammer

Differential Revision:
D29135358 (3de79b7757)

Original commit changeset: 2d0005672904

fbshipit-source-id: cac30c1202ebbce4f22e50ed920340c7b4c6849f
2021-06-23 11:23:24 -07:00
03ab5b72c9 Fix parallel tbb build (#60532)
Summary:
Small typo in https://github.com/pytorch/pytorch/issues/60183

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60532

Reviewed By: walterddr

Differential Revision: D29336173

Pulled By: ngimel

fbshipit-source-id: 57d753f21d484bbae26a23cb3eb35e497e25118a
2021-06-23 11:16:36 -07:00
bea83e2e46 Add NoChunk wrapper for pipeline args. (#57325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57325

As per the design outlined in
https://github.com/pytorch/pytorch/issues/53952, adding a `NoChunk` wrapper for
pipeline parallelism inputs.

If a Tensor is wrapped with this wrapper, the pipeline implementation does not
split it across micro-batches and instead replicates it as-is, similar to
non-tensors.
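A minimal usage sketch (the import path is assumed; see the PR's tests for the authoritative one):

```
import torch
from torch.distributed.pipeline.sync import Pipe, NoChunk  # path assumed

# model = Pipe(sequential_model, chunks=4)  # requires RPC to be initialized
x = torch.randn(32, 16)      # split into 4 micro-batches of 8 rows each
mask = torch.ones(32, 16)    # NoChunk: replicated as-is to every micro-batch
# out = model(x, NoChunk(mask))
```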
ghstack-source-id: 132009305

Test Plan:
1) unit tests.
2) waitforbuildbot.

Reviewed By: SciPioneer

Differential Revision: D28109277

fbshipit-source-id: ee78c814c715d207d2796aba40b756a8e1834898
2021-06-23 11:13:14 -07:00
6385621003 Use JOB_BASE_NAME throughout code--consolidate CIRCLE_JOB (#60425)
Summary:
This PR is a first step in unifying our environment variables across CI (so that we don't have `CIRCLE_BLAH` in our GHA workflows, for example), though I'd also like this PR to serve as a discussion about how best to consolidate these variables.

This small change only changes most CIRCLE_JOB references in our code to be JOB_BASE_NAME, as that seems the closest GHA (and ROCm) equivalent. Currently, JOB_BASE_NAME is defined as:
- in Circle: CIRCLE_JOB (name of the job, like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`)
- in GHA: the build_environment with a `-build` or `-test` tacked onto the end, e.g., `pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-test`
- in ROCm: I don't actually know, but it's important for ROCm test sharding as shown in https://github.com/pytorch/pytorch/pull/60409

I am not sure if this is the intention for JOB_BASE_NAME so it is open to discussion what variable we should use if not JOB_BASE_NAME. I also don't know if it's worth the effort consolidating all these variables, so discussion is also highly encouraged there!

Next steps:
- Consolidate more CIRCLE_* references, maybe into CI_* equivalents?
- We use BUILD_ENVIRONMENT everywhere in Circle though the variable is inconsistent across binary vs CI jobs and across platforms. For example, for linux tests and builds, BUILD_ENVIRONMENT contains the `_test` and `_build` suffixes, but the windows jobs don't. In GHA, BUILD_ENVIRONMENT is similar to how it's defined in windows jobs on Circle. This inconsistency is confusing, and we can probably do something about it. I'm thinking of switching out BUILD_ENVIRONMENT for JOB_BASE_NAME in our test scripts where we actually mean JOB_BASE_NAME.
- We should probably document the meaning of the variables we consolidate somewhere, preferably in a README in some unified `ci/` folder. For example, it seems BUILD_ENVIRONMENT is supposed to capture the build environment, whereas JOB_BASE_NAME is supposed to capture the environment _and_ whether we're building or testing.

Notes:
- I did not replace CIRCLE_JOB references in third_party directories
- Previously, print_test_stats reported CIRCLE_JOB as only the build environment for GHA workflows, and I think tacking on the `build` or `test` will not harm anything, though I may be wrong.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60425

Reviewed By: seemethere, samestep

Differential Revision: D29333882

Pulled By: janeyx99

fbshipit-source-id: a82080e6205a03a1183035011ce59698eca06748
2021-06-23 11:11:21 -07:00
ff3678eec2 Disable group backend rpc tests from running on CI (#60407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60407

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29278179

Pulled By: H-Huang

fbshipit-source-id: ee78085eeb04d81842c95236b8c3a33de7142a3a
2021-06-23 10:58:31 -07:00
109f831409 Support non-Tensor args in the Pipe API (#57226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57226

As per the design outlined in
https://github.com/pytorch/pytorch/issues/53952, this PR adds support for
non-Tensor args in the pipeline.

The `NoChunk` wrapper hasn't been implemented yet and will be implemented in a
follow up PR.
ghstack-source-id: 132008356

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D28083564

fbshipit-source-id: 5f09da238eec0167feff76fe98916dedb0a9ae4e
2021-06-23 10:53:37 -07:00
10e11dbdcd Reland D29190420: [nnc][tests] Tests and benchmarks for computeSum (#60550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60550

Original commit changeset: ed655497a981

Whatever gcc version OSS Bazel uses wasn't happy move-constructing the
SimpleIREvaluator, so use a unique_ptr instead.

Test Plan:
CI.  Hope that the gcc version used by OSS Bazel build is
happier with this (it should be), since actually testing it locally is
an intractable pain.

Reviewed By: navahgar

Differential Revision: D29333116

fbshipit-source-id: c3e4b5d8c91eb96a43ae5315a01ca0c0f4d4a99d
2021-06-23 10:50:03 -07:00
5fd45b8089 Port any kernel to structured kernels. (#60361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60361

Tracking issue: #55070

This PR was opened to resolve the CI failures on main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265859

Pulled By: ezyang

fbshipit-source-id: 0cca0431569f38a168473b5cc572ced473799961
2021-06-23 10:44:24 -07:00
a5aa940f5e Port all kernel to structured kernels. (#60360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60360

Tracking issue: #55070

This PR was opened to resolve the CI failures on main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265856

Pulled By: ezyang

fbshipit-source-id: 6e9b45ad3fc3852bb142ae2e3d58fc5d0a911aed
2021-06-23 10:43:25 -07:00
7b2d375148 Fix convolution_depthwise3x3_winograd for multichannel output (#60460)
Summary:
Before this change, it was implemented with the assumption that the number of groups, input channels, and output channels are all the same, which is not always the case.
Extend the implementation to support any number of output channels as long as the number of groups equals the number of input channels (i.e. kernel.size(1) == 1).
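A small sketch of the newly supported case:

```
import torch

# groups == in_channels (depthwise), but out_channels is a multiple of groups,
# i.e. weight.size(1) == 1 while weight.size(0) != groups -- the case fixed here.
conv = torch.nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3,
                       padding=1, groups=8)
y = conv(torch.randn(1, 8, 32, 32))
print(y.shape)  # torch.Size([1, 16, 32, 32])
```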

Fixes https://github.com/pytorch/pytorch/issues/60176

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60460

Reviewed By: albanD

Differential Revision: D29299693

Pulled By: malfet

fbshipit-source-id: 31130c71ce86535ccfba2f4929eee3e2e287b2f0
2021-06-23 10:38:14 -07:00
c63a0d0cfe Adding windows CUDA smoke tests on PRs (#59686)
Summary:
Adding windows CUDA smoke tests on PRs (master should run the full suite).

Next step:
- Automate data update so we get a new smoke test list without manual effort

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59686

Test Plan: https://github.com/pytorch/pytorch/actions/runs/958296267 The sharded smoke tests still take long because of dependency installation

Reviewed By: walterddr

Differential Revision: D29243533

Pulled By: janeyx99

fbshipit-source-id: dde7ba127fa15c95bda0e833cc5311598fb85e2b
2021-06-23 10:13:50 -07:00
8162439cbd [DDP] Remove python GradBucket construction (#60301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60301

`GradBucket` is not meant to be constructed by Python users, only
consumed as part of a communication hook
ghstack-source-id: 131860243

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29239320

fbshipit-source-id: f1631a16e7d66b7e4a9b4a44698e2319005d10b2
2021-06-23 10:05:34 -07:00
e8690dacb2 To add Nesterov Adam Algorithm to Optimizers (#59009)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/5804

In the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ, Timothy Dozat proposed a new optimization algorithm that combines the essence of the NAG and Adam algorithms.

It is well known that momentum can be improved with Nesterov acceleration, and Dozat investigates applying this idea to the momentum component of Adam. The author provides experimental evidence of the idea's effectiveness.

In this PR we implement the NAdam algorithm proposed in that paper. In preliminary work (http://cs229.stanford.edu/proj2015/054_report.pdf) the author shows that the decay base constant should be taken as 0.96; we follow the same convention here, as Keras does. Implementation and coding practices also follow Keras in some other places:

f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py
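A minimal usage sketch of the new optimizer (the keyword arguments shown are assumptions; see the implementation for the exact signature):

```
import torch

model = torch.nn.Linear(10, 1)
# momentum_decay drives the 0.96-base decay schedule mentioned above.
opt = torch.optim.NAdam(model.parameters(), lr=2e-3)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
opt.step()
```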

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59009

Reviewed By: gchanan, vincentqb

Differential Revision: D29220375

Pulled By: iramazanli

fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa
2021-06-23 08:21:43 -07:00
a2525b035c Remove unused sample input argument from functions to resolve issue #55737 (#60486)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60486

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29311875

Pulled By: NivekT

fbshipit-source-id: 4bf451c4f8e78290398e0514860a14a335a51fa7
2021-06-23 08:02:04 -07:00
265f0e5321 Add device runtime API for the plug-in to register platform python module into torch (#59857)
Summary:
## Motivation
Allow out-of-tree PyTorch plug-ins, for device types other than CUDA, to add their runtime interface to the `torch` module. The runtime interface of a device can then be referred to by its device type name in the `torch` module, e.g., `torch.cuda` or `torch.xpu`.

## Solution
- Add a registration interface for a plug-in to add its platform Python module into the `torch` module under the device type name. E.g., `torch.xpu` can be used to refer to the XPU runtime interface after the XPU runtime module is registered with `torch._register_device_module('xpu', xpu_module)` in Intel's XPU plug-in. A minimal sketch follows below.
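```
import types
import torch

# Stub runtime module for a hypothetical out-of-tree device type;
# a real plug-in would provide its actual runtime bindings here.
xpu = types.ModuleType("torch.xpu")
xpu.is_available = lambda: False  # illustrative runtime hook

torch._register_device_module("xpu", xpu)
print(torch.xpu.is_available())  # the plug-in runtime is now reachable as torch.xpu
```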

## Additional Context
More details about runtime has been discussed in https://github.com/pytorch/pytorch/issues/53707.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59857

Reviewed By: mrshenli

Differential Revision: D29309320

Pulled By: ezyang

fbshipit-source-id: b9802a5f937ddef9e0bdaf2f7692dfe463912fbe
2021-06-23 07:54:45 -07:00
c97d4d5a34 Fix test failures with some glibc libraries (#60450)
Summary:
Large complex values lead to nan/inf results when using some glibc
implementations of atanh/acos
- Skip test_reference_numerics_hard instead of "normal"
- Test the edge values only for cdouble where the stdlib/glibc implementations support those large values

Fixes https://github.com/pytorch/pytorch/issues/60259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60450

Reviewed By: mrshenli

Differential Revision: D29304834

Pulled By: ezyang

fbshipit-source-id: d6b97456847c5573b9d2cb447bfc62abba73cb2a
2021-06-23 07:49:27 -07:00
f0e4e4be72 Clean Up ZeRO (#60285)
Summary:
**Overview:**
Being relatively new to PyTorch and ZeRO, I found parts of the code slightly hard to follow. This change strives to clean up the `ZeroRedundancyOptimizer` code in `zero_redundancy_optimizer.py` by reorganizing some computations, making variable names more explicit and consistent, and unifying terminology in the documentation. The goal is for the code to be easier to extend afterwards.

**Changes:**
1) `state_dict()`: The [logic](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L510)) for updating the global `state_dict` with each rank's local `state_dict` is simplified and made more explicit. Notably, the `dict` [`local_index_to_param_id`](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L513)) is unneeded. It maps `local_pg["params"][i]` to `id(global_pg["params"][i])`, so it is equivalent to making a single pass over both lists in tandem, effectively iterating over `i`, with no need for the explicit `dict`.
2) `_update_trainable()`: The function [initializes](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L597)) the local optimizer if it does not exist. I am unaware of any reason for the local optimizer to be destroyed after initialization, so I moved that logic to its own function `_init_local_optimizer()`, which is called once in the constructor.
After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654706728), I removed the function `_update_trainable()` itself in favor of adding a check for `parameters_as_bucket_view` in `build_param_buckets()` directly.
3) `rank_local_state_dict()`: This [function](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L528)) is currently broken. It appears to be legacy and relies on the input `state_dict` to have the key `"partitions"`. For now, I have removed it and added an [issue](https://github.com/pytorch/pytorch/issues/60284). Is it a notable use case to want to access another rank's `state_dict` in particular (as opposed to consolidating the entire state and then accessing)?
4) `local_state_dict():` After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r655571043), I removed the function.
5) `partition_parameters()`: After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654708183), I renamed the function to `_partition_parameters()` to mark it as private.
6) `_param_to_index`: After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654828100), I changed the key to be the parameter itself rather than its integer ID.
7) `buckets`: I renamed the data structure to `_buckets` to mark it as private.
8) Terminology: I tried to reduce the set of terms being used instead of juggling a number of synonyms. In particular, I made an effort to distinguish between "local" and "global" and to make names more indicative of typing.
9) Style: Per the [PyTorch contributing guide](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#writing-documentation), I made all docstrings abide by the 80 character limit, except for the one [line](554891f6fa/torch/distributed/optim/zero_redundancy_optimizer.py (L142)) showing the example ZeRO usage. Some code lines violate the limit for readability. Also, I unified some of the minor stylistic usages out of habit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60285

Test Plan:
The test suite passes as expected (on the AI AWS cluster):
```
gpurun python test/distributed/optim/test_zero_redundancy_optimizer.py
```
I visually inspected the generated HTML doc (as generated following [this](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#writing-documentation)).

Reviewed By: mrshenli

Differential Revision: D29320726

Pulled By: andwgu

fbshipit-source-id: 23f69a19ecc5e877a38fe1df0da11329428311dd
2021-06-23 07:21:40 -07:00
56481f9762 Ensure proper syncs for out-of-place grad creation (torch.autograd.grad) when backward ops run on side streams (#60127)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59844.

Streaming backwards collects "leaf streams" for AccumulateGrad functions that stash or accumulate .grad attributes for autograd leaf tensors, and syncs those streams with some ambient stream(s) so later ops can safely consume the grads on the ambient stream(s).

But, currently, streaming backwards does not collect leaf streams for grads produced out-of-place (ie, not stashed onto a .grad attribute) by `torch.autograd.grad`, because these out-of-place grads are "captured" and returned before they reach an AccumulateGrad function. Some out-of-place grads might not even have an AccumulateGrad function to go to, because `torch.autograd.grad` can be told to make grads for non-leaf temporaries.[1]

The upshot is, when streaming backwards makes ops that produce out-of-place gradients run on side streams, no ambient stream is told to sync on these side streams, so `torch.autograd.grad` doesn't offer the same post-call safe-use guarantees for grads as the leaf accumulation of `torch.autograd.backward`.

This PR ensures `torch.autograd.grad` gives the same safe-use guarantees as `torch.autograd.backward` by also stashing leaf streams for grads created out-of-place.

I augmented a streaming backwards test to include a torch.autograd.grad attempt. The test fails on current master[2] and passes with the engine.cpp diffs.

I have no idea if this bug or its fix matter to distributed autograd. pritamdamania mrshenli should take a look before it's merged.

[1] example:
```python
leaf = torch.tensor(..., requires_grad=True)
tmp = leaf * 2
loss = tmp.sum()
torch.autograd.grad(loss, inputs=(tmp, leaf))
```
Technically, because `torch.autograd.grad` can be told to produce grads for non-leaf temporaries, these streams might NOT be "leaf streams". Maybe I should rename `leaf_streams`?

[2] the way the test currently fails is fun: it reports
```
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 0 element(s) (out of 25) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.0 (5.0 vs. 5.0), which occurred at index (0, 0).
```
I suspect this [kafka trap](https://en.wiktionary.org/wiki/Kafkatrap) happens because assertEqual does a comparison test on the device, syncs on some bool result, sees failure, and prints the tensors post-sync, at which point it IS safe to access the values.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60127

Reviewed By: mrshenli

Differential Revision: D29276581

Pulled By: albanD

fbshipit-source-id: a9f797e2fd76e2f884cce5a32ecf5d9b704c88ee
2021-06-23 07:14:01 -07:00
b14f19b6fe Revert D29190420: [nnc][tests] Tests and benchmarks for computeSum
Test Plan: revert-hammer

Differential Revision:
D29190420 (21479ad20c)

Original commit changeset: 86246df82098

fbshipit-source-id: ed655497a981783da4c8f13e2d7fec104e3cb184
2021-06-23 06:59:37 -07:00
90cd57ee16 To add edge_order=2 and documentation for gradient operator (#58165)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56036
Fixes https://github.com/pytorch/pytorch/issues/56130

* All the interior points are computed using the second-order accurate central differences method for the gradient operator. However, we currently only have a first-order method for the edge points. In this PR we add second-order methods for the edge points as well (see the small example after this list).

* Currently, there is no detailed description of how the gradient operator is computed using the second-order method, or of how to use its parameters correctly. We add a detailed explanation of the meaning of each parameter and of the gradient operator's return value, along with a description of the second-order computation.
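A small sketch of the second-order edges (names per `torch.gradient`):

```
import torch

t = torch.tensor([1., 4., 9., 16., 25.])        # f(x) = x^2 sampled at x = 1..5
(dfdx,) = torch.gradient(t, spacing=1.0, edge_order=2)
print(dfdx)  # tensor([ 2.,  4.,  6.,  8., 10.]) -- exact for a quadratic, edges included
```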

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58165

Reviewed By: mruberry

Differential Revision: D29305321

Pulled By: iramazanli

fbshipit-source-id: 0e0e418eed801c8510b8babe2ad3d064479fb4d6
2021-06-23 03:35:15 -07:00
7ed07e2a7d [NormalizeArgs] Retain node.meta (#60449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60449

After normalizing args, still retain each node's `meta`

Test Plan: Added unit test.

Reviewed By: gcatron

Differential Revision: D29293179

fbshipit-source-id: 432b409790041fa4d6e759f7b46a8bee363497b0
2021-06-23 03:31:53 -07:00
66452e0a8c Ensure num_threads is initialized before calling omp_get_max_threads (#60185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60185

`get_num_threads` is usually called before `parallel_for`, so there's no
guarantee we've initialized `num_threads` properly.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29287814

Pulled By: ngimel

fbshipit-source-id: 7e9c86fc32d63889a57a9b1d2b7d8f3863481dce
2021-06-23 01:18:24 -07:00
19553438ed OpenMP: Refactor parallel_reduce to share code with parallel_for (#60184)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60184

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29287817

Pulled By: ngimel

fbshipit-source-id: 734a33a8d965208662989e2497b345b68c132498
2021-06-23 01:18:22 -07:00
c75714e594 Ensure thread id is valid in nested parallel regions (#60183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60183

Fixes https://github.com/pytorch/pytorch/pull/59149#issuecomment-863287331

`parallel_for` will call the function directly if it would have run on only a
single thread anyway. This is great for performance, but causes an issue in
nested parallel regions because `get_thread_num` will reflect the parent
parallel region instead of the current `parallel_for` call.

I fix this by using a `thread_local` variable for the current thread id and
manually setting it before each call to the user-provided function.
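A Python analogue of the idea (the actual fix is in ATen's C++ `parallel_for`; all names here are illustrative):

```
import threading

_tls = threading.local()

def get_thread_num() -> int:
    return getattr(_tls, "thread_num", 0)

def run_inline(fn):
    # Single-thread fast path: run the function directly, but pin the
    # thread id to 0 first so a nested region reports *its own* id
    # rather than the parent region's.
    prev = get_thread_num()
    _tls.thread_num = 0
    try:
        fn(0, 1)  # (begin, end) of the single chunk, illustrative
    finally:
        _tls.thread_num = prev
```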

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29287816

Pulled By: ngimel

fbshipit-source-id: 777f771a0900750c7f22eb1dd185d84d19282108
2021-06-23 01:17:09 -07:00
3f3fd57044 Migrate crossKernel from THC to ATen (CUDA) (#60039)
Summary:
Ref  https://github.com/pytorch/pytorch/issues/24507 (There doesn't seem to be an actual issue for cross)

This also moves the remaining operator functors in `THCTensorMathPointwise.cuh`  to `SparseCUDATensorMath.cu` which is the only file using them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60039

Reviewed By: mrshenli

Differential Revision: D29314638

Pulled By: ngimel

fbshipit-source-id: aa7b57f6e11a933fb44f044e26945bb4a9e3de5f
2021-06-23 00:37:55 -07:00
f590cceacb [BE] Fix Convolution.cpp build warnings (#60463)
Summary:
Use `c10::irange` and `auto` to get rid of narrowing cast and signed-unsigned compilation warnings

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60463

Reviewed By: samestep

Differential Revision: D29300415

Pulled By: malfet

fbshipit-source-id: 4d7f519e2e3ebaa754364f60af762658c1b4a62e
2021-06-23 00:02:33 -07:00
3846cef2d7 Increase tolerance for test_grad_scaling_clipping (#60458)
Summary:
This makes the test pass on A100, and also with, e.g., `torch.manual_seed(6)` called before running it.

Fixes https://github.com/pytorch/pytorch/issues/60455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60458

Reviewed By: mrshenli

Differential Revision: D29309618

Pulled By: ngimel

fbshipit-source-id: 72584087bcc949f7bc96b0644b701e69ae1fa025
2021-06-22 23:43:25 -07:00
40de03fc55 topk on CUDA supports bfloat16 (#59977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56176 via https://github.com/pytorch/pytorch/issues/58196

CC zasdfgbnm ngimel ptrblck
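A one-line check of the new dtype support (requires a CUDA device):

```
import torch

x = torch.randn(8, device="cuda", dtype=torch.bfloat16)
values, indices = torch.topk(x, k=3)
print(values.dtype)  # torch.bfloat16
```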

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59977

Reviewed By: mrshenli

Differential Revision: D29315018

Pulled By: ngimel

fbshipit-source-id: 0a87e7f155a97225fc6b2ec5dc0dc38a23156b41
2021-06-22 23:39:24 -07:00
21479ad20c [nnc][tests] Tests and benchmarks for computeSum (#60160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60160

Adds a few simple tests and benchmarks for the `computeSum` op
(equivalent to `at::sum`).

The benchmarks test 1D reduction and 2D row and column reduction.  Performance
is in the ballpark of aten (14-15 GB/s) on my skylake devserver for all cases,
and occasionally better (e.g. 256k * 64 row reduction goes from 9 GB/s to 13).

Results (on my skylake-avx512, with turbo disabled):
```
------------------------------------------------------------------------------------------
Benchmark                                   Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
Reduce1D/Torch/16777216               4746995 ns    4746722 ns        150 BYTES=14.1379G/s
Reduce1D/Naive/16777216              34063215 ns   34061388 ns         21 BYTES=1.97023G/s
Reduce1D/NativeRfactor/16777216       5057175 ns    5057167 ns        139 BYTES=13.2701G/s
Reduce1D/TeNaive/16777216            33868945 ns   33868851 ns         21 BYTES=1.98143G/s
Reduce1D/TeSplitTail/16777216        33902786 ns   33900436 ns         21 BYTES=1.97959G/s
Reduce1D/TeSplitMask/16777216        33922509 ns   33920604 ns         21 BYTES=1.97841G/s
Reduce1D/TeRfactorV1/16777216         5141150 ns    5141002 ns        135 BYTES=13.0537G/s
Reduce1D/Op/16777216                  5140390 ns    5140091 ns        135 BYTES=13.056G/s
Reduce2DCol/Torch/8/2097152          12824403 ns   12823563 ns         55 BYTES=5.8874G/s
Reduce2DCol/Torch/64/262144           8306873 ns    8306743 ns         83 BYTES=8.20507G/s
Reduce2DCol/Torch/4096/4096           7992364 ns    7992239 ns         87 BYTES=8.3988G/s
Reduce2DCol/OpSchedule/8/2097152/0    4866144 ns    4865766 ns        138 BYTES=15.5161G/s
Reduce2DCol/OpSchedule/64/262144/0   36668978 ns   36666415 ns         19 BYTES=1.85885G/s
Reduce2DCol/OpSchedule/4096/4096/0  155862459 ns  155801266 ns          4 BYTES=430.839M/s
Reduce2DCol/OpSchedule/8/2097152/1    8067683 ns    8061117 ns         85 BYTES=9.36563G/s
Reduce2DCol/OpSchedule/64/262144/1    7496686 ns    7496562 ns         93 BYTES=9.09183G/s
Reduce2DCol/OpSchedule/4096/4096/1    5262821 ns    5262186 ns        131 BYTES=12.7562G/s
Reduce2DCol/OpSchedule/8/2097152/2    6237899 ns    6237210 ns        109 BYTES=12.1044G/s
Reduce2DCol/OpSchedule/64/262144/2    5258012 ns    5257655 ns        127 BYTES=12.9635G/s
Reduce2DCol/OpSchedule/4096/4096/2    5231686 ns    5228241 ns        132 BYTES=12.839G/s
Reduce2DCol/OpSchedule/8/2097152/3   11088573 ns   11087557 ns         62 BYTES=6.80921G/s
Reduce2DCol/OpSchedule/64/262144/3    5338843 ns    5338326 ns        127 BYTES=12.7676G/s
Reduce2DCol/OpSchedule/4096/4096/3    4311617 ns    4308102 ns        162 BYTES=15.5812G/s
Reduce2DRow/Torch/8/2097152           4642244 ns    4641794 ns        151 BYTES=14.4575G/s
Reduce2DRow/Torch/64/262144           4628311 ns    4628245 ns        151 BYTES=14.4999G/s
Reduce2DRow/Torch/4096/4096           4894012 ns    4893316 ns        143 BYTES=13.7177G/s
Reduce2DRow/Torch/262144/64          10469098 ns   10468027 ns         68 BYTES=6.51101G/s
Reduce2DRow/Hand/262144/64            5554380 ns    5554059 ns        126 BYTES=12.2716G/s
Reduce2DRow/OpSchedule/8/2097152/0   33890363 ns   33888931 ns         21 BYTES=1.98026G/s
Reduce2DRow/OpSchedule/64/262144/0   33901317 ns   33899436 ns         21 BYTES=1.97965G/s
Reduce2DRow/OpSchedule/4096/4096/0   33500358 ns   33498815 ns         21 BYTES=2.00381G/s
Reduce2DRow/OpSchedule/262144/64/0   13132231 ns   13131049 ns         53 BYTES=5.19056G/s
Reduce2DRow/OpSchedule/8/2097152/1    5200423 ns    5200025 ns        134 BYTES=12.9055G/s
Reduce2DRow/OpSchedule/64/262144/1    5204428 ns    5204327 ns        133 BYTES=12.8949G/s
Reduce2DRow/OpSchedule/4096/4096/1    8724355 ns    8723370 ns         80 BYTES=7.69488G/s
Reduce2DRow/OpSchedule/262144/64/1 1811861280 ns 1811352083 ns          1 BYTES=37.6279M/s
Reduce2DRow/OpSchedule/8/2097152/2    9169829 ns    9168946 ns         76 BYTES=7.31915G/s
Reduce2DRow/OpSchedule/64/262144/2    9159901 ns    9158560 ns         76 BYTES=7.32747G/s
Reduce2DRow/OpSchedule/4096/4096/2    9217398 ns    9215557 ns         76 BYTES=7.28391G/s
Reduce2DRow/OpSchedule/262144/64/2   10820450 ns   10818998 ns         66 BYTES=6.29979G/s
Reduce2DRow/OpSchedule/8/2097152/3    5227921 ns    5226544 ns        133 BYTES=12.84G/s
Reduce2DRow/OpSchedule/64/262144/3    5194362 ns    5194082 ns        133 BYTES=12.9203G/s
Reduce2DRow/OpSchedule/4096/4096/3    5196080 ns    5195349 ns        134 BYTES=12.9203G/s
Reduce2DRow/OpSchedule/262144/64/3    5235189 ns    5234728 ns        133 BYTES=13.0202G/s
```

ghstack-source-id: 131753875

Test Plan: these tests

Reviewed By: navahgar

Differential Revision: D29190420

fbshipit-source-id: 86246df82098da4f5493d6c4f34a40016d95a9f0
2021-06-22 23:04:09 -07:00
fbeb8b4992 [nnc] Speed up batchnorm benchmark
Summary:
Use better scheduling: fuse and parallelize NC, fuse and
vectorize HW.

```
-----------------------------------------------
 N/C/H/W               ATen               NNC
-----------------------------------------------
1/64/112/112          45449 ns         36672 ns
1/256/14/14           15555 ns          7116 ns
1/128/28/28           15737 ns          8560 ns
1/64/56/56            20766 ns         12153 ns
1/512/7/7             16985 ns          8182 ns

5/64/112/112        2532475 ns       2069668 ns
5/256/14/14           24507 ns         12228 ns
5/128/28/28           29352 ns         20146 ns
5/64/56/56            44786 ns         38784 ns
5/512/7/7             22307 ns         20505 ns
```

Test Plan: benchmark results above

Reviewed By: navahgar

Differential Revision: D29288658

fbshipit-source-id: dd05efa4b7d26b6ad94f54a9ef6c8c47adb160b5
2021-06-22 22:57:43 -07:00
b0c9762e2d [pytorch][nnc] external function call to xnnpack ops (#59525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59525

This PR added NNC external function call binding for two XNNPack ops:
- prepacked::linear_clamp_run
- prepacked::conv2d_clamp_run

Both ops take two arguments: a regular input tensor and a prepacked context
object that contains other parameters like weights/bias/etc. The prepacked
context object's type is a custom class.

NNC doesn't generate assembly code that reads the content of the prepacked
object directly. It simply passes it into the XNNPack ops wrapper, so both
NNC and the generated assembly code don't need to know the custom class type.

At compilation time, we use a size-1 dummy tensor as the placeholder for the
prepacked XNNPack context object.

At runtime, we pass in the raw pointer of the XNNPack context object as if it
were a regular tensor storage data pointer.

Inside the external function call wrapper, we reinterpret_cast the raw pointer
back to the custom class type before dispatching to the XNNPack ops.
ghstack-source-id: 132135512

Test Plan: unit test

Reviewed By: bertmaher

Differential Revision: D28924934

fbshipit-source-id: 15326b35dc6c022f4c3f247a2037c361e06e80b4
2021-06-22 21:29:31 -07:00
79dc500a99 Add error message for sequence length to be equal to 0 case for RNNs (#60269)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50192

As discussed in the issue, the RNN APIs currently do not support inputs with `seq_len=0`, and the error message does not reflect this clearly. This PR addresses that by adding a clearer error message stating that none of the RNN APIs (nn.RNN, nn.GRU, and nn.LSTM) support `seq_len=0`, for either one-directional or bi-directional layers.

```
import torch

input_size = 5
hidden_size = 6
rnn = torch.nn.GRU(input_size, hidden_size)

for seq_len in reversed(range(4)):
    output, h_n = rnn(torch.zeros(seq_len, 10, input_size))
    print('{}, {}'.format(output.shape, h_n.shape))
```

Previously, this gave the following output:

```
torch.Size([3, 10, 6]), torch.Size([1, 10, 6])
torch.Size([2, 10, 6]), torch.Size([1, 10, 6])
torch.Size([1, 10, 6]), torch.Size([1, 10, 6])
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    output, h_n = rnn(torch.zeros(seq_len, 10, input_size))
  File "/opt/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/miniconda3/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 739, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: stack expects a non-empty TensorList
```

However, with this PR the error message changes for any combination of
[RNN, GRU, LSTM] x [one-directional, bi-directional].

Let's illustrate the change with the following code snippet:

```
import torch

input_size = 5
hidden_size = 6
rnn = torch.nn.LSTM(input_size, hidden_size, bidirectional=True)
output, h_n = rnn(torch.zeros(0, 10, input_size))
```

gives the following output:

```
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/fsx/users/iramazanli/pytorch/torch/nn/modules/module.py", line 1054, in _call_impl
    return forward_call(*input, **kwargs)
  File "/fsx/users/iramazanli/pytorch/torch/nn/modules/rnn.py", line 837, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: Expected sequence length to be larger than 0 in RNN
```

***********************************

A change for PackedSequence did not seem necessary, because the error message from the following code snippet is already clear about the issue:

```
import torch
import torch.nn.utils.rnn as rnn_utils
import torch.nn as nn
packed = rnn_utils.pack_sequence([])
```

returns:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/fsx/users/iramazanli/pytorch/torch/nn/utils/rnn.py", line 398, in pack_sequence
    return pack_padded_sequence(pad_sequence(sequences), lengths, enforce_sorted=enforce_sorted)
  File "/fsx/users/iramazanli/pytorch/torch/nn/utils/rnn.py", line 363, in pad_sequence
    return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
RuntimeError: received an empty list of sequences
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60269

Reviewed By: mrshenli

Differential Revision: D29299914

Pulled By: iramazanli

fbshipit-source-id: 5ca98faa28d4e6a5a2f7600a30049de384a3b132
2021-06-22 21:25:05 -07:00
dc9aa7b960 Add custom code filter for TS (#60309)
Summary:
-----------

Adds a custom code filter for TorchScript to include tracing of forward calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60309

Reviewed By: zhxchen17

Differential Revision: D29317150

Pulled By: nikithamalgifb

fbshipit-source-id: d49e4dc74a2b8cc98b0d4967980d819908b7ea7b
2021-06-22 20:55:57 -07:00
3de79b7757 [quant] Input-Weight Equalization - convert modifications (#59963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59963

When converting, before quantizing the nodes, we call
`update_obs_for_equalization()` and `convert_eq_obs()`.

`update_obs_for_equalization`:
1. For each InputEqualizationObserver, we find the corresponding
WeightEqualizationObserver.
2. For nn.Linear layers, we will create an instance of the
WeightEqualizationObserver, run forward on the observer with the given
weights.
3. Calculate the equalization scale between the
InputEqualizationObserver and WeightEqualizationObserver.

`convert_eq_obs`:
For every InputEqualizationObserver, we will do the following:
1. Create a node (ex. `x0_activation_post_process_scale`) containing the
equalization scale constant.
2. Create another node containing a `mul` operator multiplying the
equalization scale and the input.
3. Remove the current InputEqualizationObserver node, and replace it
with the `mul` node.

For every WeightEqualizationObserver, we will do the following:
1. Get the next equalization scale (we may need this for equalizing
connected linear layers).
2. Scale the weights by multiplying it with the reciprocal of the
current equalization scale and the next equalization scale

Currently, this supports models with `nn.Linear` layers, but does not
support connecting linear layers.
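A hedged numerical sketch of the scale computation (the exact formula lives in the equalization observers; this assumes the standard per-column input-weight equalization rule):

```
import torch

# Observed per-column ranges from the input and weight observers (made-up values).
x_min, x_max = torch.tensor([-1.0, -2.0]), torch.tensor([1.0, 4.0])
w_min, w_max = torch.tensor([-0.5, -8.0]), torch.tensor([0.5, 8.0])

scale = torch.sqrt((w_max - w_min) / (x_max - x_min))
# Multiplying input column j by scale[j] and dividing weight column j by it
# leaves x @ W.t() unchanged while balancing the two dynamic ranges.
print(scale)
```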

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
.LinearModule(
  (linear): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear_input_scale_0 : [#users=1] = get_attr[target=linear_input_scale_0]
    %linear_input_zero_point_0 : [#users=1] = get_attr[target=linear_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear_input_scale_0, %linear_input_zero_point_0, torch.quint8), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135358

fbshipit-source-id: 2d00056729041318463de61841483490b6bfeee5
2021-06-22 20:43:30 -07:00
7589d9c58b Enable rcb lookup for typing (#60413)
Summary:
-----------

For FX-traced models, types from the typing module are not available during the lookup for the function to be traced, so resolving the type yields a None type object. By enabling lookup for the `typing` module in `_jit_internal.py`, we mitigate this issue for FX tracing and scripting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60413

Test Plan:
--------
with-proxy python test/test_jit.py -k TestPDT.test_fx_tracing_with_typing

Reviewed By: bhosmer

Differential Revision: D29314531

Pulled By: nikithamalgifb

fbshipit-source-id: 1aa651430b1074c7e6fa74ba02bbcc4e1b00b01b
2021-06-22 18:53:19 -07:00
135e203e5e avoid unnecessary copies in MultiDispatchKeySet (#60093)
Summary:
The code would previously pass Generator & optional<Tensor> by value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60093

Reviewed By: swolchok

Differential Revision: D29310624

Pulled By: bhosmer

fbshipit-source-id: fb4a9740a57ef319aaf7c778d51430907a7c0cc5
2021-06-22 18:44:06 -07:00
4887c6e401 [quant] avoid resize calls in observer/fake_quant (#60386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386

During QAT we sometimes encounter errors with scripted models
`RuntimeError: cannot resize variables that require grad`

For per-tensor cases we don't need to resize some buffers, so this PR removes the extra resize ops where applicable.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29271905

fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b
2021-06-22 17:41:43 -07:00
d3ae3e07aa parse_reports() should include hidden files (#60404)
Summary:
Not sure why there are report files starting with `.`, but in that case
`glob('**/*.xml')` should not be used, as it will skip those files.
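
A minimal sketch of an alternative that also picks up hidden reports (the helper name is illustrative, not the actual function in this diff):

```python
import os

def find_reports(root):
    # os.walk sees dot-files, unlike glob's wildcard matching.
    reports = []
    for dirpath, _dirs, files in os.walk(root):
        reports.extend(
            os.path.join(dirpath, f) for f in files if f.endswith(".xml")
        )
    return reports
```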

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60404

Reviewed By: samestep

Differential Revision: D29276459

Pulled By: malfet

fbshipit-source-id: 8e131c38013425ad786e0a9ca0c0a43e57b1679a
2021-06-22 15:53:00 -07:00
986a88056c Remove some unused variables (#60411)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60411

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29221207

fbshipit-source-id: da6ad44036291a98f0b36b260062d077a7c2691b
2021-06-22 15:44:33 -07:00
36d4062a62 Fix some variable types (#60414)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60414

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29221183

fbshipit-source-id: f855efca2fd08844de65d0f9ef73bcceffee657e
2021-06-22 15:44:31 -07:00
7d779f84a3 Fix some loop types (#60415)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60415

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29221174

fbshipit-source-id: 9bc56655f198f6eb95e6b2e7a4f0573a2cd2f9a1
2021-06-22 15:43:10 -07:00
6e926f1303 Fix lint (#60472)
Summary:
This PR fixes the `mypy` failure introduced by [`numpy` 1.21.0](https://github.com/numpy/numpy/releases/tag/v1.21.0) (by pinning `numpy` to 1.20, at least for now) and the `quick-checks` failure introduced by https://github.com/pytorch/pytorch/issues/60405.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60472

Test Plan: The Lint workflow in GitHub Actions.

Reviewed By: walterddr

Differential Revision: D29313009

Pulled By: driazati

fbshipit-source-id: 53fd0e0549c26be5fc5d3c502c5891c56c83a32c
2021-06-22 14:48:07 -07:00
0c916c8a4e up the priority of numpy array comparisons in self.assertEqual (#59067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58988.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067

Reviewed By: jbschlosser

Differential Revision: D28986642

Pulled By: heitorschueroff

fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0
2021-06-22 13:07:07 -07:00
82c52fd417 Do not wrap Tensor.{grad,_base} by default (#60464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60464

Fixes https://github.com/szagoruyko/pytorchviz/issues/65

An alternate implementation of this PR would be to remove the
__torch_function__ interposition points for these accessors entirely.
In the end, I decided to opt for extra expressivity.  See
torch.overrides for the criterion on how I decided which accessors
should get the nowrap treatment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29302835

Pulled By: ezyang

fbshipit-source-id: fbe0ac4530a6cc9d6759a3fdf5514d4d7b1f7690
2021-06-22 12:49:23 -07:00
f42140cb8a Disable warn_unused_ignores again (#60480)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/60006#issuecomment-866130657.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60480

Test Plan: Run `mypy --config mypy-strict.ini` with [`ruamel.yaml`](https://pypi.org/project/ruamel.yaml/) installed.

Reviewed By: zhouzhuojie

Differential Revision: D29307823

Pulled By: samestep

fbshipit-source-id: 97fa4b7dad0465c269411c48142b22ce751bf830
2021-06-22 12:42:37 -07:00
6a87e8d087 Implement erfcx() (#58194)
Summary:
Implement erfcx() https://github.com/pytorch/pytorch/issues/31945

Reference: https://github.com/pytorch/pytorch/issues/50345
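
A quick illustration of why a dedicated kernel helps: erfcx(x) = exp(x^2) * erfc(x), and the naive formula overflows/underflows for large x while the fused op stays finite (assuming the operator is exposed under `torch.special`):

```python
import torch

x = torch.tensor([1.0, 30.0], dtype=torch.float64)
naive = torch.exp(x ** 2) * torch.erfc(x)  # -> [0.4276, nan]: inf * 0
fused = torch.special.erfcx(x)             # -> [0.4276, 0.0188]
print(naive, fused)
```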

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58194

Reviewed By: ngimel

Differential Revision: D29285979

Pulled By: mruberry

fbshipit-source-id: 5bcfe77fddfabbeb8c8068658ba6d9fec6430399
2021-06-22 12:38:38 -07:00
b34965435d Improve testing of inplace views (#59891)
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/49825 by improving the testing
 - Rename some of the old tests that had "inplace_view" in their names, but actually mean "inplace_[update_]on_view" so there is no confusion with the naming
 - Adds some tests in test_view_ops that verify basic behavior
 - Add tests that creation meta is properly handled for no-grad, multi-output, and custom function cases
 - Add a test that verifies that in the cross-dtype view case, the inplace views won't be accounted for in the backward graph on rebase, as mentioned in the issue.
 - Update inference mode tests to also check in-place

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59891

Reviewed By: albanD

Differential Revision: D29272546

Pulled By: soulitzer

fbshipit-source-id: b12acf5f0e3f788167ebe268423cdb58481b56f6
2021-06-22 12:28:09 -07:00
20bda0057e [caffe2/utils] Add explicit rule to avoid package boundary violation
Summary:
Add a rule to wrap proto_utils.h and depend on that, rather than
relying on a glob which violates package boundaries.

Reviewed By: igorsugak

Differential Revision: D29273453

fbshipit-source-id: 08f198a03d06ee2fdf61f5dbe1d0087db22aec8b
2021-06-22 12:22:24 -07:00
7c1bca9e94 [caffe2/utils] Add explicit rule to avoid package boundary violation
Summary:
Add a rule to wrap simple_queue.h and depend on that, rather than
relying on a glob which violates package boundaries.

Test Plan: `buck2 build fbcode//caffe2/caffe2:caffe2_core`

Reviewed By: igorsugak

Differential Revision: D29273415

fbshipit-source-id: f2b62a82cd6478bd71a8194d661d1c8b023c0953
2021-06-22 12:21:08 -07:00
7f2592195d Adds stream recording for cross-stream uses of gradients in streaming backward (#60230)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33909.

I _think_ the two recordDataPtrOnStreams I added are necessary and sufficient. They're the ones that worked for dmitrivainbrand's intricate multistream pipelining in https://github.com/pytorch/pytorch/issues/33909, and I can more or less convince myself they're enough, but it's hard to be sure (and hard to test).

PRing without a test now for visibility. I'll try to come up with something.

input_buffer.cpp needs to compile in CUDA and CPU-only builds, so I can't call `c10::cuda::CUDACachingAllocator::recordStream` directly. I planned to work around this by adding a binding in VirtualGuardImpl, but https://github.com/pytorch/pytorch/pull/57047 spared me the trouble, thanks lw.

Recording a usage stream on a generic tensor was uglier than I expected, see https://github.com/pytorch/pytorch/issues/60306. Up to you guys if adding a unified way to record streams on a tensor backed by any TensorImpl should block this PR (and if so, whether it should happen in a separate PR or as part of this PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60230

Reviewed By: mrshenli

Differential Revision: D29289392

Pulled By: albanD

fbshipit-source-id: 1339d382b7d238a461b082597b3962847b5201fe
2021-06-22 12:16:07 -07:00
c7d0e9da0a Add pyproject.toml (#60408)
Summary:
This makes PyTorch conform to [PEP 517](https://www.python.org/dev/peps/pep-0517/) and [PEP 518](https://www.python.org/dev/peps/pep-0518/) by explicitly stating that we use [`setuptools`](https://setuptools.readthedocs.io/). It also follows up on https://github.com/pytorch/pytorch/pull/60119#pullrequestreview-685791812 by moving our [`isort`](https://pycqa.github.io/isort/) config into the new `pyproject.toml` file. I didn't move any of our other tool configs into `pyproject.toml` in this PR because:

- `.flake8` is assumed to exist in its current format for `tools/actions_local_runner.py` to work
- `mypy.ini` is not our only `mypy` config
- `pytest.ini` has detailed comments on `addopts` which [would have to be removed](https://github.com/toml-lang/toml/issues/340#issuecomment-122164501) in TOML because that setting is [a string, not an array](https://docs.pytest.org/en/6.2.x/customize.html#pyproject-toml)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60408

Reviewed By: 1ntEgr8

Differential Revision: D29277327

Pulled By: samestep

fbshipit-source-id: 3f2e63f6cf9024f8c534cb13a0d854a75609c5ba
2021-06-22 12:12:36 -07:00
1abf45e37f Revert D29241736: [pytorch][PR] To add Rectified Adam Algorithm to Optimizers
Test Plan: revert-hammer

Differential Revision:
D29241736 (0d2a936176)

Original commit changeset: 288b9b1f3125

fbshipit-source-id: 56c4ec98647c6f1822b130726741a1c9ca193670
2021-06-22 12:08:31 -07:00
99ca2c5b4b Migrates nll_loss_backward from TH to Aten (CUDA) (#60299)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24609
Aten Umbrella issue https://github.com/pytorch/pytorch/issues/24507
Related to https://github.com/pytorch/pytorch/issues/59765

There are no performance differences when running the following benchmark:

<details>
 <summary>Benchmark script</summary>

```python
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    torch.cuda.synchronize()
    MS_PER_SECOND = 1000
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
softmax = nn.LogSoftmax(dim=1)
n_runs = 250

for reduction in ["none", "mean", "sum"]:
    for N in [100_000, 500_000, 1_000_000]:
        elapsed = 0
        for i in range(n_runs):
            data = torch.randn(N, C, device=device, requires_grad=True)
            target = torch.empty(N, dtype=torch.long, device=device).random_(0, C)
            loss = nn.NLLLoss(reduction=reduction)
            input = softmax(data)
            result = loss(input, target)

            if reduction == "none":
                gradient = torch.randn(N, device=device)
            else:
                gradient = torch.randn(1, device=device).squeeze()

            t1 = _time()
            result.backward(gradient)
            t2 = _time()
            elapsed = elapsed + (t2 - t1)
        elapsed_avg = elapsed / n_runs
        print(
            f"input size({N}, {C}), reduction: {reduction} "
            f"elapsed time is {elapsed_avg:.2f} (ms)"
        )
    print()

```

</details>

## master

```
input size(100000, 30), reduction: none elapsed time is 0.19 (ms)
input size(500000, 30), reduction: none elapsed time is 0.83 (ms)
input size(1000000, 30), reduction: none elapsed time is 1.66 (ms)

input size(100000, 30), reduction: mean elapsed time is 1.50 (ms)
input size(500000, 30), reduction: mean elapsed time is 7.19 (ms)
input size(1000000, 30), reduction: mean elapsed time is 14.35 (ms)

input size(100000, 30), reduction: sum elapsed time is 1.49 (ms)
input size(500000, 30), reduction: sum elapsed time is 7.17 (ms)
input size(1000000, 30), reduction: sum elapsed time is 14.21 (ms)
```

## this PR

```
input size(100000, 30), reduction: none elapsed time is 0.19 (ms)
input size(500000, 30), reduction: none elapsed time is 0.83 (ms)
input size(1000000, 30), reduction: none elapsed time is 1.66 (ms)

input size(100000, 30), reduction: mean elapsed time is 1.48 (ms)
input size(500000, 30), reduction: mean elapsed time is 7.16 (ms)
input size(1000000, 30), reduction: mean elapsed time is 14.29 (ms)

input size(100000, 30), reduction: sum elapsed time is 1.49 (ms)
input size(500000, 30), reduction: sum elapsed time is 7.15 (ms)
input size(1000000, 30), reduction: sum elapsed time is 14.18 (ms)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60299

Reviewed By: albanD

Differential Revision: D29287613

Pulled By: ngimel

fbshipit-source-id: 21e15f2c518087e9fb797a379e1e0a3508c98509
2021-06-22 12:04:07 -07:00
fca931d181 List striding with arbitrary step size (#58537)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58537

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D28531721

Pulled By: tugsbayasgalan

fbshipit-source-id: 8c8ed32ca00366603bfb5086e87dfa62736ff4b2
2021-06-22 11:25:23 -07:00
df8a8fbc1b Improve code and documentation clarity for DataPipes APIs (#60423)
Summary:
Fixes issues that are discussed with ezyang in the comments of PR https://github.com/pytorch/pytorch/issues/59498

Improved code and documentation clarity, and refactored `.filter` to use `nesting_level` directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60423

Reviewed By: ezyang

Differential Revision: D29281599

Pulled By: NivekT

fbshipit-source-id: a9bbaf52f492db0741c00f3ceb4022b08ddb1506
2021-06-22 11:19:08 -07:00
71b83c27e2 [pruning] Move pruning directory into experimental folder (#60395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60395

Experimental folder so other developers know this is work in progress

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KGJD

Reviewed By: z-a-f

Differential Revision: D29272319

fbshipit-source-id: 93eeeceba0376753efc9a5bb69a155278ceb2fca
2021-06-22 11:08:48 -07:00
f75ea51e67 [pruning] Move pruning files to their own directory (#60293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60293

Move pruning files to their own directory

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`
https://pxl.cl/1KCfz

Reviewed By: z-a-f

Differential Revision: D29238159

fbshipit-source-id: 0173a278b39ff5ee4cbd54f333f558b6fe412be5
2021-06-22 11:08:47 -07:00
b25db5251a [pruning] Base pruner class (#60278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60278

Implemented `PruningParametrization`, which removes pruned rows, and `BasePruner`, which is the base class for structured pruning.
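
A minimal, hypothetical sketch of the row-pruning idea (class and attribute names are illustrative, not the exact ones from this diff):

```python
import torch
from torch import nn

class PruningParametrization(nn.Module):
    def __init__(self, original_rows: int):
        super().__init__()
        self.original_rows = original_rows
        self.pruned_rows = set()

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        # Drop rows that have been marked as pruned.
        keep = [r for r in range(self.original_rows) if r not in self.pruned_rows]
        return weight[keep]

weight = torch.randn(4, 8)
p = PruningParametrization(weight.shape[0])
p.pruned_rows.add(2)
print(p(weight).shape)  # torch.Size([3, 8])
```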

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KC2n

Reviewed By: z-a-f

Differential Revision: D29208349

fbshipit-source-id: f34e8e258bf13fa80292c2bd64d56f5ad1e72b6a
2021-06-22 11:07:31 -07:00
31a884987d Remove some TH includes from ATen (#60323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60323

Test Plan: Imported from OSS

Reviewed By: malfet, anjali411

Differential Revision: D29252862

Pulled By: ngimel

fbshipit-source-id: 9ea13495d382c04dfd52b8dd63314f53b7e83936
2021-06-22 10:55:17 -07:00
0d2a936176 To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. suggested a new optimization algorithm similar in essence to the Adam algorithm.

The paper discusses how, without a warmup heuristic, the early stages of adaptive optimization algorithms can exhibit undesirably large variance, which can slow the overall convergence process.

The authors proposed rectifying the variance of the adaptive learning rate when it is expected to be high.

Differing from the paper, we selected a variance tractability cut-off of 5 instead of 4. This adjustment is common practice and can be found in the reference code repository as well as the TensorFlow Swift optimizer library:

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)
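
A hedged sketch of the rectification step from the paper, with the cut-off of 5 mentioned above (variable names are illustrative, not the exact ones in torch.optim):

```python
import math

def radam_step_size(step, beta2, lr):
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t > 5.0:  # variance is tractable: apply the rectification term
        rect = math.sqrt(
            ((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
            / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
        )
        return lr * rect  # used with the adaptive (second-moment) update
    return lr             # fall back to an unadapted (SGD-style) step
```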

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: gchanan

Differential Revision: D29241736

Pulled By: iramazanli

fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448
2021-06-22 10:38:41 -07:00
0126f42841 [complex] torch.sigmoid: CUDA support and complex autograd support (#48647)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48552

**Changes**

* Complex support for `torch.sigmoid` CUDA (CPU support already exists)
* Complex autograd support for `torch.sigmoid` (CUDA and CPU)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48647

Reviewed By: H-Huang

Differential Revision: D29163012

Pulled By: anjali411

fbshipit-source-id: 0cac0412355312675bee1cc46e090be7351d5dac
2021-06-22 10:35:00 -07:00
567e6d3a87 Remove Caffe2 thread-pool leak warning (#60318)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57273.

Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it.

It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

The test script in https://github.com/pytorch/pytorch/issues/60171 does have a `set_num_threads` invocation, which is why I was able to reproduce the issue after building from the master branch's source code.

cc malfet & ejguan, who have the authority to make a decision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60318

Reviewed By: albanD

Differential Revision: D29265771

Pulled By: ezyang

fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b
2021-06-22 10:26:55 -07:00
91451369ed require non-empty inputs to grad() calls in the API (#52016)
Summary:
The grad() function needs to return the updated values, and hence
needs a non-empty `inputs` argument to populate.
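
A small illustration of the new requirement (a sketch; the exact error type is whatever the check raises):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2

# inputs must be a non-empty sequence of tensors to populate:
(gx,) = torch.autograd.grad(y, inputs=(x,))
print(gx)  # tensor(4.)

# torch.autograd.grad(y, inputs=())  # now raises instead of silently returning
```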

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52016

Test Plan:
Passes Python and C++ unit tests, and added new tests to catch this behavior.

Fixes https://github.com/pytorch/pytorch/issues/47061

Reviewed By: albanD

Differential Revision: D26406444

Pulled By: dagitses

fbshipit-source-id: 023aeca9a40cd765c5bad6a1a2f8767a33b75a1a
2021-06-22 10:10:58 -07:00
729f7cd52f Implement histogram operator on CPU (#58780)
Summary:
The existing [torch.histc](https://pytorch.org/docs/stable/generated/torch.histc.html) operator is limited in comparison to [numpy.histogram](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html). This PR adds torch.histogram on CPU. The new operator replicates numpy.histogram's behavior, including support for caller-specified bin edges and weights. It was motivated by previous community requests for histogram.

The implementation was [benchmarked](https://docs.google.com/spreadsheets/d/1xCR0jODchVvwdVSAjiLsNCkmyictA6j1LNfDpWOafjw/edit?usp=sharing) against numpy.histogram as well as torch.histc. This implementation is weakly faster than numpy.histogram across all types of inputs tested, and performs in line with torch.histc for the limited inputs histc supports.
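
A short usage example mirroring numpy.histogram, with caller-specified bin edges and weights:

```python
import torch

data = torch.tensor([0.2, 0.4, 0.4, 0.9, 1.3])
hist, bin_edges = torch.histogram(
    data,
    bins=torch.tensor([0.0, 0.5, 1.0, 1.5]),  # explicit edges
    weight=torch.ones_like(data),
)
print(hist)       # tensor([3., 1., 1.])
print(bin_edges)  # tensor([0.0000, 0.5000, 1.0000, 1.5000])
```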

mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58780

Test Plan:
Added unit tests, OpInfo for the new torch.histogram operator.

Tested execution time on a variety of input sizes and compared to numpy.histogram performance: https://docs.google.com/spreadsheets/d/1xCR0jODchVvwdVSAjiLsNCkmyictA6j1LNfDpWOafjw/edit?usp=sharing

Reviewed By: ezyang

Differential Revision: D29134626

Pulled By: saketh-are

fbshipit-source-id: f2773085de1697f6bc6ffdeffe9a81267f51bdfc
2021-06-22 10:06:04 -07:00
3a56758e1f changed launch bound to fix col2im kernel (#60315)
Summary:
Changed launch bound for col2im kernel from 1024 to 512 to fix register spilling into local memory.

Perf comparison (using Nvidia Titan-V):

![Col2ImTimingData](https://user-images.githubusercontent.com/22803332/122627527-e0b1fc80-d064-11eb-83df-f2a1165cefcc.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60315

Reviewed By: albanD

Differential Revision: D29288113

Pulled By: ngimel

fbshipit-source-id: f78eb90941835700a1aef8e08fac6aff86dedfe9
2021-06-22 09:29:34 -07:00
926bb5d6be changed launch bounds, unrolled for loop for grid sampler 2d fwd and bwd (#60405)
Summary:
Changed launch bounds for grid sampler 2d fwd and bwd from 1024 to 256, added loop unrolling to fix register spilling into local memory.

Timing Data: (using Nvidia Titan-V)
Interpolation mode 2, padding 0, align corners False

![GridSampler2dTimingData](https://user-images.githubusercontent.com/22803332/122830305-01fd2d80-d29d-11eb-9cd3-7da533a03f33.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60405

Reviewed By: albanD

Differential Revision: D29288075

Pulled By: ngimel

fbshipit-source-id: 5e060f0c2d1cc0a3086718e6be263413dfa29689
2021-06-22 09:22:41 -07:00
23bb2ed00a Improve documentation for torch.set_rng_state (#60422)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59974 by improving documentation for the function torch.set_rng_state

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60422

Test Plan: Only a comment is being changed.

Reviewed By: bdhirsh

Differential Revision: D29281578

Pulled By: NivekT

fbshipit-source-id: 2c160f782438b7f91f16c44f06c342e8b8b8437b
2021-06-22 07:10:50 -07:00
700df82881 [PyTorch Edge] Update iOS readme to use lite interpreter (#59841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59841

As lite interpreter moves to beta, it's recommended to let users start using it.
ghstack-source-id: 131766778

Test Plan: CI

Reviewed By: husthyc

Differential Revision: D29048350

fbshipit-source-id: 54d2ad09b4e9475304522c80b358647bcea79b14
2021-06-22 02:17:04 -07:00
15dc320cae Fix lint build (#60438)
Summary:
per title

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60438

Reviewed By: ngimel

Differential Revision: D29288175

Pulled By: mruberry

fbshipit-source-id: f59b579b1793fdb1d298109c2bef0a70badb37b4
2021-06-22 00:11:55 -07:00
0585daae83 fixed launch bounds for gathertopk kernel (#60314)
Summary:
Changed launch bounds for gatherTopK kernel to fix register spilling into local memory.

Comparison (Nvidia Titan-V GPU):

Args: Input size as below, k=32, dim=None

![TopKTimingData](https://user-images.githubusercontent.com/22803332/122624922-46978780-d057-11eb-9b52-d5786da432c0.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60314

Reviewed By: mruberry

Differential Revision: D29267789

Pulled By: ngimel

fbshipit-source-id: 4056efb2e44e5527786167af66a127504980a3af
2021-06-21 22:24:44 -07:00
45ae2e7863 Set TORCH_WARN_ONCE to always warn inside of assertNotWarn (#60020)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60020

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29249909

Pulled By: mruberry

fbshipit-source-id: 10a8d5c05bd8d4aec345f70b132efd3623601f6a
2021-06-21 21:35:54 -07:00
5d476f5b95 Fix FFT documentation examples and run doctests in the test suite (#60304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59514

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60304

Reviewed By: anjali411

Differential Revision: D29253980

Pulled By: mruberry

fbshipit-source-id: 0654f00197e5fae338aa8edf0b61ef5692cdaa7e
2021-06-21 20:47:25 -07:00
5921b5480a ensure xml report path are relative to */pytorch/test (#60380)
Summary:
Changes the approach.

The root cause: `inspect.getfile` returns an absolute path instead of a path relative to `os.getcwd()` in newer Python versions. We sanitize this by removing the CI prefix where it applies.
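
A minimal sketch of the sanitization described above (CI_PREFIX and the helper name are illustrative):

```python
import inspect
import os

CI_PREFIX = os.getcwd()  # assumption: tests are launched from the checkout

def sanitized_test_file(obj):
    path = inspect.getfile(obj)  # absolute on newer Python versions
    if path.startswith(CI_PREFIX):
        path = os.path.relpath(path, CI_PREFIX)
    return path
```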

See:
https://app.circleci.com/pipelines/github/pytorch/pytorch/339568/workflows/43cac71c-759e-471f-83c2-d59c152dcd8a/jobs/14278585 vs. https://app.circleci.com/pipelines/github/pytorch/pytorch/339568/workflows/43cac71c-759e-471f-83c2-d59c152dcd8a/jobs/14278285

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60380

Test Plan:
CI

Plot twist:

windows tests are actually launched via
```
pushd test
python run_test.py
```
while linux/macos tests are
```
python test/run_test.py
```
This might cause problems when using `os.getcwd()`; we will see from the PR CI results.

Reviewed By: malfet

Differential Revision: D29276969

Pulled By: walterddr

fbshipit-source-id: 336c2805d0c92733e0ff4c309ff2044dc2ed4e21
2021-06-21 20:47:23 -07:00
9b30fb8528 add support for constant (#60166)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58739. Adds support for constants according to the Python array API specification.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60166

Reviewed By: anjali411

Differential Revision: D29253958

Pulled By: mruberry

fbshipit-source-id: 0bc86b74d3a4eb3ec4a65c941ec2710747402db1
2021-06-21 20:47:21 -07:00
1764aa79b9 restore JOB_BASE_NAME for test1 and test2 in test.sh (#60409)
Summary:
JOB_BASE_NAME for test1 and test2 was removed by https://github.com/pytorch/pytorch/issues/60124. This caused the ROCm CI to run all tests for both test1 and test2. Restore the use of JOB_BASE_NAME.

Fixes https://github.com/pytorch/pytorch/issues/60377.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60409

Reviewed By: anjali411

Differential Revision: D29277560

Pulled By: walterddr

fbshipit-source-id: ddf01466492a9a626ce1b6adf87cd102d8f1fe35
2021-06-21 20:46:17 -07:00
7d39608a29 split TestAsserts by functionality (#58919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58919

Instead of having one large TestAsserts test case, we split off tests for
self-contained functionality like container or complex checking into
separate test cases. That makes it a lot easier to keep an overview of
what is tested.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259407

Pulled By: mruberry

fbshipit-source-id: 9769cb6d56c1a3790280542db398cb247986b09a
2021-06-21 20:44:23 -07:00
14b0191d1f make assert_equal an example how to partial torch.testing.assert_close (#58918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58918

~Instead of a distinct `torch.testing.assert_close` and `torch.testing.assert_equal`, this makes `torch.testing.assert_equal` a special case of `torch.testing.assert_close` for `rtol=atol=0`. In this case the closeness definition `abs(actual - expected) <= atol + rtol * abs(expected)` boils down to `abs(actual - expected) <= 0`. Since `abs(x)` can never be `<0`, this is equivalent to `abs(a - b) == 0` and this again boils down to `a == b`.~

Following https://github.com/pytorch/pytorch/pull/58918#issuecomment-860642057 and some offline discussions, we opted to use `assert_equal` as an example how to `partial` it.

This makes maintaining the module a lot easier, because we don't need to keep two functions in sync.
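
The idea in one line (a sketch; the actual definition lives in torch.testing):

```python
import functools
import torch.testing

assert_equal = functools.partial(torch.testing.assert_close, rtol=0, atol=0)
```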

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259404

Pulled By: mruberry

fbshipit-source-id: fa1a1fa93672a7ed1c5f0e4beb0dcd45b5c14fce
2021-06-21 20:44:21 -07:00
583f072778 introduce TestingErrorMeta for internal use (#58917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58917

In #54780 we opted to return `Optional[Exception]` from all internal
helper functions. Since then multiple PRs added functionality that needs
to amend the error message. For this we recreate the error

09a1b1cf87/torch/testing/_asserts.py (L417-L430)

To untangle this a little, this PR introduces the `_TestingErrorMeta`,
which carries the exception type and the message. The idiom

```python
exc = check_foo()
if exc:
    return exc
```

is still valid although `exc` should be renamed to `error_meta` to
reflect the new nature. In the top-level functions
`assert_(equal|close)`

```python
exc = check_foo()
if exc:
    raise exc
```

changes to

```python
error_meta = check_foo()
if error_meta:
    raise error_meta.to_error()
```

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259405

Pulled By: mruberry

fbshipit-source-id: 9078fe326283d5aa3d0cf256bf007887df9bfbfb
2021-06-21 20:44:20 -07:00
cf789b9941 remove pytest.UsageError (#58916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58916

Using `pytest.UsageError` in case `pytest` is available adds almost
nothing as observed in
https://github.com/pytorch/pytorch/pull/53820#discussion_r593868752, but
makes it harder to maintain: due to the conditional import, `mypy` is
not able to handle `UsageError` in a type annotation.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259409

Pulled By: mruberry

fbshipit-source-id: 82b00d13fa47db77383996d0caa69177804a48b6
2021-06-21 20:44:18 -07:00
9fffd05e54 hide top-level test functions from pytest's traceback (#58915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58915

History:

- It was included for internal helper functions in the initial proposal
  in #53820
- It was removed in #54780, since it is not honored when used with
  `pytest`'s `--tb=native`, which is the default for PyTorch

Since PyTorch shouldn't be the only user of `assert_(equal|close)` we
add it here to the top-level functions `assert_(equal|close)`. If
`pytest` is used without `--tb=native`, the traceback for

```python
assert torch.eq(actual, expected), "Tensors are not equal!"
torch.testing.assert_equal(actual, expected)
```

looks the same, making it more concise.
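
A minimal sketch of the mechanism: setting `__tracebackhide__` in a function tells pytest to skip that frame when rendering a failure (and, as noted above, it is not honored with `--tb=native`):

```python
import torch

def assert_equal(actual, expected):
    __tracebackhide__ = True  # pytest: hide this frame from tracebacks
    if not torch.equal(actual, expected):
        raise AssertionError("Tensors are not equal!")
```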

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259406

Pulled By: mruberry

fbshipit-source-id: acee47b30b7f14def27433f7d56a4b19d77393c0
2021-06-21 20:44:16 -07:00
18d45b960b remove rogue raise in helper function (#58914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58914

Only the top-level functions `assert_(equal|close)` should raise the
exception to keep the traceback manageable.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259408

Pulled By: mruberry

fbshipit-source-id: 40dd52eec6f9e8166b3b239d5172ee44b749e8dc
2021-06-21 20:43:06 -07:00
dca97b4394 Weighted decay with frequency (count-based) (#60382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60382

Instead of setting weight_decay w uniformly for all ids, for each row i in the sparse embedding table, the actual weight_decay `w_i` becomes `w*freq_i` where `freq_i = halflife/counter_i \in [\log(2), halflife]`. Counter is from `rowwise_counter` with definition `counter_i = 1 + \exp(-iter_{\delta}*\rho)*counter_i`.

Test Plan:
buck test //caffe2/caffe2/python/operator_test:adagrad_test -- test_row_wise_sparse_adagrad

buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay

Reviewed By: 0x10cxR1

Differential Revision: D25581030

fbshipit-source-id: 54b3831b20516c76c559b13d8deb809e2ee3b446
2021-06-21 18:46:35 -07:00
8f03018980 [pytorch] Move signal handler test to internal codebase (#60394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60394

Move signal handler test to internal codebase

Github issue: https://github.com/pytorch/pytorch/issues/60260

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/elastic/multiprocessing:api_test

    buck test mode/dev-nosan //caffe2/torch/distributed/elastic/multiprocessing/fb/test:api_test

Reviewed By: cbalioglu

Differential Revision: D29273160

fbshipit-source-id: e4ae72f7f6d54cbba324119fce7446a30a6c37c9
2021-06-21 18:26:41 -07:00
af3f7a210a add BFloat16 support for kthvalue and median on CPU (#60074)
Summary:
Add BFloat16 support for kthvalue and median on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60074

Reviewed By: gchanan

Differential Revision: D29230348

Pulled By: heitorschueroff

fbshipit-source-id: fa9c086758d51069acf270faa526e4b141b0ef68
2021-06-21 17:52:18 -07:00
2606022d01 [package] fix for edge case os and os.path importing (#60276)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60276

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29234143

Pulled By: Lilyjjo

fbshipit-source-id: 4d96dde4ef1d84f9966f9f58c883ab9bb92fe728
2021-06-21 16:54:02 -07:00
25e077bce1 [Issue 59296] added VE device (#59620)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59296

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59620

Reviewed By: zou3519

Differential Revision: D29196830

Pulled By: ezyang

fbshipit-source-id: 7bb49f776dc755804a0ba0bc3a7dbdab9c93914e
2021-06-21 16:44:52 -07:00
9d1d799034 Added API to change logging levels for JIT (#58821)
Summary:
Description:
- Before this, the logging level could only be changed via the env
variable "PYTORCH_JIT_LOG_LEVEL"
    - The level can now be changed from Python
- Have not added stream configuration for now
- Configuration is stored in a singleton class managing the options

Issue Link: https://github.com/pytorch/pytorch/issues/54188

Gotchas:
- Created separate functions
`::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of
using the singleton class's method directly
    - This is because when running test cases, two different instances
    of the singleton are created for the test suite and the actual code
    (`jit_log.cpp`)
    - On using these methods directly, `is_enabled` calls the singleton
    in `jit_log.cpp` while we are setting the config using another
    singleton
    - See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times

API:
- To set the level: `torch._C._jit_set_logging_option("level")`
- To get the level: `torch._C._jit_get_logging_option()`
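
Example usage of the API above (the option string uses the same syntax as PYTORCH_JIT_LOG_LEVEL):

```python
import torch

torch._C._jit_set_logging_option(">dead_code_elimination")
print(torch._C._jit_get_logging_option())  # '>dead_code_elimination'
```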

Testing:
- UTs were added for C++
- A very simple UT was added for python to just check if the API is
being called correctly
- The API was checked by running trace in a sample python file
    - Set env variable to "" and used `_jit_set_logging_option` in python to set the variable to `>dead_code_elimination`
    - The error output had logs of form [DUMP..] [UPDATE...] etc

Fixes https://github.com/pytorch/pytorch/issues/54188

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821

Reviewed By: soulitzer

Differential Revision: D29116712

Pulled By: ZolotukhinM

fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f
2021-06-21 16:10:49 -07:00
82a6574d89 cmake: Use BUILD_INTERFACE with TORCH_SRC_DIR (#60403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60403

TORCH_SRC_DIR has the potential to be hardcoded, thus breaking downstream
cmake extensions. Prefer CMAKE_CURRENT_SOURCE_DIR with BUILD_INTERFACE
to make it magically work together.

See https://cmake.org/cmake/help/latest/command/target_include_directories.html

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29276503

Pulled By: seemethere

fbshipit-source-id: 6ec0754de6a02cdc35a4a453d6271ac4fdfc5ee3
2021-06-21 15:37:27 -07:00
8dd1dc89cb [PyTorch][Edge] Adding tests for lite quantized models (#60226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60226

# Context
Read this post for details about why we need a test bench for quantized lite modules
https://fb.workplace.com/groups/2322282031156145/permalink/4289792691071726/

# This Diff
Adds test cases for Quantized Lite modules
ghstack-source-id: 131859101

Test Plan:
```
[ ~/fbsource/fbcode] buck test caffe2/test:mobile -- mobile.test_lite_script_module.TestLiteScriptQuantizedModule
Unable to connect to Buck daemon, restarting it...

Running with tpx session id: 44cf0b2f-0905-444a-95df-4a2eec774163
Trace available for this run at /tmp/tpx-20210618-093849.343917/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7036874461151326
    ✓ ListingSuccess: caffe2/test:mobile - main (16.736)
    ✓ Pass: caffe2/test:mobile - test_two_layer (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (14.836)
    ✓ Pass: caffe2/test:mobile - test_annotated_nested (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (15.073)
    ✓ Pass: caffe2/test:mobile - test_quantization_example (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (16.286)
    ✓ Pass: caffe2/test:mobile - test_single_layer (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (18.360)
Summary
  Pass: 4
  ListingSuccess: 1
```

https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874461151326/

Reviewed By: iseeyuan

Differential Revision: D29212232

fbshipit-source-id: 8d0b61b3f414e31720f1e3ce681ec8fa716555c1
2021-06-21 15:09:42 -07:00
5bd49c3396 fix workflow id usage in GHA (#60376)
Summary:
This fixes: https://github.com/pytorch/pytorch/issues/60139

The GHA workflow ID was previously set to `run_id`, which doesn't change across re-runs;
see: https://docs.github.com/en/actions/reference/environment-variables#default-environment-variables

We now use GITHUB_RUN_NUMBER to report the workflow ID instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60376

Test Plan:
CI
see the [with rerun](https://github.com/pytorch/pytorch/actions/runs/952508536) and [without rerun](https://github.com/pytorch/pytorch/actions/runs/955665324) examples: they reported everything under the same run ID, but in fact the first one ran twice as many test cases, as reported in Scuba. This shouldn't occur after this PR.

Reviewed By: samestep

Differential Revision: D29267455

Pulled By: walterddr

fbshipit-source-id: 00fc6b75b84861e2f7d3e21698a5f840c3c21dcd
2021-06-21 14:54:49 -07:00
1f50dc6e46 Fix ignoring Tensor properties in torch.overrides (#60050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60050

It doesn't work to put torch.Tensor.prop.__get__ in the ignored
list.  Now it does.  (Not exercised here, see next diff in stack).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29171464

Pulled By: ezyang

fbshipit-source-id: e7354668b481f9275f2eb5bb3a6228d1815fecea
2021-06-21 14:49:51 -07:00
65f33ec85c Follow-up fix for compilation error on CUDA92 (#60287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60287

Follow up of #60017

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D29236208

Pulled By: ejguan

fbshipit-source-id: f1acf9630b45fea8cbdf7d64e47661643d0a52b8
2021-06-21 13:29:11 -07:00
01e0296eb7 [special] migrate log1p, sinc, round to special namespace (#55878)
Summary:
Reference : https://github.com/pytorch/pytorch/issues/50345
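
The migrated entry points in use (assuming they mirror the existing ops, as the other `torch.special` migrations do):

```python
import torch

x = torch.tensor([0.0, 0.5, 1.5])
print(torch.special.log1p(x))  # same as torch.log1p
print(torch.special.sinc(x))   # normalized sinc: sin(pi*x) / (pi*x)
print(torch.special.round(x))  # same as torch.round
```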

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55878

Reviewed By: zou3519, janeyx99

Differential Revision: D29160593

Pulled By: mruberry

fbshipit-source-id: f3ca9c541382bab33fb85d7817ce8ddc117c6826
2021-06-21 12:34:29 -07:00
769c299dcf [caffe2] add tests for inplace elementwise ops (#60106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60106

In Caffe2, some elementwise in-place compatible ops lack coverage for the in-place case. We add tests for a subset of them here and thereby increase coverage.

Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:elementwise_ops_test
```
Let CI run.

Reviewed By: clrfb

Differential Revision: D29143189

fbshipit-source-id: 83138ad8eff8fe95c40aece53714da3577396a23
2021-06-21 12:04:18 -07:00
f66b53e8b2 Ignore unsupported attribute checker pass for torch.jit.trace (#60200)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60200

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29207583

Pulled By: tugsbayasgalan

fbshipit-source-id: 241620209dbafc94ebdb83d99257e341b11e999b
2021-06-21 11:55:12 -07:00
b505adbb09 Fix typo in ChainDataset docs (#60336)
Summary:
* chainning -> chaining

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60336

Reviewed By: bdhirsh

Differential Revision: D29265236

Pulled By: anjali411

fbshipit-source-id: 17a9b73af9e094550bd1ee25bc9439fb8d455e2b
2021-06-21 11:47:21 -07:00
2f3be2735f Don't split oversize cached blocks (#44742)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901

This change is designed to prevent fragmentation in the Caching Allocator.  Permissive block splitting in the allocator allows very large blocks to be split into many pieces.  Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned.  Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'.

Approach:

- Large blocks above a certain size are designated "oversize".  This limit is currently set 1 decade above large, 200 MB
- Oversize blocks can not be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block); see the sketch after this list
- In lieu of splitting oversize blocks there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated.  This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering
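
A hedged sketch of the matching rule from the list above; the constants and names are illustrative, not the allocator's actual C++ code:

```python
OVERSIZE = 200 * 1024 * 1024  # blocks above this are never split

def block_matches(request_size, block_size, tolerance=20 * 1024 * 1024):
    if block_size < OVERSIZE:
        return block_size >= request_size  # smaller blocks may be split
    # Oversize blocks must closely match the request, e.g. a 200 MB
    # request can reuse a 205 MB block but not a 300 MB one.
    return request_size <= block_size <= request_size + tolerance
```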

Initial performance tests show this is similar or quicker than the original strategy.  Additional tests are ongoing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742

Reviewed By: zou3519

Differential Revision: D29186394

Pulled By: ezyang

fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9
2021-06-21 11:46:08 -07:00
eaa36ee679 Enable sharding for Windows GHA CI (#59970)
Summary:
Enables sharding for Windows on CI. To make that possible, we currently remove the smoke tests tested in shard 1, which don't seem all that important as they are
1. tested on nightlies
2. seemingly tested anyway by running the test suite

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59970

Reviewed By: seemethere

Differential Revision: D29268484

Pulled By: janeyx99

fbshipit-source-id: 7f90d73037cfeb2c267b28714550316eb471b4dd
2021-06-21 11:42:22 -07:00
023907a6fe Allow Docker build on macOS (#60375)
Summary:
This PR allows developers using macOS to build Docker images locally. The `basename $(mktemp -u)` part was suggested by seemethere; I modified it slightly to appease ShellCheck and because [Docker doesn't allow uppercase characters in tags](https://stackoverflow.com/a/54291205).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60375

Test Plan:
On a Mac:
```
cd .circleci/docker
./build.sh pytorch-linux-xenial-py3.6-gcc5.4
```

Reviewed By: driazati

Differential Revision: D29267025

Pulled By: samestep

fbshipit-source-id: ba27d2fb108f573a50db069cf9ddea0414ed6074
2021-06-21 11:27:49 -07:00
27e34f731a Re-enable clang-tidy on PRs (#60297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60297

This switches clang-tidy to the fresh tag from https://github.com/pytorch/test-infra/runs/2860763986 which has a fix for the missing OMP headers we were seeing. Along with #60225 this should restore clang-tidy to normal functionality and we shouldn't see any spurious warnings.

Test Plan: Imported from OSS

Reviewed By: seemethere, 1ntEgr8

Differential Revision: D29239783

Pulled By: driazati

fbshipit-source-id: b1893256fdb27436af03d6c5279e81f64b47fe6b
2021-06-21 11:04:09 -07:00
c16f87949f ENH Adds nn.ReflectionPad3d (#59791)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655

This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass.
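
Basic usage of the new module:

```python
import torch
from torch import nn

pad = nn.ReflectionPad3d(1)                   # pad all six sides by 1
x = torch.arange(8.0).reshape(1, 1, 2, 2, 2)  # (N, C, D, H, W)
print(pad(x).shape)  # torch.Size([1, 1, 4, 4, 4])
```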

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791

Reviewed By: gchanan

Differential Revision: D29242015

Pulled By: jbschlosser

fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
2021-06-21 10:53:14 -07:00
f89ae9cb8d Moves grid_sampler to autocast promote list (#58618)
Summary:
Should close https://github.com/pytorch/pytorch/issues/42218

Numerically, `grid_sampler` is fine in fp16 or fp32, but takes several inputs and expects their dtypes to match, so it belongs on the autocast promote list.

`grid_sampler` currently uses `gpuAtomicAdd`, notoriously slow in fp16 because it calls cuda's atomicAdd __half overload which uses a software compare-and-swap loop internally. To allow good performance if both inputs happen to be FP16, the PR also modifies `grid_sampler_[2,3]d_backward_kernel`s to use `fastAtomicAdd` instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58618

Reviewed By: mruberry

Differential Revision: D29257199

Pulled By: ngimel

fbshipit-source-id: 3cc7505945b480427f2fc1beb36bee80bf3853b3
2021-06-21 10:22:36 -07:00
61e0bc1955 [nnc] Remove check on initializer in compressBuffer (#60194)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60194

Test Plan: Imported from OSS

Reviewed By: bertmaher, huiguoo

Differential Revision: D29206255

Pulled By: navahgar

fbshipit-source-id: 0a68ec4067c37f06ca1ea9ddeeb5ad5e0dcb0639
2021-06-21 09:57:37 -07:00
f2bb0932da [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29259226

fbshipit-source-id: 15fd79f6fed38d6ed2d84018852806683d5a09fa
2021-06-21 03:57:10 -07:00
5ff407df67 Skips failing MacOS tests (#60348)
Summary:
Mitigates, but does not fix https://github.com/pytorch/pytorch/issues/60347.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60348

Reviewed By: ngimel

Differential Revision: D29257917

Pulled By: mruberry

fbshipit-source-id: de9be93ddeda1ca27ea2ff4650162f886d10f1e2
2021-06-21 01:35:36 -07:00
1dee99c973 LU Solve using cublas and cusolver (#59148)
Summary:
This PR introduces cuSOLVER and cuBLAS for the `lu_solve` routine. Solves a part of https://github.com/pytorch/pytorch/issues/47953.

Since usage of cuSOLVER with MAGMA introduces performance regressions in MAGMA (https://github.com/pytorch/pytorch/issues/56590), we use heuristics for determining when to call cuSOLVER, cuBLAS or MAGMA depending on the batch and matrix sizes. The 64-bit cuSOLVER API is not introduced in this PR since there are several problems with the LU factorization using cusolver (https://github.com/pytorch/pytorch/pull/59148).

The following are performance benchmarks using various configurations:

<details>

```
[--------------------------------------------------------- LU solve CUDA torch.float64 ----------------------------------------------------------]
                                     |  lu_solve CUSOLVER  |  lu_solve MAGMA  |  lu_solve CUBLAS  |  lu_solve cuSOLVER/MAGMA  |  lu_solve TEST ALL
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------
      torch.Size([1, 1, 1])          |          703.4      |        489.8     |         511.8     |             710.1         |          487.1
      torch.Size([2, 1, 1])          |          738.9      |        504.1     |         513.0     |             958.2         |          494.4
      torch.Size([4, 1, 1])          |          790.7      |        514.7     |         506.8     |             983.9         |          540.2
      torch.Size([8, 1, 1])          |          865.3      |        496.4     |         514.7     |             975.2         |          520.0
      torch.Size([16, 1, 1])         |          955.5      |        483.9     |         508.3     |             937.6         |          526.5
      torch.Size([32, 1, 1])         |         1167.7      |        495.2     |         511.2     |             934.0         |          528.7
      torch.Size([64, 1, 1])         |         1730.0      |        492.1     |         537.8     |             936.4         |          533.2
      torch.Size([128, 1, 1])        |         2748.4      |        499.7     |         526.5     |             982.9         |          540.8
      torch.Size([1, 2, 2])          |          724.6      |        498.2     |         541.7     |             715.0         |          504.7
      torch.Size([2, 2, 2])          |          737.0      |        514.3     |         527.6     |             934.5         |          524.5
      torch.Size([4, 2, 2])          |          750.5      |        524.1     |         537.4     |             935.5         |          543.0
      torch.Size([8, 2, 2])          |          844.8      |        513.7     |         538.9     |             953.3         |          534.4
      torch.Size([16, 2, 2])         |         1013.1      |        521.9     |         530.0     |             932.2         |          537.9
      torch.Size([32, 2, 2])         |         1335.8      |        515.1     |         544.4     |             939.9         |          559.5
      torch.Size([64, 2, 2])         |         1819.6      |        511.8     |         534.1     |             973.9         |          540.0
      torch.Size([128, 2, 2])        |         3018.7      |        526.3     |         546.1     |             979.3         |          543.5
      torch.Size([1, 8, 8])          |          732.5      |        524.9     |         532.9     |             762.4         |          516.8
      torch.Size([2, 8, 8])          |          771.2      |        514.9     |         538.7     |            1007.5         |          531.1
      torch.Size([4, 8, 8])          |          811.3      |        507.7     |         534.6     |            1002.2         |          548.5
      torch.Size([8, 8, 8])          |          866.6      |        530.0     |         532.0     |            1016.1         |          562.9
      torch.Size([16, 8, 8])         |          991.8      |        533.6     |         548.0     |            1022.6         |          548.5
      torch.Size([32, 8, 8])         |         1271.7      |        541.2     |         534.7     |            1013.8         |          545.6
      torch.Size([64, 8, 8])         |         1817.2      |        530.2     |         520.6     |            1008.7         |          566.3
      torch.Size([128, 8, 8])        |         2678.7      |        531.6     |         552.2     |            1006.2         |          555.0
      torch.Size([1, 16, 16])        |          738.2      |        546.1     |         536.6     |             775.6         |          540.1
      torch.Size([2, 16, 16])        |          782.6      |        543.5     |         539.6     |            1010.9         |          541.1
      torch.Size([4, 16, 16])        |          815.2      |        546.1     |         560.9     |            1012.5         |          553.1
      torch.Size([8, 16, 16])        |          877.7      |        543.0     |         547.9     |            1012.8         |          551.5
      torch.Size([16, 16, 16])       |         1008.7      |        549.2     |         562.7     |            1016.6         |          546.8
      torch.Size([32, 16, 16])       |         1291.9      |        540.8     |         560.3     |            1055.8         |          539.3
      torch.Size([64, 16, 16])       |         1846.3      |        553.5     |         556.0     |            1010.8         |          551.9
      torch.Size([128, 16, 16])      |         2953.8      |        562.7     |         547.5     |            1026.2         |          555.8
      torch.Size([1, 32, 32])        |          789.1      |        590.6     |         590.9     |             790.5         |          579.0
      torch.Size([2, 32, 32])        |          806.9      |        596.6     |         600.2     |            1085.6         |          573.8
      torch.Size([4, 32, 32])        |          852.0      |        597.9     |         588.2     |            1098.9         |          574.7
      torch.Size([8, 32, 32])        |          914.2      |        597.8     |         591.4     |            1090.3         |          585.7
      torch.Size([16, 32, 32])       |         1063.0      |        604.6     |         597.3     |            1094.0         |          580.5
      torch.Size([32, 32, 32])       |         1302.0      |        602.0     |         598.9     |            1090.3         |          583.6
      torch.Size([64, 32, 32])       |         1861.7      |        601.1     |         599.8     |            1113.4         |          588.6
      torch.Size([128, 32, 32])      |         3251.0      |        619.6     |         595.3     |            1106.8         |          608.9
      torch.Size([1, 64, 64])        |          978.6      |        842.7     |         778.6     |            1071.4         |          825.8
      torch.Size([2, 64, 64])        |         1072.3      |        845.7     |         785.4     |            1400.6         |          829.0
      torch.Size([4, 64, 64])        |         1051.9      |        842.9     |         796.1     |            1352.2         |          788.2
      torch.Size([8, 64, 64])        |         1090.3      |        834.1     |         805.2     |            1382.6         |          804.7
      torch.Size([16, 64, 64])       |         1206.9      |        835.7     |         802.2     |            1365.6         |          801.2
      torch.Size([32, 64, 64])       |         1671.2      |        846.5     |         794.5     |            1345.1         |          814.2
      torch.Size([64, 64, 64])       |         2759.3      |        848.5     |         795.4     |            1409.7         |          832.9
      torch.Size([128, 64, 64])      |         4928.6      |        877.4     |         848.3     |            1439.0         |          883.9
      torch.Size([1, 128, 128])      |         1315.6      |       1158.4     |        1130.0     |            1301.3         |         1177.1
      torch.Size([2, 128, 128])      |         1334.7      |       1198.2     |        1186.6     |            1703.9         |         1209.5
      torch.Size([4, 128, 128])      |         1374.6      |       1200.7     |        1266.2     |            1640.6         |         1272.3
      torch.Size([8, 128, 128])      |         1453.6      |       1215.9     |        1287.3     |            1669.1         |         1288.7
      torch.Size([16, 128, 128])     |         1882.1      |       1244.9     |        1337.6     |            1698.8         |         1347.1
      torch.Size([32, 128, 128])     |         2789.0      |       1284.5     |        1398.6     |            1747.6         |         1396.3
      torch.Size([64, 128, 128])     |         4763.0      |       1425.2     |        1581.7     |            1921.0         |         1584.1
      torch.Size([128, 128, 128])    |         8835.9      |       1808.9     |        1968.7     |            2197.6         |         1961.8
      torch.Size([1, 512, 512])      |         4369.9      |       4577.6     |        4804.0     |            4331.4         |         4599.0
      torch.Size([2, 512, 512])      |         4635.9      |       4850.1     |        5159.1     |            5315.4         |         4845.5
      torch.Size([4, 512, 512])      |         5367.5      |       5261.6     |        6134.7     |            5807.8         |         5345.2
      torch.Size([8, 512, 512])      |         7025.2      |       6184.5     |        7065.6     |            6711.6         |         6303.9
      torch.Size([16, 512, 512])     |        10221.3      |       7849.7     |        8820.1     |            8323.6         |         7992.1
      torch.Size([32, 512, 512])     |        16574.8      |      11208.4     |       12284.3     |           11704.7         |        11394.4
      torch.Size([64, 512, 512])     |        29500.1      |      18043.1     |       19249.3     |           18744.0         |        18242.1
      torch.Size([128, 512, 512])    |        56783.3      |      33903.9     |       34713.5     |           33893.8         |        34041.8
      torch.Size([1, 1024, 1024])    |        14864.5      |      15714.6     |       16128.1     |           14726.7         |        14992.6
      torch.Size([2, 1024, 1024])    |        17891.0      |      18553.3     |       19111.6     |           19271.5         |        19283.0
      torch.Size([4, 1024, 1024])    |        22143.4      |      21909.2     |       23667.1     |           22698.9         |        22713.8
      torch.Size([8, 1024, 1024])    |        30621.1      |      28669.9     |       30822.9     |           29725.0         |        29760.8
      torch.Size([16, 1024, 1024])   |        47045.9      |      41900.0     |       44353.8     |           43215.6         |        43237.5
      torch.Size([32, 1024, 1024])   |        79245.5      |      68316.9     |       70959.0     |           69506.4         |        69876.7
      torch.Size([64, 1024, 1024])   |       147973.9      |     121120.6     |      124601.1     |          122084.4         |       122578.7
      torch.Size([128, 1024, 1024])  |       295586.2      |     232871.8     |      237421.8     |          233765.3         |       234704.6

Times are in microseconds (us).
```

</details>

Here are the details of how the tests were performed:
* CUSOLVER - Only call `cusolver` for all problem sizes.
* MAGMA - Only call `magma` for all problem sizes (this is the current master branch).
* CUBLAS - Only call `cublas` for all problem sizes.
* cuSOLVER / MAGMA - Use cusolver for `batch_size == 1` and magma for all others.
* TEST ALL - Employ heuristics to switch between cublas/cusolver/magma; this yields the best overall results (this PR). A sketch of such a heuristic follows this list.
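
For illustration, a dispatch heuristic of this kind might look like the sketch below; the thresholds and backend choices here are hypothetical, not the ones used in this PR:

```python
# Hypothetical illustration only: the thresholds are made up.
def pick_lu_solve_backend(batch_size, n):
    if batch_size == 1:
        return "cusolver"  # single large problems tend to favor cuSOLVER
    if n <= 64:
        return "cublas"    # many small problems tend to favor batched cuBLAS
    return "magma"         # large batched problems tend to favor MAGMA
```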

Script for reproducing the results:

<details>

``` python

import torch
import pickle
import itertools
from torch.utils.benchmark import Timer
import sys

shapes = [1, 2, 8, 16, 32, 64, 128, 512, 1024]
batches = [(1,), (2,), (4,), (8,), (16,), (32,), (64,), (128,)]
results = []
num_threads = 1
dtype = torch.float64
repeats = 2

def lu_factorize_solve(mat, b):
    # Factorize once, then reuse the factorization for the solve.
    lu_data = torch.lu(mat)
    x = torch.lu_solve(b, *lu_data)

for shape, batch in itertools.product(shapes, batches):
    mat = torch.randn(*batch, shape, shape, dtype=dtype, device='cuda')
    b = torch.randn(*batch, shape, 1, dtype=dtype, device='cuda')

    tasks = [("lu_factorize_solve(mat, b)", "lu_solve CUSOLVER")]

    print("shape: ", shape, " batch: ", batch)

    timers = [Timer(stmt=stmt, num_threads=num_threads, label=f"LU solve CUDA {dtype}",
                    sub_label=f"{mat.shape}", description=label, globals=globals()) for stmt, label in tasks]
    for i, timer in enumerate(timers * repeats):
        results.append(
            pickle.dumps(timer.blocked_autorange())
        )
        print(f"\r{i + 1} / {len(timers) * repeats}", end="")
        sys.stdout.flush()

# Serialize the collected measurements for later comparison.
with open("cusolver_lu_solve.pickle", "wb") as f:
    pickle.dump(results, f)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59148

Reviewed By: H-Huang

Differential Revision: D29160609

Pulled By: mruberry

fbshipit-source-id: 7280f25db1e66aa650ea15608a6dc5d688fb4db2
2021-06-20 21:27:35 -07:00
4a3eea9a6a [quant][graphmode][fx] Produce reference linear module in convert (#60152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60152

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188263

fbshipit-source-id: f7bbbef5d4d747eadf7a627a4e77a5ec9bb0bc94
2021-06-20 20:08:12 -07:00
510334f34b [BE] clean up IS_PYTORCH_CI and IN_CI (#60279)
Summary:
`IS_PYTORCH_CI` and `IN_CI` are used interchangeably; however, in some cases `IN_CI` is not currently set because it only exists in .circleci/scripts/setup_ci_environment.sh. This cleans up the two flags and uses only `IN_CI`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60279

Test Plan: CI

Reviewed By: seemethere

Differential Revision: D29239545

Pulled By: walterddr

fbshipit-source-id: a069424a2bb8790a3adfdaf0dc460301026bf8c7
2021-06-20 19:45:07 -07:00
2293ab4e53 [quant][graphmode][fx] Refactor convert for linear to use get_static_module_mapping and get_dynamic_module_mapping (#60151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60151

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188264

fbshipit-source-id: d2b77ffcf4b7446fc6c43248e43218092d2a6aea
2021-06-20 19:41:16 -07:00
a516424a70 Update internal code for torch.linalg.solve (#56613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613

Replace linalg_solve_helper with `lu_stub` + `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have a cuSOLVER-based codepath,
`torch.linalg.solve` will have it as well.
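
As a sanity check of this relationship, a minimal sketch (not from the PR) comparing the two paths:

```python
import torch

A = torch.randn(4, 3, 3, dtype=torch.float64)
b = torch.randn(4, 3, 1, dtype=torch.float64)

# linalg.solve is equivalent to an LU factorization followed by an LU solve.
LU, pivots = torch.lu(A)
x = torch.lu_solve(b, LU, pivots)

assert torch.allclose(x, torch.linalg.solve(A, b))
```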

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28627408

Pulled By: mruberry

fbshipit-source-id: b95bbdf35f845a56a1489c04b53742a01b36e789
2021-06-20 19:37:12 -07:00
47d727fe1b [quant][graphmode][fx] Produce conv reference static quant modules (#60138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60138

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29184791

fbshipit-source-id: 971a40012dbba0cf687c62a3a4af9358513c253b
2021-06-20 19:25:45 -07:00
b298013cd5 [add/sub] Cast alpha to acc_type (#60227)
Summary:
This PR lets `torch.add` & `torch.sub` CUDA kernels cast `alpha` to `acc_type`, not `scalar_t`.
I do not remove `cast`s from `test/test_foreach.py` because I'll do this in https://github.com/pytorch/pytorch/issues/59907 or follow-up for it.

Currently, upstream `torch._foreach_add` & `torch._foreach_sub` upcast the `alpha` parameter to `acc_type<scalar_t>`, while `torch.add` & `torch.sub` do not. This is problematic because the outputs of `torch.add` and `torch.sub` differ from those of `torch._foreach_add` and `torch._foreach_sub`, respectively, when the dtype of the input tensors is either `torch.half` or `torch.bfloat16`. The discrepancy is roughly proportional to `abs(alpha)`, except when `alpha` is exactly representable in 16 bits.
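
A minimal sketch for observing the pre-fix discrepancy on CUDA (the magnitude depends on the inputs):

```python
import torch

x = [torch.rand(1024, dtype=torch.half, device='cuda')]
y = [torch.rand(1024, dtype=torch.half, device='cuda')]
alpha = 0.1  # not exactly representable in 16 bits

eager = torch.add(x[0], y[0], alpha=alpha)          # pre-fix: accumulates alpha as half
foreach = torch._foreach_add(x, y, alpha=alpha)[0]  # upcasts alpha to acc_type

print((eager - foreach).abs().max())  # nonzero before this PR
```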

ref:
- `torch._foreach_add` & `torch._foreach_sub` cast `alpha`: 6d0fb85a62/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu (L21-L28), `BinaryOpListAlphaFunctor` is defined here: 6d0fb85a62/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L202)

related: https://github.com/pytorch/pytorch/issues/58833, https://github.com/pytorch/pytorch/pull/59907

cc ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60227

Reviewed By: mruberry

Differential Revision: D29252759

Pulled By: ngimel

fbshipit-source-id: 847f3b9493ae30a900f7445af00aef1abcc1ab21
2021-06-20 19:05:22 -07:00
0131a5972d [DDP] Test inference works with eval() and no_grad() (#59666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59666

Tests that inference with a DDP model won't hang when the user sets eval()
or no_grad(). Note that if the model has a SyncBN layer, both eval() and
no_grad() are needed, since eval() makes SyncBN work like a regular BN layer.
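
A minimal sketch of the usage under test (process-group and model setup assumed; `ddp_model` wraps a model containing a SyncBatchNorm layer):

```python
ddp_model.eval()         # eval() makes SyncBN behave like a regular BN layer
with torch.no_grad():    # no_grad() skips gradient synchronization entirely
    out = ddp_model(inputs)
```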
ghstack-source-id: 131906625

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D28974146

fbshipit-source-id: 137f8245b1c303beb2416518476e70fe67c73376
2021-06-20 12:02:43 -07:00
69b2bf70f9 [pytorch] fix tools/code_analyzer for llvm 11 (#60322)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60322

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29250420

Pulled By: ljk53

fbshipit-source-id: ff7f9cbacd1d9518ed81c06fc843a90d6948f760
2021-06-20 00:39:11 -07:00
c19acf816f Replace TensorRT's deprecated API in caffe2/python/trt/test_pt_onnx_trt.py (#60236)
Summary:
TensorRT v8 is going to remove some functions/methods that are used in this test.

ref:
- getMaxWorkspaceSize deprecation: b2d60b6e10/include/NvInfer.h (L6984-L6993)
- buildCudaEngine deprecation: b2d60b6e10/include/NvInfer.h (L7079-L7087)

cc ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60236

Reviewed By: gchanan

Differential Revision: D29232376

Pulled By: ngimel

fbshipit-source-id: 2b8a48787bf61c68a81568b6026d6afd5a83e751
2021-06-19 19:56:30 -07:00
5ec4ad7f54 [special] Add special.ndtri (#58650)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

TODO
* [x] Add docs https://13865352-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.ndtri
* [x] Add comments on implementation
* [x] Clean-up

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58650

Reviewed By: H-Huang

Differential Revision: D29160170

Pulled By: mruberry

fbshipit-source-id: 50e4ea663920e97b8437d03d5b52bcd9dedc1a8d
2021-06-19 18:36:54 -07:00
5824a866b7 [pytorch][nnc] support custom class parameters (#59466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59466

Change saved parameter type from at::Tensor to at::IValue to support custom
class parameters, e.g. `__torch__.torch.classes.xnnpack.Conv2dOpContext`.

The NNC produced kernels won't deal with custom class parameters directly.
They simply pass through to the external operators that take these custom
class parameters, e.g. `prepacked::conv2d_clamp_run`.

It will reuse the `__getstate__` and `__setstate__` methods on the custom class
to persist and restore the state of the parameters.

When calling into the kernel, it will pass in the untyped raw pointer of the custom
class objects to the kernel as `void*`. It's similar to the regular tensor parameters,
for which it will pass in the raw data pointer of the tensor storage. The generated
kernel needs to hardcode the expected type for each parameter and cast before
calling the external ops.
ghstack-source-id: 131897904

Test Plan: - unit tests

Reviewed By: kimishpatel

Differential Revision: D28902496

fbshipit-source-id: 4b2c0895dd28f0b7d344aa08183d42ad6a355dae
2021-06-19 06:11:01 -07:00
cac9ae1506 [iOS GPU][BE][3/n] Give MPSImage objects a label for better debugging experience (#60282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60282

1. Add a label to the MPSImage objects; the label describes the size of the image.
2. Remove `[image markRead]`.
3. Rename two APIs to follow better naming conventions.
ghstack-source-id: 131839557

Test Plan:
1. CircleCI
2. buck test pp-mac

Reviewed By: SS-JIA

Differential Revision: D29232975

fbshipit-source-id: 075175c4b5a1c5b79e795f4860e1694d7c06d4f2
2021-06-18 18:47:05 -07:00
b9cd97c94b [iOS GPU][BE][2/n] Remove unused APIs (#60281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60281

1. Remove unused APIs from MPSImageUtils.
2. Move tensor-related APIs from MetalUtils to MetalTensorUtils; delete MetalUtils.h/mm.
3. Move Metal buffer-related APIs to MetalContext.
ghstack-source-id: 131839559

Test Plan:
1. CircleCI
2. buck test pp-mac

Reviewed By: SS-JIA

Differential Revision: D29232973

fbshipit-source-id: a4c0c848883b8ef615eeb2936c1f3d18cddcb318
2021-06-18 18:47:04 -07:00
80e6e3f1da [iOS GPU][BE][1/n] Rename MPSCNNContext to MetalContext (#60280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60280

No significant changes besides renaming the class. In the future, we'll convert this objc class to c++.
ghstack-source-id: 131827490

Test Plan:
- CircleCI
- buck test pp-mac

Reviewed By: SS-JIA

Differential Revision: D29231824

fbshipit-source-id: a0d1327a55a0414011c78a7144d3b05f1579cf42
2021-06-18 18:45:24 -07:00
319890b1b2 Support *args in Pipe.forward API. (#55441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55441

This is the first step towards supporting the proposal outlined in
https://github.com/pytorch/pytorch/issues/53952.

In this PR, `Pipe.forward()` accepts a `*inputs` argument instead of just a
single input as before. This lays the groundwork for supporting non-Tensor and
generic arguments in the Pipe API. In this PR we still only support Tensors;
non-Tensor support will come in future PRs.

For backward compatibility, a single `Tuple[Tensor]` input still
works as it did previously.
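
Illustrating the calling convention, as a sketch only; this assumes `pipe` is an initialized `torch.distributed.pipeline.sync.Pipe` and `x`, `y` are appropriately shaped tensors:

```python
# Pipe.forward returns an RRef to the output.
out = pipe(x).local_value()       # single Tensor input, as before
out = pipe(x, y).local_value()    # multiple Tensor inputs via *inputs (this PR)
out = pipe((x, y)).local_value()  # a single Tuple[Tensor] still works (BC)
```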
ghstack-source-id: 130767499

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D27613887

fbshipit-source-id: 05e19e537e6d7fe4999745fc4ba9941ac54906de
2021-06-18 17:53:32 -07:00
a8430f1076 Remove PlacementSpec from ShardingSpecs. (#59990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59990

ShardingSpecs accepted a Device/PlacementSpec and were initially
written this way for flexibility. However, this is slightly confusing given
that there is no general use case for it. As a result, to keep things simple,
I've ensured that both specs only accept devices for now.

We can always extend this to include a general PlacementSpec later on.
ghstack-source-id: 131842525

Test Plan: waitforbuildbot

Reviewed By: SciPioneer, rohan-varma

Differential Revision: D29116463

fbshipit-source-id: a6f2b3f1346ac6afab91c9595d4cae4f4da04fda
2021-06-18 17:37:43 -07:00
1c97c3e3a4 DOC Adds LSTM docs for defined variables when bidirectional=True (#60120)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59332

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60120

Reviewed By: gchanan

Differential Revision: D29240245

Pulled By: jbschlosser

fbshipit-source-id: acad9c24f41f7253a7d42cd940e54bb66e083ecf
2021-06-18 17:28:44 -07:00
aae2a3c95e Clarify ConvTransposeNd + reference links (#60291)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60291

Reviewed By: gchanan

Differential Revision: D29239199

Pulled By: jbschlosser

fbshipit-source-id: 9b2de1a8b1a7444797f82c73195c5efc929562eb
2021-06-18 17:18:11 -07:00
e8e3394ea8 Recognize transposed dense tensors as a form of partial overlap (#59014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59014

Fixes #48401

`assert_no_overlap` currently has a false negative where it treats the
transpose of a contiguous tensor as fully overlapping. This happens because
the memory regions do fully overlap, but the strides are different, so
the actual elements don't all overlap.

This goes slightly in the other direction: by requiring strides to match
exactly, we get false positives for some unusual situations, e.g.
```
torch.add(a, a, out=a.view([1, *a.shape]))
```
or views that replace the strides of length-1 dimensions, etc. However, I think these are
sufficiently obscure that it's okay to error, and the common cases like
inplace operations still work as before.
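
For instance, a sketch of the behavior after this change (the exact error text may differ):

```python
import torch

a = torch.ones(3, 3)

# a.t() shares a's memory but with different strides: partial overlap,
# so this is now flagged instead of being treated as a full overlap.
# torch.add(a, a, out=a.t())  # raises a RuntimeError after this PR

# Identical strides (a true full overlap) still work, e.g. in-place add:
torch.add(a, a, out=a)
```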

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D29040928

Pulled By: ngimel

fbshipit-source-id: 5a636c67536a3809c83f0d3117d2fdf49c0a45e6
2021-06-18 16:29:25 -07:00
47bbc01e0b [nnc] Added micro-benchmark to show perf improvement with cat subgraph optimization (#59581)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59581

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955317

Pulled By: navahgar

fbshipit-source-id: 53bb3dbfafbd3b146063f305523c2e6ec96cf6b8
2021-06-18 14:32:09 -07:00
d0c4ace00f [jit] Added a tranformation to move consumers of aten::cat to its inputs, in the fused subgraphs (#59580)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59580

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955318

Pulled By: navahgar

fbshipit-source-id: 7504d5aea441920f4eb9234cdfa17077161ab13c
2021-06-18 14:32:07 -07:00
d4c626a346 [jit] Exported a method to get the supported list of elementwise ops (#60162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60162

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29190841

Pulled By: navahgar

fbshipit-source-id: bb786a653441c5b586509e25cc80d357d2223af3
2021-06-18 14:32:05 -07:00
55755edc60 [jit] Made a list for element-wise ops. (#59579)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59579

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D28955319

Pulled By: navahgar

fbshipit-source-id: 605531aedf9250a226b0401d55fda3427bdc6f33
2021-06-18 14:30:47 -07:00
a029422cae [quant][graphmode][fx][refactor] Change the env map to add dtype as a key (#60054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60054

Previously, env in convert was Dict[str, Tuple[Node, torch.dtype]]; that is, at a given time each node can only have one dtype. This causes a problem for the following case:
```
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = self.conv(x)
        x1 = x.expand_as(x)
        x2 = torch.add(x, x1)
        return x2

def forward(self, x):
    x = self.activation_post_process_0(x)
    x = self.conv(x)
    x = self.activation_post_process_1(x)
    x1 = x.expand_as(x)
    x1 = self.activation_post_process_2(x1)
    x2 = torch.add(x, x1)
    x2 = self.activation_post_process_3(x2)
    return x2

def forward(self, x):
    x = torch.quantize_per_tensor(x, ...)
    x = self.conv(x)  # quantized conv
    x = torch.dequantize(x)
    x1 = x.expand_as(x)
    x1 = torch.quantize_per_tensor(x1, ...)
    # Error: x is dequantized
    x2 = torch.ops.quantized.add(x, x1)
    return x2
```

Currently we have an env that maps a node name in the observed graph to the Node in the quantized graph. The problem is that following a quantized operator (conv) we have two operators: one expecting float input (expand_as) and one expecting quantized input (quantized add). In the quantized graph, ideally, expand_as should consume the dequantized output and quantized add should consume the quantized output:

```
quantized_conv - dequantize - expand_as
  \ ------- quantized_add
```

But currently in env, each node must either be quantized or not quantized. Therefore we need to change env to include dtype as a key: `env: Dict[str, Dict[dtype, Node]]`, e.g. `{'x': {torch.float: dequantized_node, torch.quint8: quantized_node}}`. When we load from env, we will also need to provide the dtype of the Node that we want to load. A separate pass can figure out this information for each node.
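
A minimal sketch of the new mapping (the helper names are illustrative, not the actual convert code; real values would be torch.fx Nodes):

```python
import torch

env = {}  # Dict[str, Dict[torch.dtype, Node]]; strings stand in for Nodes here

def store(name, dtype, node):
    env.setdefault(name, {})[dtype] = node

def load(name, dtype):
    # Callers now specify which flavor of the node they need.
    return env[name][dtype]

store("x", torch.float, "dequantized_node")
store("x", torch.quint8, "quantized_node")
assert load("x", torch.quint8) == "quantized_node"
```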

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29149408

fbshipit-source-id: c9e4b7d65444ab6a6f573929bae1db5037629892
2021-06-18 13:31:43 -07:00
c0f8cad0f0 Be fix shard inbalance (#60206)
Summary:
First step to address https://github.com/pytorch/pytorch/issues/60136

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60206

Reviewed By: janeyx99

Differential Revision: D29215237

Pulled By: walterddr

fbshipit-source-id: ec25beb57366ef2eaf37878cdea391b245de9bef
2021-06-18 12:49:30 -07:00
d9e7df707b [TensorExpr] Add NNC lowerings for aten::mean, aten::addmm, and aten::adaptive_avg_pool2d. (#59347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59347

We had external call wrappers for them, but they were not used in NNC.
This PR adds lowerings using these ext calls and fixes some bugs in
them.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28853832

Pulled By: ZolotukhinM

fbshipit-source-id: 1718400368e1a9cf3f19180ee2290a4ed9c99d41
2021-06-18 11:56:32 -07:00
c6bb9409b8 [TensorExpr] Handle not-specified dtypes and strides. (#59346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59346

Currently JIT has a pass to propagate shapes, but doesn't have the
capability to fill in strides and dtypes. This PR works around that by
assuming the default dtype to be Float and the strides to correspond to a
contiguous layout, unless otherwise specified. Ideally we won't need
this; it is done simply as a workaround until the corresponding
features are implemented on the JIT side.

This is required for AOT compilation of mobilenet v3 with NNC.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28853831

Pulled By: ZolotukhinM

fbshipit-source-id: 81adb59409684f39b444909ab8ec58ee4a39d496
2021-06-18 11:56:30 -07:00
f042455a8d [JIT] ShapeProp: add missing ops from mobilenet v3. (#59163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59163

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D28853833

Pulled By: ZolotukhinM

fbshipit-source-id: 451fb9ee848968049d26fb5623a904d8fa7bd6fc
2021-06-18 11:55:00 -07:00
3870e68644 TF32 threshold twiddling for tests (#60209)
Summary:
Following https://github.com/pytorch/pytorch/issues/59624, I observed some straggler test failures on Ampere due to TF32 thresholds. This PR just twiddles a few more thresholds to fix the six failing tests I saw on A100.

CC Flamefire ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60209

Reviewed By: gchanan

Differential Revision: D29220508

Pulled By: ngimel

fbshipit-source-id: 7c83187a246e1b3a24b181334117c0ccf2baf311
2021-06-18 11:41:33 -07:00
5f010c066f [package] Bring back save_source_file (#59962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59962

This reverts commit 44b021d21b5681c105529881bdbaefb6d3e335f6.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D29113224

Pulled By: zhxchen17

fbshipit-source-id: 55d42acc421c5f4abbbad9d9ed4d32b615939463
2021-06-18 11:13:35 -07:00
5a45103139 ns for fx: add API usage logging (#60103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60103

Adds internal logging for NS for FX API usage.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D29166710

fbshipit-source-id: 2a1bf2f6038b0c6c5945b57b2db2de25c585a04a
2021-06-18 10:25:59 -07:00
0baad214b0 [static runtime][fix] resize to the input tensor size for full_like (#60229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60229

Fix a bug where we did not resize the output to the input tensor size,
causing the output to be incorrect.

Test Plan:
Test on replayer, rebased on D29217781, with model 278203319_26.

Verify with jit outputs (D28583950)

`./buck-out/gen/admarket/lib/ranking/prediction_replayer/replayer --model_inference_type_target=DISAGG_ACCELERATOR --prediction_replayer_force_model_type=inline_cvr_post_imp_model --prediction_replayer_force_model=278203319_26 --prediction_replayer_target_tier=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test --prediction_replayer_input_stream_filename=/data/users/ansha/tmp/adfinder/filtered_requests_inline_cvr_100 --ignore_model_id_mismatch --check_performance --fully_remote_sr_connection_options="overall_timeout:10000000,processing_timeout:10000000" --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/278203319_26/ --sigrid_predictor_model_suffix=.predictor.disagg.local —use_new_encoding_from_model_id_to_shard_id=true --prediction_replayer_force_model_kind=19 --pytorch_predictor_static_runtime_enable=true --prediction_replayer_target_qps=1`

Reviewed By: hlu1, movefast1990

Differential Revision: D29218918

fbshipit-source-id: dab4bbbabeaa8367174ed90edca43d6204c65409
2021-06-18 09:56:25 -07:00
d5df274ea5 [DDP] Support for multiple backwards (#59359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59359

Move `prepare_for_backward` into `_DDPSink` backward instead of calling it in DDP forward pass so that we can run multiple backwards in DDP with `retain_graph=True`.

ghstack-source-id: 131774159

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28855226

fbshipit-source-id: 6b7b25d75b7696f5b5629078233433f97663d61c
2021-06-18 09:23:57 -07:00
3815a013ed Enable xenial-cuda11.1-cudnn8-py3.6-gcc7 in GHA (#60196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60196

Test Plan:
https://github.com/pytorch/pytorch/issues/60198: https://github.com/pytorch/pytorch/actions/runs/947796763

I should have used `ghstack` but I forgot; will do that in the future.

Reviewed By: walterddr

Differential Revision: D29231161

Pulled By: samestep

fbshipit-source-id: 8299a248ca9c1d36c3845d1c8a10ca9bf7101124
2021-06-18 09:18:53 -07:00
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct but that `mypy` nevertheless flags. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern in question.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do this manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out whenever it encounters a `type: ignore` that is no longer needed.
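
The relevant switch is a one-line configuration change (shown here in mypy.ini syntax):

```ini
[mypy]
warn_unused_ignores = True
```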

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
7c29ca7f2b Fix Subset of a Subset not sliceable issue (#59513)
Summary:
A Dataset can be indexed by a list, but a list cannot be indexed by a list. This raises an error when slicing a Subset that was initialised with another Subset instead of a dataset.

Fixed the issue by changing the indices to a Tensor, which can be indexed by a list.

Fixes https://github.com/pytorch/pytorch/issues/59512
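
A repro sketch of the failure mode (the dataset construction here is illustrative):

```python
import torch
from torch.utils.data import Subset, TensorDataset

ds = TensorDataset(torch.arange(10))
sub = Subset(ds, [0, 2, 4, 6, 8])
subsub = Subset(sub, [1, 3])

sub[0:2]     # ok: the underlying tensor accepts the resulting list of indices
subsub[0:2]  # TypeError before this fix: `sub` itself gets indexed by a list
```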

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59513

Reviewed By: zou3519

Differential Revision: D29196891

Pulled By: ejguan

fbshipit-source-id: ccde6e474fbcbddd2e9c7c107bc8b5de1307cdb9
2021-06-18 07:07:34 -07:00
08ce5eedf5 [reland] Move RPC agents to libtorch (#60170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60170

Reland of #59939.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193234

fbshipit-source-id: ee2a90d6be961c10f91361512bdd4cadca43dd60
2021-06-18 05:15:09 -07:00
958b881d70 [reland] Add some TORCH_API annotations to RPC (#60169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60169

Reland of #59939.
ghstack-source-id: 131706861

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193233

fbshipit-source-id: 91d3ef9003b9da7b99e1b9310b7f5a6c505d3b99
2021-06-18 05:15:07 -07:00
83fde5d981 [reland] Pass RequestCallback to FaultyPG RPC agent (#60168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60168

Reland of #59939.
ghstack-source-id: 131706860

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193235

fbshipit-source-id: 170108956a041f6a91b2b21c76ab1a0e0cdd34a2
2021-06-18 05:13:57 -07:00
8a839c5478 Fix saved variable unpacking version counter (#60195)
Summary:
We only set the value and not the actual version counter (VC).
This means that, in the context of double backward, if that saved tensor is saved again and the original Tensor is modified in-place, we would not detect it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60195

Reviewed By: Varal7

Differential Revision: D29208766

Pulled By: albanD

fbshipit-source-id: 81175f8e3f111f89524f8e46f47577b2ea4fc945
2021-06-18 04:36:46 -07:00
5609c2e59c Adds an OpInfo note (#57428)
Summary:
Like the title says. The OpInfo pattern can be confusing when first encountered, so this note links the Developer Wiki and tracking issue, plus elaborates on the goals and structure of the OpInfo pattern.

cc imaginary-person, who I can't add as a reviewer, unfortunately

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57428

Reviewed By: SplitInfinity

Differential Revision: D29221874

Pulled By: mruberry

fbshipit-source-id: aa73228748c9c96eadf2b2397a8b2ec31383971e
2021-06-18 03:40:42 -07:00
ecc37184a5 Fix clang-tidy path filtering (#60225)
Summary:
PR https://github.com/pytorch/pytorch/issues/60048 neglected to include the `--paths` option for file filtering, so it ended up passing every changed file in the diff to clang-tidy (cpp files outside `torch/csrc/`, yaml/sh files, etc.). This adds that back in to make the filtering work properly again.

Tested it manually by printing out the files to lint and running

```bash
curl -L https://github.com/pytorch/pytorch/pull/60018.diff > diff
python tools/clang_tidy.py --diff-file diff --paths torch/csrc/

curl -L https://github.com/pytorch/pytorch/pull/60222.diff > diff
python tools/clang_tidy.py --diff-file diff --paths torch/csrc/
```

Should fix https://github.com/pytorch/pytorch/issues/60192 and fix https://github.com/pytorch/pytorch/issues/60193, the files tripping errors there shouldn't have been passed to clang-tidy in the first place (supporting aten/ for clang-tidy is a separate task)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60225

Reviewed By: zhouzhuojie

Differential Revision: D29216251

Pulled By: driazati

fbshipit-source-id: b5d7fb7161d33eb7958a6f1ccc25809942045209
2021-06-17 23:03:59 -07:00
38c3116813 [hierarchical sharding 5/n] enable table-wise -> col-wise sharding in embedding table lookup
Summary:
This diff adds table-wise -> col-wise sharding support in GroupedShardedEmbeddingBag. Changes include:
1. Add the necessary member-variable setup.
2. Create a new fast kernel and add fast-kernel lookup support.
3. Add intra-host all2all and cross-host all2all logic.

Test Plan:
UT
```
buck test mode/dev-nosan //caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_spawn
```
```
buck test caffe2/torch/fb/hpc/tests:model_sharder_test
```
QPS check:
```
buck run mode/dev-nosan -c python.package_style=inplace caffe2/torch/fb/training_toolkit/examples:sync_sgd_local_driver -- prod-preset --num-trainers 32 --use-shrunk-model false --model-version=inline_cvr_dec_2020 --fast-kernel table_batched --max-batches 10000 --num-dpp-worker-threads 16 --num-readers 100 --hpc-identity ads_model_platform --table-partition hierarchical_based --hierarchical-options "["table_based", "column_based"]" --flow-entitlement ads_global_qps
```
with diff:
dec inline_cvr:
table-wise -> table-wise (82K):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_d0a0cba5?version=0&tab=status&env=PRODUCTION

table-wise -> column-wise (80k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_b1ac5873

column-wise:
dec inline_cvr:
gpu trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1623827677%2F127.0.0.1%2Flibkineto_activities_4550.json.gz&bucket=gpu_traces

https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_a79e1522 (81k)

https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_2dacc13e (88k)

row-wise(62k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_4e349cab

table-wise(90k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_5d51b608

10x ctr_mbl_feed:
```
buck run mode/dev-nosan -c python.package_style=inplace caffe2/torch/fb/training_toolkit/examples:sync_sgd_local_driver -- prod-preset --num-trainers 128 --use-shrunk-model false --model-version=ctr_mbl_oct_2020_10x_3tb --num-dpp-worker-threads 16 --num-readers 200 --fast-kernel table_batched --max-batches 5000000 --hpc-identity ads_model_platform --table-partition column_based --flow-entitlement ads_global_tc_mimo
```
column-wise:
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_f05fb306?version=0&tab=status&env=PRODUCTION (290k)

w/o diff:
dec inline_cvr:
column-wise (87K):
gpu trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1623864444%2F127.0.0.1%2Flibkineto_activities_4451.json.gz&bucket=gpu_traces
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_e1315f14

row-wise (60k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_8fcc0adf

table-wise (91k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_cb94ff41

10x ctr_mbl_feed:
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_203ef35b?version=0&tab=status&env=PRODUCTION (281k)

NE check (use deterministic reading, D28711400)
```
buck run mode/dev-nosan -c python.package_style=inplace caffe2/torch/fb/training_toolkit/examples:sync_sgd_local_driver -- prod-preset --num-trainers 32 --use-shrunk-model false --model-version=inline_cvr_dec_2020 --fast-kernel table_batched --max-batches 100000 --num-dpp-worker-threads 16 --num-readers 64 --hpc-identity ads_model_platform --table-partition hierarchical_based --hierarchical-options "[table_based, column_based]" --flow-entitlement ads_global_qps --use-deterministic-model --use-deterministic-reading --model-entity-id 995557193
```
w/o this diff:
```
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|lifetime_ne 0.8660048340401448
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|window_ne 0.8660048340401447
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: qps-qps|total_examples 1867776.0
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: qps-qps|window_qps 491.5199890136719
```
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_15bc6243?version=0&tab=status&env=PRODUCTION

w this diff:
```
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|lifetime_ne 0.8660048340401448
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|window_ne 0.8660048340401447
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: qps-qps|total_examples 1867776.0
```
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_15bc6243?version=0&tab=status&env=PRODUCTION

Reviewed By: JadeNie

Differential Revision: D28689126

fbshipit-source-id: 1c7879d4e3ee2b90aaf2a89e87f7b827d54173b3
2021-06-17 22:25:25 -07:00
8b55e9feaf removed cat, equal, and stack from autocast promote list (#59497)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59497

Reviewed By: zou3519

Differential Revision: D29185909

Pulled By: ngimel

fbshipit-source-id: db96239106d9e46a2704b8f457fd0463dacc1f5c
2021-06-17 21:13:22 -07:00
faf459f13e [Profiler] Fix memory profiler merge issue (#60037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60037

The memory profiler was broken due to a mis-merge during a rebase. Add the lost line back.

Reviewed By: ezyang

Differential Revision: D29143469

fbshipit-source-id: c3bf0088ca12e7535eeddbede24e28201eccd5f4
2021-06-17 21:05:23 -07:00
bcf8752fb2 updated launch bounds for trilinear 3d (#59999)
Summary:
Updates launch bounds for the upsample_trilinear_3d forward and backward kernels to remove register spilling into local memory. This improves the forward-pass runtime by a 3-4x factor; the backward pass has the same runtime as before (probably a different bottleneck).

Timing data: (Using Nvidia Titan-V GPU)
![TrilinearTimingData](https://user-images.githubusercontent.com/22803332/121979658-72f19200-cd3f-11eb-9363-c00e2c4eea6d.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59999

Reviewed By: zou3519

Differential Revision: D29185976

Pulled By: ngimel

fbshipit-source-id: 0b2313e70e45c53938cd7262464d3aa4fab8da4a
2021-06-17 21:02:12 -07:00
7e032f18cf DOC Describes behavior for None in module.register_* (#60125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45834

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60125

Reviewed By: zou3519

Differential Revision: D29196138

Pulled By: jbschlosser

fbshipit-source-id: af736c0d66005ec33412860f00b233a5d2922137
2021-06-17 19:18:23 -07:00
047925dac1 .github: Run Windows CUDA build on pull requests (#60215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60215

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29214519

Pulled By: seemethere

fbshipit-source-id: 58df5ee49cc5cd46f48938f023f87a6da958f3b6
2021-06-17 16:30:31 -07:00
6af5d00e4b [torch][segment_reduce] Add support for multi-dimensional input (cuda) (#60018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60018

Same as title. This diff finishes cuda support for currently implemented reductions and input parameters.

Next Steps:
- Add support for sum/min
- More testing and benchmarking
- Cleanup
    - Update default values when length is 0
    - Use TensorIterator
    - Update documentation

Test Plan: Unit test to cover cuda forward path.

Reviewed By: ngimel

Differential Revision: D29135373

fbshipit-source-id: d070727eeb660f56782e7ac8a5b0798be688480a
2021-06-17 16:30:30 -07:00
a727f655c8 [torch][segment_reduce] Support for multi dimension (cpu only) (#59951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59951

Add support for multi-d input for cpu forward/backward implementation.

Next step: Adding cuda support for multi-d input.

Test Plan: Added unit tests.

Reviewed By: ngimel

Differential Revision: D29105457

fbshipit-source-id: a389ba4cc10f02434a336b8e7d36259f32552e11
2021-06-17 16:29:14 -07:00
8e67981995 .github: Disable clang-tidy for now (#60219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60219

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29214928

Pulled By: seemethere

fbshipit-source-id: 20cf38ebfe77ed646e25293c577937c56bd930d3
2021-06-17 16:26:31 -07:00
acf04cdedf Fix default DEFAULT_FILE_PATTERN in clang-tidy (#60212)
Summary:
Without the change, clang-tidy also checks folders like `.circleci/...`

Example of a clang-tidy run that looked into `.circleci` changes:
https://github.com/pytorch/pytorch/runs/2844682644?check_suite_focus=true

[skip ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60212

Reviewed By: seemethere

Differential Revision: D29214728

Pulled By: zhouzhuojie

fbshipit-source-id: fd53f7b2f7d88936264db1effdc06cc4fc271ca4
2021-06-17 16:25:18 -07:00
9c03de1dde Use mirrors for ubuntu apt source (#60216)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60135

Experimented on circleci
https://app.circleci.com/pipelines/github/zhouzhuojie/gha-ci-playground/7/workflows/965c95b8-2186-434a-92ca-9cd9c8aaafdc/jobs/7

Sample logs
```
Need to get 1,389 kB of archives.
After this operation, 5,495 kB of additional disk space will be used.
Get:1 http://mirrors.ubuntu.com/mirrors.txt Mirrorlist [3,270 B]
Get:2 http://mirror.lstn.net/ubuntu focal/main amd64 libtcl8.6 amd64 8.6.10+dfsg-1 [902 kB]
Get:7 http://ubuntu.securedservers.com focal/main amd64 libipc-run-perl all 20180523.0-2 [89.7 kB]
Get:5 http://mirrors.edge.kernel.org/ubuntu focal/universe amd64 expect amd64 5.45.4-2build1 [137 kB]
Get:4 http://mirror.pnl.gov/ubuntu focal/universe amd64 tcl-expect amd64 5.45.4-2build1 [105 kB]
Get:6 http://mirror.lstn.net/ubuntu focal/main amd64 libio-pty-perl amd64 1:1.12-1 [32.4 kB]
Get:9 https://mirrors.bloomu.edu/ubuntu focal/main amd64 libtimedate-perl all 2.3200-1 [34.0 kB]
Get:8 http://la-mirrors.evowise.com/ubuntu focal/universe amd64 libtime-duration-perl all 1.21-1 [13.1 kB]
Get:3 http://mirrors.ocf.berkeley.edu/ubuntu focal/main amd64 tcl8.6 amd64 8.6.10+dfsg-1 [14.8 kB]
Get:10 http://mirrors.ocf.berkeley.edu/ubuntu focal/universe amd64 moreutils amd64 0.63-1 [60.5 kB]
Fetched 1,392 kB in 3s (464 kB/s)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60216

Reviewed By: seemethere

Differential Revision: D29214661

Pulled By: zhouzhuojie

fbshipit-source-id: ed2d85f8c0c23af4bcf33558c57472fcf9d913e8
2021-06-17 16:19:27 -07:00
3995fb1840 Add new_ones symbolic (#59255) (#59539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59539

Add new_ones symbolic in PT-ONNX exporter

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29046603

Pulled By: SplitInfinity

fbshipit-source-id: e7420c7b543c33e3640e62461d08ff4d5843eda7

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-06-17 15:49:24 -07:00
ef1c107be5 [vulkan] Do not use memcmp to compare structs (#60199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60199

It isn't safe to use `memcmp` to determine the equality of structs, because padding bytes between fields of a struct have indeterminate contents. This can cause equality operators implemented via `memcmp` to return false when comparing structs with equivalent fields.

This bug appears to be responsible for the Vulkan backend crashing on WorkVC release builds.

Test Plan:
Run Vulkan unit tests:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Test on workvc rdk build, first ensure you are receiving the Vulkan models.
```
buck install fbsource//fbandroid/mode/opt fbsource//fbandroid/mode/aloha_build_rdk fbsource//fbandroid/mode/no_obfuscation fbandroid/buck-configs/buckconfig.caffe2_pkg_snpe_libs_android aloha_workvc_rdk --deep --show-full-output
```

Reviewed By: IvanKobzarev

Differential Revision: D29203177

fbshipit-source-id: e0ee79d4e635174e165b250f2cee842a09092df9
2021-06-17 15:20:30 -07:00
6d0fb85a62 Revert D28833086: beef up at::_ops API
Test Plan: revert-hammer

Differential Revision:
D28833086 (e2129d1c06)

Original commit changeset: 55f322a8378c

fbshipit-source-id: e55bf812ec411bb6bee87654f1d65ff10c046106
2021-06-17 14:28:32 -07:00
0cbb5e15d7 Correct backend in pipe_with_ddp_test (#60123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60123

All of the tests would run with gloo, but some tests specify a
different backend param which we should respect.
ghstack-source-id: 131688188

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D29171549

fbshipit-source-id: 3e306060df189c0e38d5ca6dd34f4b4fbca052b9
2021-06-17 13:43:01 -07:00
acd914f039 Fix Pipe + DDP for unused parameters, static graph (#60118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60118

Pipe + DDP has a few issues:

1) with static graph, gradients are not synchronized on the first backward pass (i.e. the delay allreduce is not run). This has not worked since https://github.com/pytorch/pytorch/pull/55248
2) with find_unused_parameters=True, gradient synchronization also does not occur. This has not worked since https://github.com/pytorch/pytorch/pull/57081

The reason for both cases is that calling `DDPSink.apply(output_tensor)` does not call the custom `backward` of `DDPSink` when the `output_tensor` is actually an `OwnerRRef`, which is the case when running DDP in `Pipe`. This is because we do `backward` on the `rref.local_value()` which does not have this autograd recording.

To fix, we unwrap the RRef and reconstruct it as needed, similar to the fix in https://github.com/pytorch/pytorch/pull/49908.

to test:
All tests in pipe_with_ddp_test pass.
The reason these tests did not catch the errors earlier is because all ranks received the same model inputs. So if gradient synchronization did not occur, then grads would still be the same because the model is the same on all ranks (guaranteed by ddp). Fixed the tests to use different inputs across ranks.
ghstack-source-id: 131688187

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D29167283

fbshipit-source-id: fe62310db2dc6de8519eb361b1df8ae4dfce3ab8
2021-06-17 13:41:51 -07:00
2062cafaa5 [iOS GPU][MaskRCNN] Implement RoIAlign in Metal shaders using Sampler (#56075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56075

Inspired by the CUDA implementation - https://fburl.com/diffusion/e90tabkj. The main difference is the way we implement bilinear interpolation: CUDA does this manually by iterating over every point in each bin box, whereas Metal does it by calling the sampler's sample function, which is a bit easier and faster. The result is almost identical to the result from CPU - P365102522.

We'll do another round of refactor once we have figured out how to support custom ops on GPU.
ghstack-source-id: 131720620

Test Plan:
1. Circle CI
2. Sandcastle

Reviewed By: ajtulloch

Differential Revision: D27485068

fbshipit-source-id: 31e831aead9d3799a3fde96e99dd677d96bd3da1
2021-06-17 13:29:42 -07:00
e2129d1c06 beef up at::_ops API (#59115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59115

This PR beefs up the `at::_ops::` API as a source of truth for compile-time information about each operator.

### Changes
For every op defined in native_functions.yaml, `at::_ops` previously defined an unambiguous function, e.g. `at::_ops::add_Tensor`: effectively an unambiguously named version of the C++ API that you could decltype() successfully because it has no overloads, exposed through a user-facing macro: `decltype(ATEN_FN2(add, Tensor)) // expands to decltype(at::_ops::add_Tensor)`.

Now, `at::_ops::add_Tensor` is a struct containing a few static fields and methods (declared in `Operators.h`, defined in `Operators.cpp`):
```
struct TORCH_API add_Tensor {
  using schema = at::Tensor (const at::Tensor &, const at::Tensor &, const at::Scalar &);
  using ptr_schema = at::Tensor (*)(const at::Tensor &, const at::Tensor &, const at::Scalar &);
  static constexpr const char* name = "aten::add";
  static constexpr const char* overload_name = "Tensor";
  static constexpr const char* schema_str = "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor";
  static at::Tensor call(const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha);
};
```

What used to be the function `at::_ops::add_Tensor` can now be accessed as `at::_ops::add_Tensor::call`, and I've added a new macro to access the entire struct (naming suggestions welcome) - `ATEN_OP2(add, Tensor)`.

### Motivation

There were two motivations for this change:

**Codegen refactor**
The `at::_ops::` API as it exists now is (yet another) C++ entry point into the dispatcher, in addition to the Function, Method, and Redispatch APIs. Instead, after this PR, the existing three API's are all inline-able wrapper API's that call into the `at::_ops` API to do the real work. The function and method API's call into `at::_ops::{op}::call`, while the redispatch API calls into `at::_ops::{op}::redispatch`.

This will hopefully make it easier to pile in any future C++ API's that we want to code-generate. It also means that stuff like the string name, overload name, and schema of each operator is consolidated in a single place, rather than having the codegen hardcode various strings in multiple codegen output files.

**Extra compile-time metadata**
In the [boxed CPU fallback PR](https://github.com/pytorch/pytorch/pull/58065/files#diff-c9b55f0d692a9bea8019c6f19bc46877f1efa0f9d4fc2086cf299b52768343b4R31) above this in the stack, I added a new API that external backends can use to call directly into their boxed fallback from an unboxed context. Adding extra metadata to `at::_ops` means that XLA's usage of that API doesn't require passing in the string name and overload name of each op as arguments; we can just infer them.

The updated API looks like this (see [the XLA-side PR ](https://github.com/pytorch/xla/pull/2945/files#diff-5e65c3c1d847191cb691d1874732e971f09fa1aad7a980a555c3b0504a5b6470R250) for more examples)
```
return at::native::call_fallback_fn<&xla_cpu_fallback, ATEN_OP2(add, Tensor)>::call(a, b, 1.0);
```

**Characteristics of the `at::_ops` API**
(I also commented this in the codegen)

(1) It follows the Dispatcher API.

This means, e.g., that it takes in the expanded arguments rather than `TensorOptions`. This is kind of necessary for perf if we want `at::_ops` to serve as the main implementation of the existing C++ APIs. For example: if it followed the C++ API, then all of the faithful C++ factory functions would need to wrap their arguments into TensorOptions only to unwrap them again.

(2) Overload names are disambiguated.

This is the same as before; it's helpful for pytorch extenders who would like to decltype() an aten operator, that has overloads, e.g. decltype(at::_ops::mul_Tensor::call)

(3) No argument defaulting is allowed.

This is more of an implementation detail to avoid #include cycles, since TensorBody.h (which defines the Tensor class) needs to include this file. The #include situation is precarious though!

(4) manual_cpp_bindings and faithful names are not included in the API.

I think that this is one we have a choice with. This applies to stuff like __dispatch__is_complex(), and add_outf(). These aren't "real native_functions.yaml ops", they're just additional functions provided by the C++ API. They're implemented as wrappers in Functions.h that call into the actual operators defined here, i.e. at::_ops::is_complex::call() and at::_ops::add_out::call(). This means that ATEN_OP(is_complex) will not fastpath, and will go through the dispatcher. It also means that `ATEN_OP2(add, out)` is automatically faithful and takes its out argument at the end (this is just because it follows the dispatcher API).

**Details**

Instead of codegen'ing the existing 3 API's in `Functions.cpp`, `TensorMethods.cpp` and `RedispatchFunctions.cpp`, I codegen them directly into the headers: `Functions.h`, `TensorBody.h`, and `RedispatchFunctions.h`. I mostly did this for perf, since we want to avoid introducing an extra function call in the hot path of every operator. These functions are also now all one-liners that call into `at::_ops`, so the compiler should just inline them all anyway.

The main downside in doing that though was that I had to bend over backwards in a few cases to avoid cyclical #include statements. The issue is that `TensorBody.h` now includes `Operators.h` (because the codegen'd method API is implemented by calling into `at::_ops`), but `TensorBody.h` also includes the definition of the Tensor class. That means that `Operators.h` can't be aware of the Tensor class; it needs to forward declare everything and avoid using the Tensor class directly. To fix cyclic includes, I had to:
- Not allow defaulting in the `at::_ops` API
- Move some code that was called when translating from C++ to Dispatcher API's directly into the codegen template (`check_tensor_options_and_extract_memory_format`)

It's not great, but I don't think this specific include cycle will break down in the near future; the only code that we need to call before getting to `Operators.cpp` is the translations from various API's to the dispatcher API; there aren't many of them, and there's no major reason for them to live an external utils file somewhere.

Moving the code into the headers also meant that the codegen no longer needs to deal with `Functions.cpp`/`TensorMethods.cpp`/`RedispatchFunctions.cpp`. All of the functions that used to be defined in `TensorMethods.cpp` seemed small enough for me to lump into `TensorBody.h`, but some of the functions in `Functions.cpp` looked pretty big to put in a header, so I moved the file to `aten/src/ATen/native/Functions.cpp`.

It might be worth keeping `TensorMethods.cpp` there and leaving it too, in-case we have any beefy hand-written tensor methods that we don't want to put in a header.

**Perf**
I ran a few benchmarks in callgrind, and didn't see a noticeable instruction count change when calling `at::add()`. I also saw in the output that `at::add()` was successfully getting inlined.

There's also probably a light risk of binary size increase; I think that there's a binary size regression test that I can run in phabricator (going to try it). I can also try inspecting `libtorch.so` directly and seeing if it's any bigger, but my hope is that the inlining means we aren't generating separate symbols for `at::add` and `at::_ops::add_Tensor::call`.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D28833086

Pulled By: bdhirsh

fbshipit-source-id: 55f322a8378cb9a3cb6642f72aa291be381dd95b
2021-06-17 13:09:46 -07:00
462448f07a Enable GHA sharding on linux (#60124)
Summary:
This is branched off of https://github.com/pytorch/pytorch/issues/59970 to shard only on Linux so far (we're running into issues with Windows gflags).

This would enable sharding of tests on a few Linux jobs on GHA, allowing TTS (time to signal) to be essentially halved.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60124

Reviewed By: zou3519

Differential Revision: D29204211

Pulled By: janeyx99

fbshipit-source-id: 1cc31d1eccd564d96e2aef14c0acae96a3f0fcd0
2021-06-17 13:00:23 -07:00
bbedfd913d Run an dummy rpc._all_gather in init_rpc to avoid shutdown timeout (#59801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59801

Fixes https://github.com/pytorch/pytorch/issues/59795.

The RPC calls in shutdown are no longer able to finish within 5s if
there are no other RPCs before `rpc.shutdown()` in that process,
because agent initialization can take longer than 5s. We didn't
have this problem previously, because TensorPipe's backend
registry used to use RPC to communicate CUDA devices in `init_rpc`.
However, after #58753, `init_rpc` uses ProcessGroup to communicate
devices, and hence the channels/transport could be uninitialized
after `init_rpc`.

Differential Revision: D29039238

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Pulled By: mrshenli

fbshipit-source-id: 46f89b01a058a51d271ddef9084a67b220a067b7
2021-06-17 11:47:54 -07:00
ebafd2aadf Stop warning on .names() access in max_pool2d and max_pool2d_backward (#60059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60059

Fixes #60053.

The problem is that `.names()` always triggers the named tensor warning.
To not trigger it, one has to guard it with has_names:
`x.has_names() ? x.names() : DimnameList{}`

This is not the first time this has happened; we should probably
make it so that .names() doesn't raise a warning unless it is actually
populated with names. That's a little tricky to implement so I'm leaving
it for the future.

Test Plan:
- New test, also run `python test/test_nn.py -v -k "max_pool"` and
confirm there are no warnings.

Reviewed By: gchanan

Differential Revision: D29152737

Pulled By: zou3519

fbshipit-source-id: 89a2fdbe6a6064a7044b5b75f7d0c58e51e57509
2021-06-17 10:34:41 -07:00
ef09428804 Revert D29104399: Port all kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104399 (7809494c68)

Original commit changeset: 18bb747b7a19

fbshipit-source-id: f57043df5646f1e675e8a555cb4fa0e436953751
2021-06-17 10:32:23 -07:00
3ff5507fb0 Revert D29104395: Port any kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104395 (519698362d)

Original commit changeset: 0cfde57c22ba

fbshipit-source-id: ac5ebdc4b9d3aeb4c5eeab55c92ac931599d39d1
2021-06-17 10:32:21 -07:00
81baa7fb0d Revert D29104398: Using meta checks for unary torch.all and torch.any.
Test Plan: revert-hammer

Differential Revision:
D29104398 (c078cefa7d)

Original commit changeset: 6771b80130c9

fbshipit-source-id: 10e5a34370113fcd2f87aea2c2e76108fa9328d8
2021-06-17 10:32:20 -07:00
873dac4b5a Revert D29104397: Port argmax to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104397 (6f3da4f4bf)

Original commit changeset: 580355cf3b4e

fbshipit-source-id: e51fb79329066bc1a6364cfa44a8732908a684ed
2021-06-17 10:32:18 -07:00
6b5e77904f Revert D29104396: Port argmin kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104396 (226d745a0b)

Original commit changeset: 39c59bcc0446

fbshipit-source-id: 82de26f925a885f65572a785fa45a9980d3a974b
2021-06-17 10:31:06 -07:00
3dc8112187 [NNC] Handle int64 indices and loop bounds (#59769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59769

Allow loop bounds and tensor indices to be either int32 or int64, and avoid unnecessary cast ops.

Test Plan:
```
build/bin/test_tensorexpr
```

Reviewed By: H-Huang

Differential Revision: D29173970

Pulled By: desertfire

fbshipit-source-id: 859a876ddb1b41535b2266089aa1222884295c78
2021-06-17 09:35:59 -07:00
96b3537e71 [NNC] Add a dtypeToCppString virtual method in IRPrinter (#59449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59449

Make dtypeToCppString a virtual method so that a child
class can easily override the dtype string generation rule. This is
needed as preparation for making loop and tensor indices int64_t.

Test Plan:
```
build/bin/test_tensorexpr
```

Reviewed By: H-Huang

Differential Revision: D29173969

Pulled By: desertfire

fbshipit-source-id: a447badba76788354da1c79f80c834c99f105776
2021-06-17 09:34:58 -07:00
ed1da5be21 PG NCCL cleanup: remove usage of completed_ in WorkNCCL copies (#59899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59899

Test Plan: Imported from OSS

Reviewed By: cbalioglu, osalpekar

Differential Revision: D29080299

Pulled By: agolynski

fbshipit-source-id: 9ae368f91e81f19471e0a20fc913d8e9df1b9dec
2021-06-17 09:05:35 -07:00
010f4b6f2d Add .isort.cfg (#60119)
Summary:
This adds the `.isort.cfg` file from https://github.com/pytorch/pytorch/issues/55928, but doesn't try to enforce it in CI because, as that PR showed, that is currently difficult to do. We could use this to gradually sort the codebase according to this configuration (enforcing bits and pieces in CI), but I don't do that here.

The advantage of including this file (even if we don't enforce it) is that it affects how certain tools work, thus encouraging a specific import style for people who happen to use those tools.
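
For illustration, a hypothetical `.isort.cfg` of the kind described here (these keys are standard isort settings; the actual file's contents may differ):

```
[settings]
line_length = 88
multi_line_output = 3
include_trailing_comma = True
known_first_party = torch
```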

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60119

Test Plan: Open `test/run_test.py` in VS Code and run the **Python Refactor: Sort Imports** command. Compare with and without this PR.

Reviewed By: 1ntEgr8

Differential Revision: D29199504

Pulled By: samestep

fbshipit-source-id: 83e937b0f517c60e3e7dedb6c0306173908fbbb0
2021-06-17 09:04:25 -07:00
226d745a0b Port argmin kernel to structured kernels. (#59938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59938

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104396

Pulled By: ezyang

fbshipit-source-id: 39c59bcc044649c1ec9c9685366c4dda87f76aa7
2021-06-17 08:18:13 -07:00
6f3da4f4bf Port argmax to structured kernels. (#59937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59937

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104397

Pulled By: ezyang

fbshipit-source-id: 580355cf3b4e9e5c934b4e51a16196087bcb3459
2021-06-17 08:18:12 -07:00
c078cefa7d Using meta checks for unary torch.all and torch.any. (#59373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59373

This PR makes use of the newly implemented unified `at::meta::check_reduction` for
validating the inputs and configuring its `TensorIterator`.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104398

Pulled By: ezyang

fbshipit-source-id: 6771b80130c91c2f1360853127de0acebcfff183
2021-06-17 08:18:10 -07:00
519698362d Port any kernel to structured kernels. (#59372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59372

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104395

Pulled By: ezyang

fbshipit-source-id: 0cfde57c22ba88607945c98f28b18df7709becd0
2021-06-17 08:18:08 -07:00
7809494c68 Port all kernel to structured kernels. (#59371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59371

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104399

Pulled By: ezyang

fbshipit-source-id: 18bb747b7a19d873427d52c1145ef7cede333a0e
2021-06-17 08:16:41 -07:00
b8ab98626b only runs mem leak check on master (#60023)
Summary:
Sets an environment variable to only do the CUDA mem leak check on master CI jobs.

See discussion in https://github.com/pytorch/pytorch/pull/59402#issuecomment-860773034

See stats before/after disabling mem leak check: https://github.com/pytorch/pytorch/pull/59942#issuecomment-860947095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60023

Test Plan:
https://github.com/pytorch/pytorch/issues/60108
https://github.com/pytorch/pytorch/issues/60116

Reviewed By: janeyx99

Differential Revision: D29164182

Pulled By: walterddr

fbshipit-source-id: dfe88c2c1275b6eb35f18b58aacdc220f34ccb59
2021-06-17 07:56:26 -07:00
59b10036d5 Unifies OpInfo dtype tests (#60157)
Summary:
Simplifies the OpInfo dtype tests and produces nicer error messages, like:

```
AssertionError: Items in the first set but not the second:
torch.bfloat16
Items in the second set but not the first:
torch.int64 : Attempted to compare [set] types: Expected: {torch.float64, torch.float32, torch.float16, torch.bfloat16}; Actual: {torch.float64, torch.float32, torch.float16, torch.int64}.
The supported dtypes for logcumsumexp on cuda according to its OpInfo are
        {torch.float64, torch.float32, torch.float16, torch.int64}, but the detected supported dtypes are {torch.float64, torch.float32, torch.float16, torch.bfloat16}.
        The following dtypes should be added to the OpInfo: {torch.bfloat16}. The following dtypes should be removed from the OpInfo: {torch.int64}.
```
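
The two trailing sets in that message are just set differences between what the OpInfo claims and what the test detects (a sketch, not the test's actual code):

```python
import torch

opinfo_dtypes = {torch.float64, torch.float32, torch.float16, torch.int64}
detected      = {torch.float64, torch.float32, torch.float16, torch.bfloat16}

to_add    = detected - opinfo_dtypes   # {torch.bfloat16}: add to the OpInfo
to_remove = opinfo_dtypes - detected   # {torch.int64}: remove from the OpInfo
```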

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60157

Reviewed By: ngimel

Differential Revision: D29188665

Pulled By: mruberry

fbshipit-source-id: e84c9892c6040ea47adb027cfef3a6c0fd2f9f3c
2021-06-17 06:34:54 -07:00
4caca7a15b Improved torch.einsum testing and fixed bug (#59731)
Summary:
Improved torch.einsum testing and fixed a bug where lower case letters appeared before upper case letters in the sorted order, which is inconsistent with NumPy.
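
Assuming the ASCII reading of that ordering (uppercase sorts before lowercase), a minimal NumPy illustration of the behavior being matched:

```python
import numpy as np

a = np.ones((2, 3))
# Implicit mode reorders output axes by sorted subscript labels; 'B'
# sorts before 'a' in ASCII, so the result is the transpose:
print(np.einsum('aB', a).shape)  # (3, 2)
```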

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59731

Reviewed By: SplitInfinity, ansley

Differential Revision: D29183078

Pulled By: heitorschueroff

fbshipit-source-id: a33980d273707da2d60a387a2af2fa41527ddb68
2021-06-17 04:48:47 -07:00
eb36f67dcc [TensorExpr] Minor cleanup in TensorExprKernel::computeValue (#60041)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60041

Differential Revision: D29146709

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 49ac919c18f669d7fda1a26c5a74e62ea752df4f
2021-06-17 01:23:24 -07:00
6b1712019a Revert D29132955: Pass RequestCallback to FaultyPG RPC agent
Test Plan: revert-hammer

Differential Revision:
D29132955 (cbbb7e145e)

Original commit changeset: bb7554b84bcb

fbshipit-source-id: 4dfa2fbe7b8f58c951991c79aa9e2aa819793013
2021-06-17 00:50:32 -07:00
3c3bb91103 Revert D29132956: Add some TORCH_API annotations to RPC
Test Plan: revert-hammer

Differential Revision:
D29132956 (04ec122868)

Original commit changeset: 8637640d56a1

fbshipit-source-id: f497adcbfd5a6b5a46b8689b1943ae2687ea737b
2021-06-17 00:50:30 -07:00
f233274f30 Revert D28875276: Move RPC agents to libtorch
Test Plan: revert-hammer

Differential Revision:
D28875276 (fc50f91929)

Original commit changeset: f2f6970fd74d

fbshipit-source-id: 3c52af652579733ebea8ddfb06576a0ce262bf78
2021-06-17 00:48:58 -07:00
e5c99d9908 Revert D29147009: [pytorch][PR] refine disabled test
Test Plan: revert-hammer

Differential Revision:
D29147009 (5fd6ead097)

Original commit changeset: 37e01ac6e8d6

fbshipit-source-id: e9cd819fd819e3d653deda3b7a981c39ec0452f4
2021-06-17 00:45:21 -07:00
a0ad4c24d1 MAINT Migrates rrelu_with_noise from THC to ATen on Cuda (#57864)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24618
Related to https://github.com/pytorch/pytorch/issues/24507

<details><summary>Benchmark script:</summary>

```py
import torch
import torch.nn as nn
import time

torch.manual_seed(0)
def _time():
    torch.cuda.synchronize()
    return time.time()

device = "cuda"
m = nn.RReLU().cuda()

for n in [100, 10_000, 100_000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        fwd_t = fwd_t + (t2 - t1)
    fwd_avg = fwd_t / 10000 * 1000
    print(f"input size(128, {n}) forward time is {fwd_avg:.2f} (ms)")
```

</details>

### Results from benchmark:

#### This PR

```
input size(128, 100) forward time is 0.01 (ms)
input size(128, 10000) forward time is 0.06 (ms)
input size(128, 100000) forward time is 0.54 (ms)
```

#### On master

```
input size(128, 100) forward time is 0.01 (ms)
input size(128, 10000) forward time is 0.08 (ms)
input size(128, 100000) forward time is 0.66 (ms)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57864

Reviewed By: H-Huang

Differential Revision: D29177169

Pulled By: ngimel

fbshipit-source-id: 4572133db06f143d27e70a91ade977ea962c8f77
2021-06-17 00:35:16 -07:00
9e79a8a54f [iOS GPU][MaskRCNN] Force the temporaryImage to become static when doing synchronization (#60155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60155

For intermediate tensors, we need to convert them to static images when doing GPU -> CPU synchronization.
ghstack-source-id: 131540760

Test Plan:
- CI
- buck test pp-macos

Reviewed By: SS-JIA

Differential Revision: D29126278

fbshipit-source-id: cd50b5f104e0161ec7fcfcc2c51785f241e48704
2021-06-17 00:25:14 -07:00
0e7b5ea6c0 nonzero: Default to transposed output strides (#59370)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46224

cc ailzhang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59370

Reviewed By: ezyang

Differential Revision: D29143842

Pulled By: ngimel

fbshipit-source-id: 5aa7a247b4a70cd816d0eed368ab4c445568c986
2021-06-16 22:50:38 -07:00
c0b7c59e55 [quant] Equalization Observer modifications (#59953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59953

The following modifications were made to the equalization
observers due to design changes:
- [InputEqualizationObserver] Replaced `calculate_qparams()` with
`calculate_scaled_minmax()` since we will need to return the scaled
min/max values to update the following input quantization observer
- [WeightEqualizationObserver] We no longer need a row observer since
this will be taken care of by the following weight quantization observer
- [WeightEqualizationObserver] Following the previous comment, we no
longer need to calculate the scaled qparam values. Instead, we will use
the equalization scale to later scale the weights (see the sketch
below), and the qparams will be taken care of by the weight
quantization observer.
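
As a sketch of the equalization-scale idea (generic cross-layer equalization math, not this PR's code), scaling the input down and the weights up by a per-channel factor leaves the linear output unchanged while balancing the two ranges:

```python
import torch

x = torch.tensor([1.0, 2.0, 4.0])      # activations, 3 input channels
w = torch.tensor([[0.5, -1.0, 0.25]])  # weights, shape (out=1, in=3)

# One common per-channel choice: s_j = sqrt(range_x_j / range_w_j)
s = torch.sqrt(x.abs() / w.abs().amax(dim=0))

# (x / s) @ (s * w)^T == x @ w^T, so equalization preserves the output
print(torch.allclose((x / s) @ (s * w).t(), x @ w.t()))  # True
```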

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_eq_observer`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29135332

fbshipit-source-id: be7e468273c8b62fc183b1e1ec50f6bd6d8cf831
2021-06-16 22:32:30 -07:00
45c31cabb5 [quant] Input Weight Equalization - prepare modifications (#59747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59747

Modifies prepare_fx for input-weight equalization. If a current
node is being equalized (there exists an EqualizationQConfig), then the
EqualizationObserver will be inserted before its quantization observer.

For a singular linear layer, the general flow looks like:
Original graph: `x0 -> linear -> x1`, `w -> linear`
After prepare: `x0 -> InpEqObs -> MinMaxObs -> linear1 -> MinMaxObs -> x1`
  `w -> WeightEqObs -> MinMaxObs -> linear1`

For two connected linear layers, the general flow looks like:
Original graph: `x0 -> linear1 -> linear2 -> x1`,
  `w1 -> linear1`, `w2 -> linear2`
After prepare: `x0 -> InpEqObs -> MinMaxObs -> linear1 -> MinMaxObs -> InpEqObs -> linear2 -> MinMaxObs -> x1`
  `w1 -> WeightEqObs -> MinMaxObs -> linear1`, `w2 -> WeightEqObs -> MinMaxObs -> linear2

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_equalization_prepare`

Original model with one `nn.Linear` layer
```
LinearModule(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```
--------------------------------------

Original model with two connected functional linear layers
```
FunctionalLinearModule(
  (linear1): Linear()
  (linear2): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_equalization_process_0 : [#users=1] = call_module[target=linear2_w_equalization_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_00](args = (%linear2_w_equalization_process_0,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135316

fbshipit-source-id: 91697e805ede254dbb2a42ee4c23eb1c1c64590e
2021-06-16 22:32:28 -07:00
7ce74f3339 [quant] EqualizationQConfig to distinguish input/output activations (#59739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59739

Created an EqualizationQConfig specifically for equalization.
This inherits from QConfig and is used to distinguish between inserting
an input observer with an output observer. Since the output observer
field is included in the EqualizationQConfig, we no longer need an
output observer field in the _InputEqualizationObserver

Test Plan:
compiles

Imported from OSS

Reviewed By: ezyang

Differential Revision: D29135298

fbshipit-source-id: 3dde9c029c291467ff0a0845f0fc9c44573fc6f6
2021-06-16 22:31:18 -07:00
c6cdb4f113 Refactor ZeroRedundancyOptimizer Assuming SPSD (#59834)
Summary:
**Overview:**
This refactors the `ZeroRedundancyOptimizer` implementation to assume single-process single-device (SPSD) instead of accommodating single-process multiple-device (SPMD). `DistributedDataParallel` [retired SPMD recently](https://github.com/pytorch/pytorch/issues/47012), so this change follows the same spirit.

**Changes:**
The parent-class `Optimizer` constructor permits the input argument `params` to be either an `iterable` of `torch.Tensor` or an `iterable` of `dict`. The latter usage is for initializing the optimizer with multiple `param_group`s to start. However, currently, `ZeroRedundancyOptimizer` only supports the former usage, requiring explicit calls to `add_param_group()` for multiple `param_group`s. Given the existing implementation, the type error would be silent and not manifest until much later (e.g. since `super().__init__()` would have no issue). Hence, I added a series of checks at the beginning of the `__init__()` function (encapsulated in `_verify_and_init_params()`). A postcondition of this validation is that `self._all_params` is a non-empty list of all model parameters.

Additionally, I added a check for SPSD usage assuming that all model parameters exist on the same device. This logic is included in `_verify_same_param_device()` and is called immediately after the `params` type-checking.  Support for SPSD with model parameters sharded across devices may be added in the future.

Related to that aforementioned post-condition on `self._all_params`, previously there was undefined behavior resulting from different typing of the passed in `params` input argument. If `params` was a `List`, then the usage of `self._reference_is_trainable_mask` was as expected. However, if `params` was a generator (e.g. as in the canonical usage of passing `model.parameters()`), then the ensuing behavior was divergent. This is because after a generator is iterated over, it is empty. As a result, when we set `self._all_params = params` [in the old code](68d690ffbd/torch/distributed/optim/zero_redundancy_optimizer.py (L165)), `self._all_params` is empty, reducing `training_mask` to always be the empty list. This causes missed calls to `_update_trainable()` in `step()`. (A consequence of this is that `test_pytorch_parity()`, which is renamed to `test_local_optimizer_parity()`, now outputs warnings about the trainable parameters changing.)
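
The generator pitfall described above is easy to reproduce in isolation (a minimal sketch):

```python
import torch

model = torch.nn.Linear(2, 2)
params = model.parameters()   # a generator, as in the canonical usage
first = list(params)          # consumes it: [weight, bias]
second = list(params)         # [] -- the generator is empty once iterated
```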

The existing implementation assumes that all parameters share the same dense type when allocating the bucket buffers. This change preserves this assumption, which may be removed in the future. I added a check for this in `_verify_same_dense_param_type()` to avoid erroring silently later on. Note that it is insufficient to simply check for the same `dtype` since dense and sparse tensors may share the same `dtype` but require differing storage sizes. One solution is to use `torch.typename()` as the means for comparison.

 ---

The primary change in this refactor is with respect to `self._per_device_params` and `self.buckets`. `self._per_device_params` mapped `torch.device` to `List[List[Parameter]]`. The keys were the devices that the model parameters exist on, and the values designated which ranks are assigned to updating those parameters. `self.buckets` mapped `torch.device` to `List[torch.Tensor]`. The keys were the same as `self._per_device_params`, and the values were the buckets for that device. The usage of these two data structures were confined to each other only. Hence, because the notions of device and rank are now in 1:1 correspondence, we can eliminate the former completely and only use rank. As such, I removed `self._per_device_params` and made `self.buckets` directly a list of buckets (i.e. `torch.Tensor`s).

Iteration over the parameters of a rank for a given device could be simplified to just iteration over the parameters of a rank. Hence, I relied on `self.partition_parameters()` now for that iteration. Refer to `_setup_flat_buffers()` and `step()` for these changes.

One convenient side effect of removing `self._per_device_params` is that there is no longer the re-computation of the parameter partitions mentioned at the end of this [PR](https://github.com/pytorch/pytorch/pull/59410).

 ---

I changed the data structure `self._index_to_param_cache` from a `dict` to a `List` because the domain is `0`, `1`, ..., `k-1` where `k` is the number of parameters. This should yield marginal improvements in memory usage and access speed.

`_sync_param_groups()` is a static method, meaning it can be called either via `self._sync_param_groups()` or `ZeroRedundancyOptimizer._sync_param_groups()` when inside the class. I made the usage consistently `self._sync_param_groups()` rather than have instances of both.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59834

Test Plan:
I ran through the existing test suite on an AI AWS cluster:
```
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```
Note: The only test where `parameters_as_bucket_view` is `True` is `test_step_with_closure()`, meaning that that is the test that exercises the core changes of removing `self._per_device_params` and changing `self.buckets`.

Also, I added tests for the `ZeroRedundancyOptimizer` constructor changes and the assumption checks.

Reviewed By: mrshenli

Differential Revision: D29177065

Pulled By: andwgu

fbshipit-source-id: 0ff004ae3959d6d3b521024028c7156bfddc93d8
2021-06-16 20:52:13 -07:00
85517a2b70 [TensorExpr] More python binding cleanups (#60058)
Summary:
A few more quality of life improvements for NNC's python bindings:
- Use standard `torch.dtype`s (rather than `te.Dtype`)
- Make names optional (they don't seem to matter)
- Make shapes optional
- A few implicit conversions to make code cleaner

Followup to https://github.com/pytorch/pytorch/issues/59920

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60058

Reviewed By: bertmaher

Differential Revision: D29151953

Pulled By: jansel

fbshipit-source-id: c8286e329eb4ee3921ca0786e17248cf6a898bd8
2021-06-16 20:06:08 -07:00
c01939a9b1 [JIT] Handle modules that already have __constants__ (#60003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60003

**Summary**
`infer_concrete_type_builder` in `_recursive.py` assumes `__constants__`
is a `set` if it exists as an attribute on the module being scripted.
Instead, it should create a set out of whatever `__constants__` is.

**Test Plan**
Ran code from the issue.

**Fixes**
This commit fixes #59947.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D29174243

Pulled By: SplitInfinity

fbshipit-source-id: aeb8bded80038da35478714b6a697a766ac447f5
2021-06-16 20:01:18 -07:00
d99a8a31b1 Fix version comparison for defining CUDA11OrLater (#60010)
Summary:
Before this PR `CUDA11OrLater` was incorrectly set to `False` when `torch.version.cuda == "11.0"`.
`torch.version.cuda` returns major and minor CUDA versions, it doesn't return patch info.
LooseVersion comparison was calling `[11, 0] >= [11, 0, 0]` which evaluates to `False`.
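
The comparison pitfall is easy to see in isolation, along with one possible fix (a sketch, not the PR's exact change):

```python
from distutils.version import LooseVersion

# Lexicographic list comparison: with an equal prefix, the shorter list
# is smaller, so [11, 0] >= [11, 0, 0] is False.
print(LooseVersion("11.0") >= LooseVersion("11.0.0"))  # False

# A sketch of a check that uses only major/minor components:
def cuda11_or_later(version_str):
    major, minor = (int(v) for v in version_str.split(".")[:2])
    return (major, minor) >= (11, 0)

print(cuda11_or_later("11.0"))  # True
```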

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60010

Reviewed By: mruberry

Differential Revision: D29147107

Pulled By: ezyang

fbshipit-source-id: bd9ed076337b4d32bf1c3376b8f7ae15dbc4d08d
2021-06-16 18:04:29 -07:00
c458bb985e make it easier to grep for unary/binary op kernels (#60128)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60128

Test Plan: Imported from OSS

Reviewed By: wenleix

Differential Revision: D29175499

Pulled By: bdhirsh

fbshipit-source-id: 1838900276e0b956edf25cdddcff438ff685a50e
2021-06-16 17:49:21 -07:00
3288c9d304 [numpy] mvlgamma: int -> float promotion (#59934)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Last int->float promotion as per the tracker!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59934

Reviewed By: H-Huang

Differential Revision: D29160008

Pulled By: mruberry

fbshipit-source-id: 389a5a7683e0c00d474da913012768bf2a212ef0
2021-06-16 17:44:20 -07:00
f65793507d [fx][Transformer] Add override for call_function (#60057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60057

This ensures that if a function was `wrap`'d before symbolic tracing and then passed into the transformer, it will still be wrapped.
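
A minimal sketch of the behavior being preserved (the helper name is hypothetical; `torch.fx.wrap` keeps the call as a `call_function` node):

```python
import torch
import torch.fx

def double_len(x):
    # hypothetical helper we don't want traced through
    return 2 * len(x)

torch.fx.wrap("double_len")  # registered at module level

class M(torch.nn.Module):
    def forward(self, x):
        return x + double_len(x.shape)

traced = torch.fx.symbolic_trace(M())
# This change makes sure the wrapped call also survives a Transformer pass:
transformed = torch.fx.Transformer(traced).transform()
print(transformed.graph)
```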

Test Plan: Added test to `test_fx.py`

Reviewed By: jamesr66a

Differential Revision: D29151191

fbshipit-source-id: 93560be59505bdcfe8d4f013e21d4719788afd59
2021-06-16 17:25:55 -07:00
cyy
5f017e91b8 don't use moved field in the second lambda (#59914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59914

Reviewed By: H-Huang

Differential Revision: D29147018

Pulled By: ezyang

fbshipit-source-id: 04fe52fb8cf3cc8f3a538a2dddb13c52cf558549
2021-06-16 17:22:15 -07:00
64aec8d2ca [testing] OpInfoHelper tool (#58698)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/57577

Usage:
Add an OpInfo entry to `common_methods_invocations` with `dtypes=_DYNAMIC_DTYPES`.
E.g.
```
OpInfo('atan2',
        dtypes=_DYNAMIC_DTYPES,
        sample_inputs_func=sample_inputs_atan2,)
```

Run the helper with `python -m torch.testing._internal.opinfo_helper`

Output
```
OpInfo(atan2,
       # hint: all_types + (torch.bool,),
       dtypes=[torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64, torch.bool],
       # hint: all_types + (torch.bool, torch.bfloat16, torch.float16),
       dtypesIfCUDA=[torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64, torch.bool, torch.bfloat16, torch.float16],
       sample_inputs_func=sample_inputs_atan2)
```

Output without CUDA (run with `$ CUDA_VISIBLE_DEVICES=-1 python -m torch.testing._internal.opinfo_helper`)
```
UserWarning: WARNING: CUDA is not available, information pertaining to CUDA could be wrong
  warnings.warn("WARNING: CUDA is not available, information pertaining to CUDA could be wrong")
OpInfo(atan2,
       # hint: all_types + (torch.bool,),
       dtypes=[torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64, torch.bool],
       sample_inputs_func=sample_inputs_atan2)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58698

Reviewed By: H-Huang

Differential Revision: D29160668

Pulled By: mruberry

fbshipit-source-id: 707370a83b451b02ad2fe539775c8c50ecf90be8
2021-06-16 17:17:03 -07:00
0bf1260795 Fix Python 3.8 expecttest machinery again, this time for good. (#60044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60044

In #59709 I attempted to fix the expecttest machinery to work in Python
3.8.  However, I noticed that it would fail to do substitutions in this
case:

```
    self.assertExpectedInline(
        foo(),
        """bar"""
    )
```

This is because the triple quoted string is not on the same line as the
backtrace line number (at the very beginning), and for safety reasons
the preexisting regex refused to search beyond the first line.  This
wasn't a big deal prior to Python 3.8 because the flipped version of
the regex simply required the triple quoted string to be flush with
the end of the statement (which it typically was!)  But it is a big deal
now that we only have the start of the statement.

I couldn't think of a way to fix this in the current model, so I decided
to call in the big guns.  Instead of trying to do the regex with only
the start xor end line number, I now require you provide BOTH line numbers,
and we will only regex within this range.  The way we compute these line
numbers is by parsing the Python test file with ast, and then searching
through statements until we find one that is consistent with the line
number reported by the backtrace.  If we don't find anything, we
conservatively assume that the string lies exactly in the backtrace
(and you'll probably fail the substitution in that case.)

The resulting code is quite a lot simpler (no more reversed regex) and
hopefully more robust, although I suppose we are going to have to do
some field testing.
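
A minimal sketch of the line-range idea (not the actual expecttest code; `end_lineno` is available on AST nodes from Python 3.8):

```python
import ast

def statement_bounds(source, lineno):
    """Find the first statement whose line range contains `lineno`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt) and node.lineno <= lineno <= node.end_lineno:
            return node.lineno, node.end_lineno
    # Conservative fallback: assume the string sits exactly on that line.
    return lineno, lineno
```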

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29146943

Pulled By: ezyang

fbshipit-source-id: 2c24abc3acd4275c5b3a8f222d2a60cbad5e8c78
2021-06-16 17:10:16 -07:00
dab1e59652 Remove dead code in SavedVariable (#59838)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59838

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29069214

fbshipit-source-id: 5debf93a6c3d1c3d585efbe54438e8df92646d62
2021-06-16 16:44:16 -07:00
1efa863837 Avoid un-necessary unwrapping of Tensor in SavedVariable (#59837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59837

Fixes #58500

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29069215

fbshipit-source-id: 603db3c8a64b729e86385ed774825f01c6ce0f20
2021-06-16 16:43:04 -07:00
5948e6f653 removed gelu from autocast fp32 list (#59639)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59639

Reviewed By: H-Huang

Differential Revision: D29155914

Pulled By: ezyang

fbshipit-source-id: feb117181894c2355768d5b1189b3d5f1649fc0b
2021-06-16 16:29:57 -07:00
a95207dad4 [quant] Add a quantize_per_tensor overload that takes Tensor quantization parameters (#59773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59773

The current quantize_per_tensor takes a float scale and an int zero_point, which does not work with Proxy,
so this PR adds a quantize_per_tensor overload that takes a Tensor scale and zero_point instead.
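
A minimal sketch of calling the new overload directly (assuming the Tensor-qparams overload described above):

```python
import torch

x = torch.randn(4)
scale = torch.tensor(0.1)
zero_point = torch.tensor(0)
q = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)
print(q.q_scale(), q.q_zero_point())
```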

Test Plan:
Tested locally that the following runs without errors:

```python
import torch
from torch.quantization.quantize_fx import prepare_fx, convert_fx
from torch.fx.experimental import normalize

class TestModule(torch.nn.Module):
    def forward(self, x):
        return x + x

mod = TestModule()
mod.eval()
config = {"": torch.quantization.get_default_qconfig("fbgemm")}
mod = prepare_fx(mod, config)
mod = convert_fx(mod)
mod = torch.fx.Transformer(mod).transform()
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29019862

fbshipit-source-id: c0176040f3b73f0a30516ed17d261b44cc658407
2021-06-16 16:07:20 -07:00
5686fe5817 Revert D29154971: Training resnext with msuru_suru_union and ig_msuru_suru_union datasets
Test Plan: revert-hammer

Differential Revision:
D29154971 (9f68f93aca)

Original commit changeset: d534d830020f

fbshipit-source-id: a3d16acc8e6b66a6010b501c28dbe295f573bc86
2021-06-16 15:33:14 -07:00
4c8c61f200 Some fixes to vec256_bfloat16.h (#59957)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59957

Test Plan: Sandcastle

Reviewed By: VitalyFedyunin

Differential Revision: D29073913

fbshipit-source-id: dc01a2015e4ff42daa1d69443460182744c06e90
2021-06-16 15:17:15 -07:00
8ce6d0c42f [torch deploy] add register_module_source (#58290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58290

This is a helper function to get some Python source code loaded
on each interpreter without having to use the standard import system
or packages. It is useful for debugging or for writing wrapper classes
for handling loaded modules.

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D28435306

Pulled By: zdevito

fbshipit-source-id: b85c16346b9001cd7350d65879cb990098060813
2021-06-16 14:41:13 -07:00
fd1e9253ff [Profiler] Fix timestamp discrepancy in profiler_kineto.cpp (#60070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60070

PyTorch pull request https://github.com/pytorch/pytorch/pull/57333 changed high_resolution_clock to system_clock but missed one location in profiler_kineto.cpp.

On some platforms (e.g. Windows), high_resolution_clock and system_clock do not map to the same underlying clock and therefore we get mixed timestamps on some platforms.

Reviewed By: wesolwsk

Differential Revision: D29155809

fbshipit-source-id: a6de6b4d550613f26f5577487c3c53716896e219
2021-06-16 14:25:24 -07:00
9d7764642b Use GitHub's diff directly in clang-tidy (#60048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60048

This changes clang-tidy in lint.yml to pull the raw diff from GitHub and parse that rather than use the PR's base revision. The base revision can cause the spurious inclusion of files not changed in the PR, as in https://github.com/pytorch/pytorch/pull/59967/checks?check_run_id=2832565901. We could be smarter about how we query git, but this approach ends up being simpler since we just need to search for the diff headers in the .diff file.
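
A sketch of the header-scanning idea (hypothetical helper, not the lint.yml code):

```python
import re
import urllib.request

def changed_files(diff_url):
    """Download a raw .diff and return the paths named in its headers,
    e.g. lines like 'diff --git a/tools/foo.py b/tools/foo.py'."""
    diff = urllib.request.urlopen(diff_url).read().decode("utf-8", "replace")
    return sorted(set(re.findall(r"^diff --git a/(\S+) b/", diff, re.MULTILINE)))
```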

See https://github.com/pytorch/pytorch/pull/60049/checks?check_run_id=2834140350 for an example CI run with this on

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29148886

Pulled By: driazati

fbshipit-source-id: ca23446d5cc8938d1345f272afe77b9ee8898b74
2021-06-16 13:40:09 -07:00
b2fc6de2c4 support parsing of PR stats in run_test.py (#60026)
Summary:
Currently, S3 test stats don't support PR stats parsing.

Changes to s3_stats_parser:
1. They are uploaded to `test_times/{sha1}/{job}` and `pr_test_times/{pr}/{sha1}/{job}` separately, so we need parsing logic for both.
2. We need to attach a time when parsing PR stats for ordering, since PR commits can be force-pushed.

Changes to run_test.py:
1. Reordering based on previous PR stats if available (see the sketch below).
2. Falling back to the file-change option if not enabled.
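
A sketch of the reordering idea (the data shape is hypothetical, not run_test.py's actual code):

```python
def reorder_by_pr_history(tests, pr_stats):
    """Run tests that failed on an earlier commit of this PR first,
    keeping the original order otherwise (sorted() is stable)."""
    failed_before = {t for t, stats in pr_stats.items() if stats.get("failures")}
    return sorted(tests, key=lambda t: t not in failed_before)

print(reorder_by_pr_history(
    ["test_a", "test_b", "test_c"],
    {"test_c": {"failures": 2}},
))  # ['test_c', 'test_a', 'test_b']
```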

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60026

Test Plan:
- CI.
- local repro: please run:
```
CIRCLE_JOB="pytorch_linux_bionic_py3_6_clang9_noarch_test" CIRCLE_PR_NUMBER=60057 IN_CI=1 ENABLE_PR_HISTORY_REORDERING=1 python test/run_test.py
```

Reviewed By: samestep

Differential Revision: D29164754

Pulled By: walterddr

fbshipit-source-id: 206688e0fb0b78d1c9042c07243da1fbf88a924b
2021-06-16 13:32:31 -07:00
691183bb74 Fix compile failure on CUDA92 (#60017)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60016

For CUDA 9.2:
- OptionalBase was not checking if `is_arrayref`
- constexpr functions don't seem to be able to raise an exception on CUDA 9.2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60017

Reviewed By: malfet

Differential Revision: D29139515

Pulled By: ejguan

fbshipit-source-id: 4f4f6d9fe6a5f2eadf913de0a9781cc9f2e6ac6f
2021-06-16 12:23:08 -07:00
15dbc566c5 [torch][segment_reduce] Add missing cuda kernel launch check (#60114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60114

Same as title.

Test Plan: Unit test (test_kernel_launch_checks.py) is passing.

Reviewed By: ngimel

Differential Revision: D29169538

fbshipit-source-id: ba4518dcb1a4713144d92faec2bb5bdf656ff7c5
2021-06-16 12:19:12 -07:00
2c5db9a40a Add c10d filestore functionality to the current c10d_rendezvous_backend (#59719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59719

Added filestore functionality to the c10d backend. FileStore will create a temporary file in the /tmp directory to use if it is selected as the store type. Appropriate tests were added as well.
FileStore was modified to expose the path field for testing. It was also modified so that the numWorkers field in the constructor is optional (defaulting to -1). A negative value indicates there is not a fixed number of workers; in this case, no attempt is made to clean up the file at the end.

Test Plan: Unit tests for creating a c10d backend with filestore and simple error handling.

Reviewed By: cbalioglu, H-Huang

Differential Revision: D28997436

fbshipit-source-id: 24c9b2c9b13ea6c947e8b1207beda892bdca2217
2021-06-16 12:13:36 -07:00
84688b0c40 ci: Add note about file_diff_from_base for GHA (#60110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60110

file_diff_from_base is currently bugged for ghstack PRs since it fails
to find a merge base.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29168767

Pulled By: seemethere

fbshipit-source-id: 580a909aa392541769cbbfdc6acce1e6c5d1c341
2021-06-16 11:31:02 -07:00
15f236f3e3 [package] fix tutorial link (#60113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60113

The tutorial link in the docs was to an fb-only colab.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D29169818

Pulled By: suo

fbshipit-source-id: 374807c234a185bd515b8ffe1300e6cf8d821636
2021-06-16 11:27:25 -07:00
9f68f93aca Training resnext with msuru_suru_union and ig_msuru_suru_union datasets
Summary: We updated the training scripts and re-trained the Resnext model with msuru_suru_union and ig_msuru_suru_union datasets

Test Plan:
Main command line to run:
*./deeplearning/projects/classy_vision/fb/projects/msuru_suru/scripts/train_cluster.sh*

Config we used is *msuru_suru_config.json*, which is "Normal ResNeXt101 with finetunable head".

Experiments:
- msuru_suru_union f279939874
    - Train/test split
        - msuru_suru_union_dataset_train_w_shard: 143,632,674 rows
        - msuru_suru_union_dataset_test_w_shard: 1,831,236  rows
    - Results
       {F625232741}
       {F625232819}
- ig_msuru_suru_union f279964200
    - Train/test split
        - ig_msuru_suru_union_dataset_train_w_shard: 241,884,760 rows
        - ig_msuru_suru_union_dataset_test_w_shard: 3,477,181 rows
    - Results
{F625234126}
{F625234457}

Differential Revision: D29154971

fbshipit-source-id: d534d830020f4f8e596bb6b941966eb84a1e8adb
2021-06-16 11:22:50 -07:00
8c4e78129e .circleci: Disable Windows GPU jobs (#60024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60024

Disables windows GPU jobs on CircleCI since they have been migrated to
GHA

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29137287

Pulled By: seemethere

fbshipit-source-id: 204e0c9232201a36a557cd0843e31d34269cc722
2021-06-16 10:45:14 -07:00
74ea1f23b4 Revert D29148233: [pytorch][PR] Add GITHUB_HEAD_REF in check for IN_PULL_REQUEST
Test Plan: revert-hammer

Differential Revision:
D29148233 (241aac3ef8)

Original commit changeset: 7c8c1866f39c

fbshipit-source-id: f32c6c6decd737ef290d3e83c9d021475aabaab0
2021-06-16 10:41:30 -07:00
bac6bcd6d8 Update call site for FBGemm quantization util functions. (#624)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/624

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59637

Replace FloatToFusedNBitRowwiseQuantizedSBHalf, FusedNBitRowwiseQuantizedSBHalfToFloat, FloatToFused8BitRowwiseQuantizedSBFloat, and Fused8BitRowwiseQuantizedSBFloatToFloat with newer version.

Test Plan: CI tests.

Reviewed By: dskhudia

Differential Revision: D28918581

fbshipit-source-id: a21274add71439c5e51287a0e2ec918a8d8e5392
2021-06-16 10:15:34 -07:00
d88fbf0fbc fix minor typo in run_test.py (#60055)
Summary:
Fixes typo in run_test.py for option use_specified_test_cases_by

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60055

Reviewed By: walterddr

Differential Revision: D29150156

Pulled By: janeyx99

fbshipit-source-id: 375e594d09c83188bfa80762c8b833a0b7c5cca4
2021-06-16 09:30:45 -07:00
241aac3ef8 Add GITHUB_HEAD_REF in check for IN_PULL_REQUEST (#60047)
Summary:
I believe IN_PULL_REQUEST is unset for some GHA test runs because we don't also check GITHUB_HEAD_REF. This PR is a small fix for that.

Example: https://github.com/pytorch/pytorch/pull/60023/checks?check_run_id=2831813860 doesn't set it properly

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60047

Reviewed By: walterddr

Differential Revision: D29148233

Pulled By: janeyx99

fbshipit-source-id: 7c8c1866f39ce8af8d13c34ddc0c5786a829321e
2021-06-16 08:57:49 -07:00
a6ecfb3296 Update lint.yml to use custom clang-tidy build (#59967)
Summary:
Related: https://github.com/pytorch/pytorch/issues/59815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59967

Reviewed By: samestep

Differential Revision: D29164686

Pulled By: 1ntEgr8

fbshipit-source-id: b6f9fb6fa4280f757a54a37b30b027b7504bef63
2021-06-16 08:45:24 -07:00
842a831f53 [nnc] Move batchnorm to operators library (#59992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59992

Wrapped batch norm in function `computeBatchNorm`.
ghstack-source-id: 131407851

Test Plan: CI

Reviewed By: ZolotukhinM

Differential Revision: D29116661

fbshipit-source-id: 2873a9a3e70f31db1988787160fc96c388ea3d4a
2021-06-16 05:09:59 -07:00
bda40639c5 [nnc] Move operator implementations into a subdirectory (#59988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59988

As we broaden operator support, putting all the implementations into
kernel.cpp is getting unwieldy.  Let's factor them out into the "operators"
subdirectory.

This diff is big but it's entirely code movement; I didn't change anything,
other than to expose a few utilities in kernel.h.
ghstack-source-id: 131405139

Test Plan: CI

Reviewed By: ZolotukhinM

Differential Revision: D29115916

fbshipit-source-id: ba0df1d8dd4a108b584da3baf168407e966b2c78
2021-06-16 05:08:50 -07:00
f43ff754ca [docs] Correct errata in linalg.eigh and add a bit more information (#59784)
Summary:
Add extra information about the returned elements of the spectral
decompositions

Resolves https://github.com/pytorch/pytorch/issues/59718

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59784

Reviewed By: soulitzer

Differential Revision: D29088998

Pulled By: mruberry

fbshipit-source-id: 58a191c41ff5e4c9d9675e5b3d7cbbcf16be4da1
2021-06-16 01:21:09 -07:00
36a5647e30 Handle exceptions from THPModule_setQEngine (#60073)
Summary:
Prevents Python runtime crashes when `torch._C._set_qengine(2**65)` is called or `torch.backends.quantized.engine="fbgemm"` is set while PyTorch was compiled without fbgemm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60073

Reviewed By: supriyar

Differential Revision: D29156430

Pulled By: malfet

fbshipit-source-id: 95b97352a52a262f1634b72da64a0c950eaf2373
2021-06-16 00:40:59 -07:00
9fbbab88da [fx-acc] Saturate host by replicating partitions onto idle devices (#60064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60064

This implements a host saturation optimization to maximize the utilization of the available devices.
It uses a greedy heuristic to replicate all partitions on the used devices to another set of idle devices with enough memory.

The added unittest shows an example as follows:

```
partition_0: 192 bytes; partition_1: 48 bytes
dev_0: 200 bytes, [partition_0]
dev_1: 200 bytes, [partition_1]
dev_2: 100 bytes,
dev_3: 100 bytes,
dev_4: 200 bytes,
dev_5: 100 bytes
```

Before host saturation, `partition_0` is assigned to dev_0 and `partition_1` is assigned to dev_1.
After host saturation, `partition_0` is replicated to dev_4 simply because it's the only device that can hold all partitions on dev_0. `partition_1` is replicated to dev_2 because it has minimal but large enough memory to hold all partitions on dev_1.
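
A greedy sketch of the idea with the example above (hypothetical names, not the actual partitioner API):

```python
from dataclasses import dataclass

@dataclass
class Device:  # hypothetical stand-in for the partitioner's device type
    name: str
    mem: int

def saturate_host(devices, placement, sizes):
    """For each device holding partitions, replicate them onto the
    smallest idle device that still fits all of them."""
    used = [d for d in devices if placement.get(d.name)]
    idle = sorted((d for d in devices if not placement.get(d.name)),
                  key=lambda d: d.mem)
    for dev in used:
        need = sum(sizes[p] for p in placement[dev.name])
        fit = next((c for c in idle if c.mem >= need), None)
        if fit is not None:
            placement[fit.name] = list(placement[dev.name])
            idle.remove(fit)
    return placement

devices = [Device(f"dev_{i}", m)
           for i, m in enumerate([200, 200, 100, 100, 200, 100])]
placement = {"dev_0": ["partition_0"], "dev_1": ["partition_1"]}
sizes = {"partition_0": 192, "partition_1": 48}
print(saturate_host(devices, placement, sizes))
# partition_0 -> dev_4 (the only idle device fitting 192 bytes);
# partition_1 -> dev_2 (the smallest idle device with >= 48 bytes)
```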

Test Plan:
```
buck test mode/opt //caffe2/test:test_fx_experimental -- --exact 'caffe2/test:test_fx_experimental - test_saturate_host (test_fx_experimental.TestFXExperimental)'

Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249343103429
    ✓ ListingSuccess: caffe2/test:test_fx_experimental - main (1.322)
    ✓ Pass: caffe2/test:test_fx_experimental - test_saturate_host (test_fx_experimental.TestFXExperimental) (1.322)
Summary
  Pass: 1
  ListingSuccess: 1
```

An e2e test will be added to `test_fx_glow.py` in a followup diff.

Reviewed By: gcatron

Differential Revision: D29039998

fbshipit-source-id: 57518aadf668f7f05abd6ff73224c16b5d2a12ac
2021-06-15 23:04:46 -07:00
a344b09db2 [quant][fx][graphmode] Remove Quantizer class (#59606)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59606

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28951432

fbshipit-source-id: 3301f7200a4c7166673c27f9ac7ff559f1e6935d
2021-06-15 21:54:57 -07:00
78011bc0ce typofix (torch.zero to torch.zeros) in docstring (#59703)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59703

Reviewed By: ezyang

Differential Revision: D29145998

Pulled By: H-Huang

fbshipit-source-id: f2670502170aa100fb02408046b7f6850f9379cf
2021-06-15 21:12:42 -07:00
e50f264b51 [caffe2] make MulGradient implementation in-place compatible (#60035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60035

In Caffe2, the operator schema for the MulGradient op indicates that MulGradient may be performed in-place, overwriting one of its inputs as the output. The implementation is not safe to perform in-place, however, due to an accidentally-introduced write-read dependency on the overwritten input in the in-place case. We fix it here.
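
The hazard is easy to demonstrate with a toy elementwise multiply (an illustration in NumPy, not the Caffe2 kernel):

```python
import numpy as np

G = np.array([1.0, 2.0])   # upstream gradient dY
A = np.array([3.0, 4.0])
B = np.array([5.0, 6.0])   # Y = A * B, so dA = G * B and dB = G * A

np.multiply(G, B, out=A)   # in-place: dA overwrites A's buffer
print(G * A)               # [ 5. 24.] -- wrong dB; should be [3. 8.]
```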

Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:elementwise_ops_test
```

Note that the newly added test fails without this change, but passes with this change:

```
    ✓ ListingSuccess: caffe2/caffe2/python/operator_test:elementwise_ops_test - main (24.992)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_exp (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_log1p (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_abs (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_bitwise_and (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_reciprocal (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_sqr (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_rsqrt (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_mul (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_sqrt (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_add (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_swish_gradient_inplace (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_sigmoid (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_bitwise_or (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_cbrt_grad (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_not (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_sub (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_div (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_eq (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_softsign (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_eq_bcast (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_powt (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
*************************************************************************************************************************************************************************************
***********************************<NEW_TEST_YAY>************************************************************************************************************************************
*************************************************************************************************************************************************************************************

   ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_mul_gradient_inplace (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)

*************************************************************************************************************************************************************************************
***********************************</NEW_TEST_YAY>***********************************************************************************************************************************
*************************************************************************************************************************************************************************************
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_hard_sigmoid (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_bitwise_xor (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_log (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_cube (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_swish (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_cbrt (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - test_div_legacy_grad (caffe2.caffe2.python.operator_test.elementwise_ops_test.TestElementwiseOps) (125.898)
    ✓ Pass: caffe2/caffe2/python/operator_test:elementwise_ops_test - main (125.898)
Summary
  Pass: 30
  ListingSuccess: 1
```

Reviewed By: clrfb

Differential Revision: D29034265

fbshipit-source-id: 98550e1d5976398e45d37ff2120591af1439c42a
2021-06-15 20:26:04 -07:00
eda2ddb5b0 [ATen] Fix aten::to schema (#60001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60001

Fix the aten::to schema to reflect that the output may alias input.

Test Plan: Added new unit tests.

Reviewed By: ezyang

Differential Revision: D29121620

fbshipit-source-id: c29b6aa22d367ffedf06e47116bc46b3e188c39c
2021-06-15 20:04:20 -07:00
95257e8a62 [fx-acc] Fix wrong device assignment in find_single_partition (#60056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60056

Previously we put the whole graph as a single partition onto a device with maximum memory if possible, but the code assumed that the first logical device always has the maximum memory.

This diff fixes this issue and updates the unittest to reflect such a corner case.
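
A sketch of the corrected device choice (the field name is hypothetical):

```python
from collections import namedtuple

Dev = namedtuple("Dev", "name available_mem_bytes")
devices = [Dev("dev_0", 100), Dev("dev_1", 300), Dev("dev_2", 200)]

# Old assumption: devices[0] has the most memory. Corrected: take the max.
best = max(devices, key=lambda d: d.available_mem_bytes)
print(best.name)  # dev_1
```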

Test Plan:
```
buck test mode/opt //caffe2/test:test_fx_experimental -- --exact 'caffe2/test:test_fx_experimental - test_find_single_partition (test_fx_experimental.TestFXExperimental)'

Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/6473924507772744
    ✓ ListingSuccess: caffe2/test:test_fx_experimental - main (1.357)
    ✓ Pass: caffe2/test:test_fx_experimental - test_find_single_partition (test_fx_experimental.TestFXExperimental) (1.206)
Summary
  Pass: 1
  ListingSuccess: 1

```

Reviewed By: gcatron

Differential Revision: D29118715

fbshipit-source-id: cac6a1f0d2f47717446dcc80093bbcf362663859
2021-06-15 19:36:38 -07:00
469f0e42d6 [nnc] Handle more cases of excessive # of cat args (#60043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60043

And add a unit test

Test Plan: new unit test

Reviewed By: navahgar

Differential Revision: D29146547

fbshipit-source-id: 31532926032dbef70d163930f3d8be160f5eacc3
2021-06-15 18:19:52 -07:00
1207745e98 fixing illegal memory access on NHWC BN kernel (#59981)
Summary:
Adds an early exit in the kernel to avoid reading out of bounds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59981

Reviewed By: ezyang

Differential Revision: D29147349

Pulled By: ngimel

fbshipit-source-id: b36a6a9e2526c609ff98fb5a44468f3257e0af67
2021-06-15 16:57:41 -07:00
27a3204982 generate C++ API for meta functions using at::meta:: (#58570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58570

**What the PR does**
Generate a fast-path `at::meta::{op}` API for calling meta functions without having to go through the dispatcher. This will be important for perf for external backends that want to use meta functions for shape checking (which seems likely to be what we end up doing for LazyTensorCore).

**Details**
In order to avoid naming collisions I had to make two small changes:
- rename `MetaFunctions.h` template -> `NativeMetaFunctions.h` (this is the file that declares the impl() function for every structured operator).
- rename the meta class: `at::meta::{op}::meta()` -> `at::meta::structured_{op}::meta()`

I also deleted a few unnecessary includes, since any file that includes NativeFunctions.h will automatically include NativeMetaFunctions.h.

**Why I made the change**
This change isn't actually immediately used anywhere; I already started writing it because I thought it would be useful for structured composite ops, but that isn't actually true (see [comment](https://github.com/pytorch/pytorch/pull/58266#issuecomment-843213147)). The change feels useful and unambiguous though so I think it's safe to add. I added explicit tests for C++ meta function calls just to ensure that I wrote it correctly - which is actually how I hit the internal linkage issue in the PR below this in the stack.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D28711299

Pulled By: bdhirsh

fbshipit-source-id: d410d17358c2b406f0191398093f17308b3c6b9e
2021-06-15 16:54:46 -07:00
e341bab8ae bugfix: ensure that at::{dispatch_key}:: API gets external linkage (#58569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58569

This should allow external C++ files that aren't compiled into `libtorch.so`/`libtorch_cpu.so` (including all of fbcode) to use fast path functions like `at::cpu::add()`, which skip the dispatcher.

So, after spending way too much time trying to figure out why I was getting linker errors when calling `at::meta::{op}` and `at::cpu::{op}` from C++ test files, I realized that the C++ files with the namespaced operator definitions weren't including their own headers. I.e., `RegisterCPU.cpp`, which provides definitions for the `at::cpu::{op}` fast-path functions, wasn't including the `CPUFunctions.h` header.

Why that breaks stuff: the `CPUFunctions.h` header file is what marks each function with the `TORCH_API` macro, so without including it, when we build `libtorch.so` and `libtorch_cpu.so`, the compiler will look at the definition in `RegisterCPU.cpp`, not see a `TORCH_API`, and decide that the function should get internal linkage.
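
A self-contained sketch of that failure mode (stand-in types; in PyTorch, `TORCH_API` expands to the export/visibility attribute when building the library):
```
#define TORCH_API __attribute__((visibility("default")))
struct Tensor {};

// --- CPUFunctions.h (sketch): the declaration carries TORCH_API. ---
TORCH_API Tensor add(const Tensor& a, const Tensor& b);

// --- RegisterCPU.cpp (sketch): if this file did NOT include the header
// above, the compiler would never see TORCH_API for the symbol and, per
// the summary, decide the definition gets internal linkage, so external
// callers would fail to link. Including the header fixes it. ---
Tensor add(const Tensor& a, const Tensor& b) {
  (void)b;
  return a;  // placeholder body
}
```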

An alternative would be to directly mark the function definitions in `RegisterCPU.cpp` with `TORCH_API`, but this seemed cleaner.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D28711300

Pulled By: bdhirsh

fbshipit-source-id: 535f245c20e977ff566d6da0757b3cefa137040b
2021-06-15 16:53:22 -07:00
5fd6ead097 refine disabled test (#60040)
Summary:
This is to refine:
https://github.com/pytorch/pytorch/pull/60029

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60040

Reviewed By: ezyang

Differential Revision: D29147009

Pulled By: Krovatkin

fbshipit-source-id: 37e01ac6e8d6f7e6b5c517f7804704f9136a56f5
2021-06-15 16:22:29 -07:00
fc50f91929 Move RPC agents to libtorch (#59939)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59939

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28875276

fbshipit-source-id: f2f6970fd74de5f112636e78edaa4410c61d8c45
2021-06-15 16:20:53 -07:00
04ec122868 Add some TORCH_API annotations to RPC
Summary: They will be needed when RPC gets merged into libtorch

Test Plan: CI later in the stack

Reviewed By: mrshenli

Differential Revision: D29132956

fbshipit-source-id: 8637640d56a1744a5dca5eb7d4b8ad0860c6b67c
2021-06-15 16:20:51 -07:00
cbbb7e145e Pass RequestCallback to FaultyPG RPC agent
Summary: This is needed to prevent FaultyPG from including and depending on RequestCallbackImpl, which is Python-only. The other RPC agents accept an explicit (upcast) pointer as an argument, and we can do the same for FaultyPG.

Test Plan: Later in the stack.

Reviewed By: mrshenli

Differential Revision: D29132955

fbshipit-source-id: bb7554b84bcbf39750af637e6480515ac8b92b86
2021-06-15 16:19:50 -07:00
f232b052a6 [fx-acc][easy] Format FX experimental partitioner code (#60030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60030

As titled. Non-functional re-format.

Test Plan: NA

Reviewed By: gcatron

Differential Revision: D29038449

fbshipit-source-id: a7c94eaab86850ef57b51ec66bfe8ea0e68d2dc8
2021-06-15 16:14:33 -07:00
50229b5250 Fix some typing issues (#59952)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59952

Test Plan: Sandcastle

Reviewed By: swolchok

Differential Revision: D29083423

fbshipit-source-id: 7a13d6ba60808bcf88d809db194d0f873605172c
2021-06-15 14:11:06 -07:00
1d5a577f04 Fix some items identified as problematic by Wextra and other clean-up (#59909)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59909

Test Plan: Sandcastle

Reviewed By: vkuzo

Differential Revision: D29073150

fbshipit-source-id: 500a92ccb57b0e40277863a3b235099fd66ab8ad
2021-06-15 13:42:32 -07:00
dc1f60a9a2 [sparsity][refactor] Restructure the tests folders (#60032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60032

There will be more sparse tests coming. This PR creates a separate folder for the sparse tests

Test Plan: `python test/test_ao.py`

Reviewed By: raghuramank100

Differential Revision: D29139265

fbshipit-source-id: d0db915f00e6bc8d89a5651f08f72e362a912a6b
2021-06-15 13:37:19 -07:00
8dd0570b34 Reuse build_torch_xla from pytorch/xla repo. (#59989)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59989

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29138211

Pulled By: ailzhang

fbshipit-source-id: 349d307c510e7fad266822e320f0d6904fa00239
2021-06-15 13:19:54 -07:00
b162d95e46 Fix a number of lint perf and safety issues in torch (#59897)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59897

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29037012

fbshipit-source-id: 7c16286d5fc2b67964fb65f8374dfff4d1a7aefb
2021-06-15 13:14:51 -07:00
a0e62c4da4 Reuse run_torch_xla_tests from pytorch/xla (#59888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59888

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29114274

Pulled By: ailzhang

fbshipit-source-id: d2845c7fc95d038cd68c10e22b68be8ad3cae736
2021-06-15 13:00:09 -07:00
c23624351a disable test_sparse_allreduce_basics (#60029)
Summary:
This test will be disabled due to intermittent failures in https://circleci.com/gh/pytorch/pytorch/14155828?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
as per https://hud.pytorch.org/build2/pytorch-master

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60029

Reviewed By: seemethere

Differential Revision: D29139042

Pulled By: Krovatkin

fbshipit-source-id: 105000e8636f17846be31f517abdf56ea0a994e9
2021-06-15 12:35:11 -07:00
044b519a80 Symbolic for ReLu6 (#58560) (#59538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59538

Four mealv2 models could be exported with torch 1.8.1, but export fails on torch master, which introduced relu6 a few months back.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb, ansley

Differential Revision: D29046607

Pulled By: SplitInfinity

fbshipit-source-id: d9cf7050e4ac0dad892441305ffebc19ba84e2be

Co-authored-by: David <jiafa@microsoft.com>
2021-06-15 12:24:17 -07:00
5d00c374dd [ONNX] Sum empty tensor could not be exported to ONNX successfully. (#58141) (#59537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59537

PyTorch's sum over an empty tensor gives 0, while ONNX produces an error.

torch.sum is translated into the onnx::ReduceSum op. Per the definition of ReduceSum, we update the keepdims attribute for this scenario.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb, ansley

Differential Revision: D29046604

Pulled By: SplitInfinity

fbshipit-source-id: 6f5f3a66cb8eda8b5114b8474dda6fcdbae73469

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-06-15 12:24:16 -07:00
83450aa11d [ONNX] Add support for torch.bernoulli() export (#57003) (#59536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59536

Supports exporting the HuggingFace DeBERTa model for training.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb, ansley

Differential Revision: D29046609

Pulled By: SplitInfinity

fbshipit-source-id: df87e0c6ed0f13463297bdeba73967fcf2aa37ca

Co-authored-by: hwangdeyu <deyhuang@qq.com>
2021-06-15 12:24:14 -07:00
cd5f142af4 fix error message for type_as (#57948) (#59535)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59535

Improve error message for type_as and add unit test.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb, ansley

Differential Revision: D29046605

Pulled By: SplitInfinity

fbshipit-source-id: 978bceeb62e4d3c68815cd5fdf160909a99d00f2

Co-authored-by: hwangdeyu <deyhuang@qq.com>
2021-06-15 12:24:12 -07:00
55530e2276 Update Autograd Export Docs (#56594) (#59534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59534

Update autograd export docs

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb, ansley

Differential Revision: D29046606

Pulled By: SplitInfinity

fbshipit-source-id: 36057f6bdfd3e5c071dbca05d327de7952904120

Co-authored-by: neginraoof <neginmr@utexas.edu>
2021-06-15 12:23:00 -07:00
a120a12ab4 [Bootcamp][pytorch]Add WebIterDataPipe and ToBytesIterDataPipe to the datapipes. (#59816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59816

Add two new DataPipes: one that takes web file URLs and yields streams, and one that takes streams and yields bytes.

Test Plan:
Add test_web_iterable_datapipe in test/test_datapipes.py. The test starts a local HTTP server to serve test files. The following passed locally:
1. create and load 16M localhost file URLs (each 10 bytes)
2. create and load a 64GB localhost file
For the sake of testing time, both the stress test and the large-file test are disabled in the unit test.

Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D29051186

fbshipit-source-id: f8e44491e670560bf445af96f94d98230436f396
2021-06-15 11:43:26 -07:00
79d7c15dc5 [PyTorch] Add ExclusivelyOwned (#59419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59419

This introduces ExclusivelyOwned, which allows isolated
pieces of code that can make ownership guarantees to opt out of
reference counting operations on `intrusive_ptr` and `Tensor`
entirely. To elaborate, if you know you are the exclusive owner of an
`intrusive_ptr` or `Tensor`, moving it into an `ExclusivelyOwned` will
avoid performing atomic reference counting operations at destruction
time. The documentation comment should provide sufficient explanation; please request changes if not.
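
A hedged usage sketch (see `ExclusivelyOwned_test.cpp` for the authoritative usage; the header path is assumed):
```
#include <utility>
#include <ATen/ATen.h>
#include <c10/util/ExclusivelyOwned.h>

void consume(at::Tensor t) {
  // We are provably the sole owner of `t` from here on, so wrap it:
  c10::ExclusivelyOwned<at::Tensor> owned(std::move(t));
  // Use it like a normal tensor through operator* / operator-> ...
  auto n = owned->numel();
  (void)n;
  // ... and at scope exit the destructor releases resources without the
  // atomic refcount decrement a regular Tensor destructor would perform.
}
```
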
ghstack-source-id: 131376658

Test Plan:
Added `ExclusivelyOwned_test.cpp`. It passes. When I ran it
under valgrind, valgrind reported no leaks.

Inspected assembly from `inspect` functions in
`ExclusivelyOwned_test.cpp` in an optimized (opt-clang) build. As
expected, `ExclusivelyOwned` calls `release_resources()` and the
`TensorImpl` virtual destructor without including any atomic reference
counting operations.

Reviewed By: ezyang

Differential Revision: D28885314

fbshipit-source-id: 20bf6c82b0966aaa635ab0233974781ed15f93c1
2021-06-15 11:26:25 -07:00
d7eb5836bb Add RRef support to ShardedTensor. (#59776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59776

Overall design: https://github.com/pytorch/pytorch/issues/55207.

In this PR, I've added support to ShardedTensor such that it also creates RRefs
pointing to the remote shards if the RPC framework is initialized.

As a result, this provides more flexibility for ShardedTensor such that users
can use collectives with local shards or use the RPC framework to interact with
remote shards.
ghstack-source-id: 131381914

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29020844

fbshipit-source-id: acb308d0029a5e486c464d93189b5de1ba680c85
2021-06-15 10:49:31 -07:00
20460b0c05 [nnc] Removed setBufferMap method from LoopNest (#59496)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59496

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28915958

Pulled By: navahgar

fbshipit-source-id: 71e649c93fc67b36c37373f043c729aa835968a0
2021-06-15 10:37:48 -07:00
b822928e33 [nnc] Removed setGPUBlockIndex and setGPUThreadIndex methods from LoopNest (#59495)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59495

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28915960

Pulled By: navahgar

fbshipit-source-id: 20a4032b031aba6e43d85433ade5f0680c65fbc0
2021-06-15 10:37:46 -07:00
aa163aeff5 [nnc] Made several LoopNest APIs static (#59494)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59494

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28915959

Pulled By: navahgar

fbshipit-source-id: bf52e30d893f4d86812219b538a14307f347f10b
2021-06-15 10:36:31 -07:00
4afd0b7952 .github: Add Windows CUDA 11.1 workflow (#59960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59960

Adds the CUDA 11.1 workflow to GHA

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D29116814

Pulled By: seemethere

fbshipit-source-id: 90601610e481e1f70a60eaa1b640373ecb89bdb9
2021-06-15 10:22:30 -07:00
1c502d1f8e Don't run_build when run_binary_tests (#59982)
Summary:
https://github.com/pytorch/pytorch/issues/59889 wasn't a proper revert of https://github.com/pytorch/pytorch/issues/58778. This PR fixes that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59982

Reviewed By: seemethere

Differential Revision: D29114129

Pulled By: samestep

fbshipit-source-id: b40563db6ff1153a5f759639978279f5fcbccaa9
2021-06-15 07:39:38 -07:00
90cf76dde5 Support torch.nn.parameter type for PDT (#59249)
Summary:
=========

Support torch.nn.parameter type for PDT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59249

Test Plan:
====
with-proxy python test/test_jit.py -k TestPDT

Reviewed By: ZolotukhinM

Differential Revision: D29124413

Pulled By: nikithamalgifb

fbshipit-source-id: b486b82c897dbc2b55fbacd5d610bdb700ddc9fa
2021-06-15 07:22:33 -07:00
f9445c8a6b [torch][segment_reduce] Add cuda support for mean reduction (#59543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59543

Building on top of previous PR: https://github.com/pytorch/pytorch/pull/59521

This diff adds support for mean reduction on CUDA (forward only currently).
The CUDA backward implementation will be added in a subsequent PR.
Next steps:
- CUDA backward support for mean
- 2d data input support
- more testing
- benchmarking

Test Plan: update unit test to cover this part as well.

Reviewed By: ngimel

Differential Revision: D28922838

fbshipit-source-id: 72b7e5e79db967116b96ad010f290c9f057232d4
2021-06-15 07:00:45 -07:00
f4f7950812 Prepare for TensorPipe separating its CUDA-specific headers (#59788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59788

This one line is all we need to "migrate" PyTorch to the "new API" of TensorPipe that splits the CUDA-specific stuff into a separate top-level header. (The motivation is that it will allow us to "stack" the CUDA code on top of the CPU code.)
ghstack-source-id: 131326166

Test Plan: None yet

Reviewed By: beauby

Differential Revision: D28875277

fbshipit-source-id: ecfd0b7fc0218ab7899bfe64ffe73c1417b897db
2021-06-15 03:28:39 -07:00
5e5ca0682b Move CUDA-related stuff of TP agent to separate file (#59377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59377

This PR demonstrates that now the CUDA parts of the TensorPipe agent just "plug on top" of the CPU-only parts. Thus ideally the CPU-only parts could go in libtorch while the CUDA-only parts could go in libtorch_cuda. Unfortunately we can't do that just yet, because the TensorPipe agent depends on c10d (for its Store and its ProcessGroup), which lives in libtorch_python.
ghstack-source-id: 131326168

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D28796429

fbshipit-source-id: 41b2eb8400c0da282f3750a4eea21ad83ee4a175
2021-06-15 03:28:38 -07:00
83ba71aa0e Make CUDA serde support for TP agent pluggable (#59376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59376

This is an experiment. The end goal is to separate the CUDA-specific aspects of the TensorPipe agent so that they can be plugged "on top" of the CPU-only parts. This will then allow moving the TP agent to libtorch (because libtorch is split into a CPU and a CUDA part; now it's in libtorch_python), although unfortunately other conditions also need to be met for this to happen.

The only instance where we had CPU and CUDA logic within the same code, guarded by `#ifdef USE_CUDA`, is the serialization/deserialization code. I'm thus introducing a sort-of registry in order to "decentralize it". It's not a c10::Registry, because that's overkill (it uses an unordered_map, with strings as keys): here we can just use an array with integers as "keys".
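
A rough sketch of that "array as registry" idea (all names hypothetical):
```
#include <array>
#include <cstddef>
#include <cstdint>

struct TensorSerDe;  // would expose the serialize/deserialize hooks

// One slot per device type, indexed by a small integer, instead of a
// string-keyed unordered_map as in c10::Registry.
constexpr size_t kMaxDeviceTypes = 16;
std::array<const TensorSerDe*, kMaxDeviceTypes> serde_registry{};

void register_serde(uint8_t device_type, const TensorSerDe* impl) {
  serde_registry[device_type] = impl;  // e.g. the CUDA module fills its slot
}

const TensorSerDe* lookup_serde(uint8_t device_type) {
  return serde_registry[device_type];  // nullptr when nothing registered
}
```
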
ghstack-source-id: 131326167

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28796428

fbshipit-source-id: b52df832e0c0abf489a9e418353103496382ea41
2021-06-15 03:27:40 -07:00
cf63893211 Enable implicit operator versioning via number of arguments (#58852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58852

Enable implicit operator versioning via number of arguments from Mobile.
1. By default, TS doesn't emit instructions for trailing default args, and the provided number of specified args is serialized to bytecode. From the interpreter, the default values are fetched from the operator schema. The implementation landed in #56845; please refer to #56845 for details.
2. Since there is a bytecode schema change, the bytecode version is bumped from 5 to 6.
3. The corresponding backport function is provided, for forward compatibility use. Note that because there is an instruction change, a global flag is used as the switch between the two versions.

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D28789746

Pulled By: iseeyuan

fbshipit-source-id: 6e5f16460c79b2bd3312de02d0f57b79f50bf66b
2021-06-15 02:07:40 -07:00
a1780432fa Move c10d to libtorch(_cuda) (#59563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59563

ghstack-source-id: 131331264

Test Plan: CI

Reviewed By: malfet

Differential Revision: D28932239

fbshipit-source-id: 5df6cdfa5253b15cbbc97039fe672d6d97321e34
2021-06-15 02:01:31 -07:00
8d50a4e326 Add support for embeddingBagBytewise in FXGlow
Summary: This adds support for embeddingBagBytewise with fp32 scale/bias to FXGlow.

Test Plan: buck run  //glow/fb/fx/fx_glow:test_fx_glow

Reviewed By: jfix71

Differential Revision: D29075288

fbshipit-source-id: 4145486505a903129678216b133bbb8ad71f4fef
2021-06-14 23:31:29 -07:00
cbd1e8c335 [Static Runtime] Fix bug in aten::to (#59995)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59995

Reviewed By: ajyu

Differential Revision: D29083106

fbshipit-source-id: 687ffb121af2716d606c145474942650a2d9ac7e
2021-06-14 22:54:43 -07:00
087ac75b26 Fix quantized mean operator in QNNPACK backend (#59761)
Summary:
cc: kimishpatel

Fixes https://github.com/pytorch/pytorch/issues/58668

Test it with `pytest -k test_quantized_mean test/test_quantization.py` or `buck test //caffe2/test:quantization -- test_quantized_mean`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59761

Reviewed By: bdhirsh

Differential Revision: D29013271

Pulled By: kimishpatel

fbshipit-source-id: 020956fb63bd5078856ca17b137be016d3fc29b8
2021-06-14 17:30:21 -07:00
5b9fced70a add output_process_fn_grad before sum().backward() (#59971)
Summary:
This should fix the `to_sparse` test issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59971

Test Plan:
CI

Also: directly examine the RuntimeError thrown from test_unsupported_backward
- Before:
```
NotImplementedError: Could not run 'aten::sum' with arguments from the 'SparseCPU' backend.
```
- After:
```
to_dense() not supported for float16 on CPU
```

Reviewed By: soulitzer

Differential Revision: D29112558

Pulled By: walterddr

fbshipit-source-id: c2acd22cd18d5b34d25209b8415feb3ba28fa104
2021-06-14 16:20:03 -07:00
117b7ae38a Remove update-disabled-tests workflow as it is migrated to test-infra (#59986)
Summary:
Will be replaced by https://github.com/pytorch/test-infra/pull/37

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59986

Reviewed By: seemethere, soulitzer

Differential Revision: D29115397

Pulled By: janeyx99

fbshipit-source-id: 2c1a88d6a3fec8cef57818a360884644ec2c7b79
2021-06-14 15:25:34 -07:00
c2098487e8 [c10d] Move pg wrapper tests to their own file. (#59840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59840

moving these tests to their own standalone file. No meaningful code changes.
ghstack-source-id: 131359162

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D29012664

fbshipit-source-id: 348870016509a6ed7e69240fa82bccef4a12d674
2021-06-14 15:05:55 -07:00
5c1d17e697 Revert D29100708: [pytorch][PR] Parametrizations depending on several inputs
Test Plan: revert-hammer

Differential Revision:
D29100708 (061e71b199)

Original commit changeset: b9e91f439cf6

fbshipit-source-id: bff6d8a3d7b24f4beb976383912033c250d91a53
2021-06-14 14:08:50 -07:00
5e993e6c81 [fx2trt] Make TRTInterpreter don't need concrete tensor as arg (#59948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59948

1. We have two Interpreters: one for vanilla ops and one for acc ops. Some of the logic between them is similar, so in this diff we extract the shared logic into a Base Interpreter. This way, any future general feature change can benefit both Interpreters.

2. Make TRTInterpreter not depend on a concrete tensor arg. We will use `InputTensorSpec` to create the necessary inputs for the acc tracer.

3. Add unittests for acc op converter.

Test Plan:
```
buck test mode/opt caffe2/torch/fb/fx2trt:test_linear
buck test mode/opt caffe2/torch/fb/fx2trt:test_batchnorm
buck test mode/opt caffe2/torch/fb/fx2trt:test_convolution
buck test mode/opt caffe2/torch/fb/fx2trt:test_reshape
buck test mode/opt caffe2/torch/fb/fx2trt:test_relu
buck test mode/opt caffe2/torch/fb/fx2trt:test_add
buck test mode/opt caffe2/torch/fb/fx2trt:test_maxpool
```

Reviewed By: jackm321

Differential Revision: D28749682

fbshipit-source-id: 830d845aede7203f6e56eb1c4e6776af197a0fc3
2021-06-14 14:03:26 -07:00
c645d39a77 Implementation of torch.isin() (#53125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3025

## Background

This PR implements a function similar to numpy's [`isin()`](https://numpy.org/doc/stable/reference/generated/numpy.isin.html#numpy.isin).

The op supports integral and floating point types on CPU and CUDA (+ half & bfloat16 for CUDA). Inputs can be one of:
* (Tensor, Tensor)
* (Tensor, Scalar)
* (Scalar, Tensor)

Internally, one of two algorithms is selected based on the number of elements vs. test elements. The heuristic for deciding which algorithm to use is taken from [numpy's implementation](fb215c7696/numpy/lib/arraysetops.py (L575)): if `len(test_elements) < 10 * len(elements) ** 0.145`, then a naive brute-force checking algorithm is used. Otherwise, a stablesort-based algorithm is used.

I've done some preliminary benchmarking to verify this heuristic on a devgpu, and determined for a limited set of tests that a power value of `0.407` instead of `0.145` is a better inflection point. For now, the heuristic has been left to match numpy's, but input is welcome for the best way to select it or whether it should be left the same as numpy's.
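
Written out, the selection rule from the summary is (sketch; the element counts are the function inputs):
```
#include <cmath>
#include <cstdint>

// true  -> naive brute-force pass over test_elements for each element
// false -> stablesort-based algorithm
// 0.145 is numpy's exponent; per the benchmarking above, ~0.407 may be a
// better inflection point.
bool use_brute_force(int64_t num_elements, int64_t num_test_elements) {
  return static_cast<double>(num_test_elements) <
         10.0 * std::pow(static_cast<double>(num_elements), 0.145);
}
```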

Tests are adapted from numpy's [isin and in1d tests](7dcd29aaaf/numpy/lib/tests/test_arraysetops.py).

Note: my locally generated docs look terrible for some reason, so I'm not including the screenshot for them until I figure out why.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53125

Test Plan:
```
python test/test_ops.py   # Ex: python test/test_ops.py TestOpInfoCPU.test_supported_dtypes_isin_cpu_int32
python test/test_sort_and_select.py   # Ex: python test/test_sort_and_select.py TestSortAndSelectCPU.test_isin_cpu_int32
```

Reviewed By: soulitzer

Differential Revision: D29101165

Pulled By: jbschlosser

fbshipit-source-id: 2dcc38d497b1e843f73f332d837081e819454b4e
2021-06-14 13:50:53 -07:00
f9ec86a6c6 External stream (#59527)
Summary:
Previous is https://github.com/pytorch/pytorch/issues/57781

We now add two CUDA bindings to avoid using ctypes, fixing a Windows issue.
However, we still use ctypes to allocate the stream and create its pointer
(we could do this with a 0-dim tensor too if that feels better).

CC. ezyang rgommers ngimel mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59527

Reviewed By: albanD

Differential Revision: D29053062

Pulled By: ezyang

fbshipit-source-id: 661e7e58de98b1bdb7a0871808cd41d91fe8f13f
2021-06-14 13:46:11 -07:00
8e92a3a8b0 [docs] Add pickle security warning to package docs (#59959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59959

**Summary**
This commit replaces the warning on the `torch.package` documentation
page about the module not being publicly released (which will no longer
be true as of 1.9) with one that warns about security issues caused by
the use of the `pickle` module.

**Test Plan**
1) Built the docs locally.
2) Continuous integration.

<img width="877" alt="Captura de Pantalla 2021-06-14 a la(s) 11 22 05 a  m" src="https://user-images.githubusercontent.com/4392003/121940300-c98cab00-cd02-11eb-99dc-08e29632079a.png">

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29108429

Pulled By: SplitInfinity

fbshipit-source-id: 3a0aeac0dc804a31203bc5071efb1c5bd6ef9725
2021-06-14 13:03:05 -07:00
ef13341a8d upgrade onednn to v2.2.3 (#57928)
Summary:
This PR is to upgrade onednn to v2.2.3 (including v2.2 and v2.2.3 changes) which has the following main changes about CPU:

v2.2 changes:
- Improved performance of compute functionality for future Intel Core processors with Intel AVX2 and Intel DL Boost instruction support (code name Alder Lake).
- Improved fp32 inner product forward propagation performance for processors with Intel AVX-512 support.
- Improved dnnl_gemm performance for cases with n=1 on all supported processors.

v2.2.3 changes:
- Fixed a bug in the int8 depthwise convolution primitive with groups and 1d spatial size for processors with Intel AVX-512 and Intel AVX2 support
- Fixed a correctness issue for the PReLU primitive on Intel Processor Graphics
- Fixed a correctness issue in reorder for blocked layouts with zero padding
- Improved performance of weight reorders used by the BRGEMM-based convolution primitive for processors with Intel AVX-512 support

More changes can be found in https://github.com/oneapi-src/oneDNN/releases.

The ideep version used is pytorch-rls-v2.2.3.
The oneDNN version used is v2.2.3.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57928

Reviewed By: bdhirsh

Differential Revision: D29037857

Pulled By: VitalyFedyunin

fbshipit-source-id: db74534858bdcf5d6c7dcf58e224fc756188bc31
2021-06-14 11:57:45 -07:00
061e71b199 Parametrizations depending on several inputs (#58488)
Summary:
Makes it possible for the first registered parametrization to depend on a number of parameters rather than just one. Examples of these types of parametrizations are `torch.nn.utils.weight_norm` and low-rank parametrizations via the multiplication of an `n x k` tensor by a `k x m` tensor with `k <= m, n`.

Follows the plan outlined in https://github.com/pytorch/pytorch/pull/33344#issuecomment-768574924. A short summary of the idea is: we call `right_inverse` when registering a parametrization to generate the tensors that we are going to save. If `right_inverse` returns a sequence of tensors, then we save them as `original0`, `original1`...  If it returns a `Tensor` or a sequence of length 1, we save it as `original`.

We only allow many-to-one parametrizations as the first parametrization registered. Any subsequent parametrizations need to be one-to-one.

There were a number of choices in the implementation:

If `right_inverse` returns a sequence of parameters, then we unpack it in the forward. This is to allow writing code like:
```python
class Sum(nn.Module):
  def forward(self, X, Y):
    return X + Y
  def right_inverse(Z):
    return Z, torch.zeros_like(Z)
```
rather than having to manually unpack a list or a tuple within the `forward` function.

At the moment the errors are a bit all over the place. This is to avoid having to check some properties of `forward` and `right_inverse` when they are registered. I left this like this for now, but I believe it'd be better to call these functions when they are registered to make sure the invariants hold and throw errors as soon as possible.

The invariants are the following:
1. The following code should be well-formed
```python
X = module.weight
Y = param.right_inverse(X)
assert isinstance(Y, Tensor) or isinstance(Y, collections.Sequence)
Z = param(Y) if isinstance(Y, Tensor) else param(*Y)
```
in other words, if `Y` is a `Sequence` of `Tensor`s (we also check that the elements of the sequence are Tensors), then it has the same length as the number of parameters `param.forward` accepts.

2. Always: `X.dtype == Z.dtype and X.shape == Z.shape`. This is to protect the user from shooting themselves in the foot, as it's too odd for a parametrization to change the metadata of a tensor.
3. If it's one-to-one: `X.dtype == Y.dtype`. This is to be able to do `X.set_(Y)` so that if a user first instantiates the optimiser and then puts the parametrisation, then we reuse `X` and the user does not need to add a new parameter to the optimiser. Alas, this is not possible when the parametrisation is many-to-one. The current implementation of `spectral_norm` and `weight_norm` does not seem to care about this, so this would not be a regression. I left a warning in the documentation though, as this case is a bit tricky.

I still need to go over the formatting of the documentation; I'll do that tomorrow.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58488

Reviewed By: soulitzer

Differential Revision: D29100708

Pulled By: albanD

fbshipit-source-id: b9e91f439cf6b5b54d5fa210ec97c889efb9da38
2021-06-14 11:11:47 -07:00
ab70e1e984 [TensorExpr] Add error checking in mem_arena (#59922)
Summary:
Gives an error message (rather than a segfault) if you forget `KernelScope()`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59922

Reviewed By: bertmaher

Differential Revision: D29091303

Pulled By: jansel

fbshipit-source-id: a24ee2385cae1f210b0cbc3f8860948fc052b655
2021-06-14 10:37:32 -07:00
9ad0de3c6f Rework requires_grad on DifferentiableGraphOp (#57575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57575

This PR does two things:

1. reverts "Manual revert of D27369251 (f88a3fff65) (#56080)" in commit
   92a09fb87a567100122b872613344d3a422abc9f.

2. fixing DifferentiableGraph output with wrong requires_grad flag

This fixes requires_grad on outputs from DifferentiableGraph; the proper flag is
retrieved from profiling information. We previously only retrieved the profiling
information from the first profile node among all its uses. However, when
control flow is present, we need to iteratively search for a profile node with
profiling information available, in case the first use is in an inactive code
path.

e.g.
```
  graph(%0 : Tensor,
        %1 : Bool):
  ..., %2 : Tensor = prim::DifferentiableGraph_0(%0)
  %3 : Tensor = prim::If(%1)
    block0():
      %4 : Tensor = prim::DifferentiableGraph_1(%2)
      -> (%4)
    block1():
      %5 : Tensor = prim::DifferentiableGraph_2(%2)
      -> (%5)
  -> (%3)
with prim::DifferentiableGraph_0 = graph(%0 : Tensor):
  ...
  %out : Tensor = aten::operation(...)
  ...
  return (..., %out)
with prim::DifferentiableGraph_1 = graph(%0 : Tensor):
  %temp : Tensor = prim::profile[profiled_type=Tensor](%0)
  ...
with prim::DifferentiableGraph_2 = graph(%0 : Tensor):
  %temp : Tensor = prim::profile[profiled_type=Float(...)](%0)
  ...
```

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29038773

Pulled By: Krovatkin

fbshipit-source-id: 6c0a851119f6b8f2f1afae5c74532407aae238fe
2021-06-14 10:37:31 -07:00
1f7251df90 fixing DifferentiableGraphOp updating requires_grad on input tensor list; python test added to verify the test (#57574)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57574

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29038774

Pulled By: Krovatkin

fbshipit-source-id: cb342c1b04fa3713a8166b39213437bc9f2d8606
2021-06-14 10:36:26 -07:00
cyy c50c77b444 remove unused variables (#59912)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59912

Reviewed By: soulitzer

Differential Revision: D29100518

Pulled By: albanD

fbshipit-source-id: b86a4aa9050e4fa70a0872c1d8799e5953cd2bc8
2021-06-14 10:33:48 -07:00
580a20f33b [reland] torch/lib/c10d: Use torch_check instead of throwing runtime_error (#59918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59918

Reland of https://github.com/pytorch/pytorch/pull/59684
ghstack-source-id: 131303057

Test Plan: ci

Reviewed By: cbalioglu

Differential Revision: D29081452

fbshipit-source-id: 419df79341f702e796f7adf5f1071a6cd1dcd8d1
2021-06-14 09:52:54 -07:00
3d90c82a5c [TensorExpr] Python binding improvements (#59920)
Summary:
Some minor quality of life improvements for the NNC python bindings:
- expose `call_raw()`
- support passing integers to `call()` (for dynamic shapes)
- implicit conversions to cleanup `[BufferArg(x) for x in [A, B, C]]` into just `[A, B, C]`
- don't silently default to "ir_eval" for unknown mode (e.g. "LLVM")

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59920

Reviewed By: ZolotukhinM

Differential Revision: D29090904

Pulled By: jansel

fbshipit-source-id: 154ace82725ae2046cfe2e6eb324fd37f5d209a7
2021-06-14 09:31:40 -07:00
68d690ffbd Vectorize the softmax calculation when not along the last dim (#59195)
Summary:
Currently, a softmax that is not along the last dim falls back to a [scalar version](d417a094f3/aten/src/ATen/native/SoftMax.cpp (L14-L64)). We actually have the chance to vectorize the calculation along the inner_size dim (see the sketch after the numbers below).

Changes we made:

- Use vectorized softmax_kernel instead of host_softmax when not along the last dim.

Performance data on a 28-core Intel 8280 CPU with input size [32, 81, 15130], doing softmax along the second dim (81).

- FP32 Baseline: 24.67 ms
- FP32 optimized: 9.2 ms
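
A hedged sketch of the idea (not the actual ATen kernel): with the input laid out as [outer_size][dim_size][inner_size] and the reduction over the middle dim, the running statistics can be kept per inner index so the hot loop walks contiguous memory:
```
#include <algorithm>
#include <cstdint>
#include <limits>

// Running max over the softmax dim, one slot per inner index. The inner
// loop is a contiguous walk the compiler can vectorize, unlike the strided
// scalar walk of host_softmax.
void running_max_sketch(const float* in, float* max_buf, int64_t outer,
                        int64_t dim_size, int64_t inner_size) {
  const float* base = in + outer * dim_size * inner_size;
  std::fill(max_buf, max_buf + inner_size,
            -std::numeric_limits<float>::infinity());
  for (int64_t k = 0; k < dim_size; ++k) {
    const float* row = base + k * inner_size;
    for (int64_t i = 0; i < inner_size; ++i) {  // contiguous, vectorizable
      max_buf[i] = std::max(max_buf[i], row[i]);
    }
  }
}
```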

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59195

Reviewed By: ailzhang

Differential Revision: D28854796

Pulled By: cpuhrsch

fbshipit-source-id: 18477acc3963754c59009b1794f080496ae16c3d
2021-06-14 07:54:11 -07:00
d60d81b5a7 Make PyObject_FastGetAttrString accept const char* (#59758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59758

The underlying call to tp_getattr is const safe but CPython
has not fixed it due to BC problems.  No reason not to advertise
the better type here though!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29017911

Pulled By: ezyang

fbshipit-source-id: 8d55983fe6416c03eb69c6367bcc431c30000133
2021-06-14 07:24:16 -07:00
700add0737 Fix expecttest accept on Python 3.8 and later (#59709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59709

Fixes #59705.

Python 3.8 fixed tracebacks to report the beginning of the line
that raised an error, rather than the end. This makes for a simpler
implementation (no more string reversing), but we needed to actually
implement it. This wasn't caught by tests because we hard-coded line
numbers to do substitutions, so I also added a little smoketest to
detect future changes to traceback line number behavior.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28994919

Pulled By: ezyang

fbshipit-source-id: 1fb0a782e17c55c13d668fabd04766d2b3811962
2021-06-14 07:23:12 -07:00
cf38b20c61 Alias for digamma as psi to special namespace (#59143)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59143

Reviewed By: jbschlosser

Differential Revision: D28986909

Pulled By: mruberry

fbshipit-source-id: bc8ff0375de968f3662b224689fa0a6b117f9c4e
2021-06-14 03:05:14 -07:00
ff15d93b88 Improve numerical stability of GroupNorm (#54921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54921

Improve numerical stability of GroupNorm

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "GroupNorm"

Reviewed By: ngimel

Differential Revision: D27414438

fbshipit-source-id: 815517240ca5ea3e2beb77ced3bd862e9c83d445
2021-06-13 16:13:32 -07:00
095cd6a0da MemoryOverlap: Avoid has_storage calls (#59013)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59013

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29040929

Pulled By: ngimel

fbshipit-source-id: 69745e7abbaf523795a90f68cf01d3d94508210e
2021-06-13 12:31:22 -07:00
be038d8989 [CUDA graphs] Make stream semantics of backward calls consistent with other cuda ops (ci-all edition) (#57833)
Summary:
ci-all resubmit of https://github.com/pytorch/pytorch/pull/54227.

Tests look good except for a few distributed autograd failures (pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test) and rocm failures (pr/pytorch-linux-bionic-rocm4.1-py3.6).

The common denominator in rocm failures appears to be multi-gpu activity: some [multiprocess DDP failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test1/8115/console), some [single-process failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test2/8115/console) where the single process has autograd ops that span devices. jeffdaily jithunnair-amd sunway513, could one of you take a look? The streaming backward change is also beneficial to rocm, I expect.

For debugging rocm failures, I think we should ignore the multiprocess/DDP tests and focus on the single process cases. The root cause is probably the same and the single process cases are simpler.

----------------------------------

Update: Rocm failures are due to https://github.com/pytorch/pytorch/issues/59750.
2718a54032 is a workaround, to be updated once https://github.com/pytorch/pytorch/issues/59750 is fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57833

Reviewed By: mruberry

Differential Revision: D28942391

Pulled By: ngimel

fbshipit-source-id: d6047e971c5f1c6386334bf3641402a92f12e2f8
2021-06-13 12:09:56 -07:00
92513038e8 Revert D28994140: [pytorch][PR] Implemented torch.cov
Test Plan: revert-hammer

Differential Revision:
D28994140 (23c232554b)

Original commit changeset: 1890166c0a9c

fbshipit-source-id: 73dfe1b00464e38f004f99960cdeeb604ed4b20a
2021-06-13 02:33:37 -07:00
0ceea7faf4 Refactor SavedVariable (#59836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59836

Preparing for #58500

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D29069159

fbshipit-source-id: dd4d870c8ae10a4bd7f12be127e093f60fa072fa
2021-06-12 23:21:36 -07:00
d03ff1a17d pre compute regex and match simple signature autograd codegen 15s -> 12s (#59852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59852

This whole stack does not change anything in the codegened code

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29063814

Pulled By: albanD

fbshipit-source-id: a751047526f8d58f4760ee6f9ae906675bed5d75
2021-06-12 06:58:36 -07:00
30a18fe318 refactor yaml loader import, no runtime change (#59850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59850

This whole stack does not change anything in the codegened code

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D29063816

Pulled By: albanD

fbshipit-source-id: ca3067443d8e6282c1077d3dafa3b4f330d43b28
2021-06-12 06:58:34 -07:00
c60d1ac9cf Use C dumper if possible aten codegen 23s -> 13s (#59849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59849

This whole stack does not change anything in the codegened code

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D29063815

Pulled By: albanD

fbshipit-source-id: c4baa72594bd2fe50ac67f513916f2b2ccb7488c
2021-06-12 06:58:32 -07:00
504ec30109 avoid error string formatting aten codegen 28s -> 23s (#59848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59848

This whole stack does not change anything in the codegened code

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D29063818

Pulled By: albanD

fbshipit-source-id: c68734672eeacd212d7bd9bebe3d53aaa20c3c24
2021-06-12 06:58:31 -07:00
7143a6a189 Avoid unnecessary re-computation autograd codegen 21s -> 15s (#59847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59847

This whole stack does not change anything in the codegened code

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D29063817

Pulled By: albanD

fbshipit-source-id: 284c3e057029b7a67f43a1b034bb30863bd68c71
2021-06-12 06:57:19 -07:00
1f6e39336f Simplify parametrizations.SpectralNorm and improve its initialization (#59564)
Summary:
Implements a number of changes discussed with soulitzer offline.
In particular:
- Initialise `u`, `v` in `__init__` rather than in `_update_vectors`
- Initialise `u`, `v` to some reasonable vectors by doing 15 power iterations at the start
- Simplify the code of `_reshape_weight_to_matrix` (and make it faster) by using `flatten`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59564

Reviewed By: ailzhang

Differential Revision: D29066238

Pulled By: soulitzer

fbshipit-source-id: 6a58e39ddc7f2bf989ff44fb387ab408d4a1ce3d
2021-06-11 19:52:44 -07:00
10a3a3d363 Fix bad change in a CUDACachingAllocator loop (#59903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59903

D29034650 (cf0c4ac258) probably breaks something because it changes a `for` loop on ~Line 1200 from `[size,max)` to `[0,max)`. This fixes that.
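
Illustrative only (the real loop lives in CUDACachingAllocator; names are placeholders):
```
#include <cstddef>

// Restore the intended half-open range [size, max) instead of the
// accidental [0, max): only entries beyond `size` should be processed.
void process_tail(size_t size, size_t max) {
  for (size_t i = size; i < max; ++i) {
    // ... per-entry work ...
  }
}
```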

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29081688

fbshipit-source-id: 21f08e3f244fc02cf97d137b3cc80d4378d17185
2021-06-11 18:20:07 -07:00
e49f0f4ffd Automated submodule update: FBGEMM (#59874)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: ae8ad8fd04

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59874

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D29064980

fbshipit-source-id: 593f08361817fb771afcf2732f0f647d7c2c72c3
2021-06-11 17:50:40 -07:00
3529a48ebb Revert D28981326: torch/lib/c10d: Use torch_check instead of throwing runtime_error
Test Plan: revert-hammer

Differential Revision:
D28981326 (6ea6075002)

Original commit changeset: 264a7f787ea8

fbshipit-source-id: 75625b76dfbd0cbaf59705d621ef9e2d1677c482
2021-06-11 17:17:10 -07:00
f3218568ad optimize channels last for BatchNorm2d on CPU (#59286)
Summary:
Replacement of https://github.com/pytorch/pytorch/issues/48919.
Optimizes channels-last performance for BatchNorm2d on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59286

Reviewed By: bdhirsh

Differential Revision: D29008198

Pulled By: VitalyFedyunin

fbshipit-source-id: 8a7d020bd6a42ab5c21ffe788b79a22f4ec82ac0
2021-06-11 16:30:16 -07:00
864d129bae [quant][fx] Remove extra q-dq for weight bias in normalization ops (#59882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59882

Currently, for normalization ops, the weight and bias arguments are treated as activation inputs, which require observers.
This results in adding extra quant-dequant ops for the weight and bias inputs.

This PR adds support to skip observing weight/bias inputs of norm operators, thus removing the redundant q-dq ops

Quantized graph with F.layer_norm
Before this PR
```
def forward(self, x):
    _input_scale_0 = self._input_scale_0
    _input_zero_point_0 = self._input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, _input_scale_0, _input_zero_point_0, torch.quint8);  x = _input_scale_0 = _input_zero_point_0 = None
    scale = self.scale
    _input_scale_1 = self._input_scale_1
    _input_zero_point_1 = self._input_zero_point_1
    quantize_per_tensor_1 = torch.quantize_per_tensor(scale, _input_scale_1, _input_zero_point_1, torch.quint8);  scale = _input_scale_1 = _input_zero_point_1 = None
    bias = self.bias
    _input_scale_2 = self._input_scale_2
    _input_zero_point_2 = self._input_zero_point_2
    quantize_per_tensor_2 = torch.quantize_per_tensor(bias, _input_scale_2, _input_zero_point_2, torch.quint8);  bias = _input_scale_2 = _input_zero_point_2 = None
    _scale_0 = self._scale_0
    _zero_point_0 = self._zero_point_0
    dequantize = quantize_per_tensor_1.dequantize();  quantize_per_tensor_1 = None
    dequantize_1 = quantize_per_tensor_2.dequantize();  quantize_per_tensor_2 = None
    layer_norm = torch.ops.quantized.layer_norm(quantize_per_tensor, [2, 5, 5], weight = dequantize, bias = dequantize_1, eps = 1e-05, output_scale = _scale_0, output_zero_point = _zero_point_0);  quantize_per_tensor = dequantize = dequantize_1 = _scale_0 = _zero_point_0 = None
    dequantize_2 = layer_norm.dequantize();  layer_norm = None
    return dequantize_2
```
After
```
def forward(self, x):
    _input_scale_0 = self._input_scale_0
    _input_zero_point_0 = self._input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, _input_scale_0, _input_zero_point_0, torch.quint8);  x = _input_scale_0 = _input_zero_point_0 = None
    scale = self.scale
    bias = self.bias
    _scale_0 = self._scale_0
    _zero_point_0 = self._zero_point_0
    layer_norm = torch.ops.quantized.layer_norm(quantize_per_tensor, [2, 5, 5], weight = scale, bias = bias, eps = 1e-05, output_scale = _scale_0, output_zero_point = _zero_point_0);  quantize_per_tensor = scale = bias = _scale_0 = _zero_point_0 = None
    dequantize = layer_norm.dequantize();  layer_norm = None
    return dequantize
```

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_norm_weight_bias

Imported from OSS

Reviewed By: HDCharles, ailzhang

Differential Revision: D29068203

fbshipit-source-id: 24b5c38bbea5fd355d34522bfa654c9db18607da
2021-06-11 16:22:36 -07:00
60eb22e45e Build an -Wextra around c10 (#59853)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59853

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29016682

fbshipit-source-id: f6c5f32464d57dbd60b59b5f9e2234ef2c39f1c1
2021-06-11 16:12:21 -07:00
e41bc31eb2 make --run-specified-test-case use --include (#59704)
Summary:
Instead of having specific logic to handle run-specified-test-case, we provide the flag to override include or bring-to-front with the SPECIFIED_TEST_CASES_FILE.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59704

Reviewed By: janeyx99

Differential Revision: D29038425

Pulled By: walterddr

fbshipit-source-id: 803d3555813437c7f287a22f7704106b0c609919
2021-06-11 13:57:13 -07:00
cf0c4ac258 Fix some issues in CUDACachingAllocator (#59819)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59819

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29034650

fbshipit-source-id: 7e9689fc1ae121432e9421fa4a9ae00f7f78caca
2021-06-11 13:15:27 -07:00
b83ac0cc4e [nnc] Added a check to vectorize only those loops that are normalized. (#59423)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59423

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D28886979

Pulled By: navahgar

fbshipit-source-id: edfc61feaf5efe22d4f367ac718b83b3d0f47cb3
2021-06-11 12:03:34 -07:00
30e24b2d2b [nnc] Modified vectorize API to return bool (#59422)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59422

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D28886980

Pulled By: navahgar

fbshipit-source-id: 58cc3ecd86564a312a132f8260d836b096505095
2021-06-11 12:02:19 -07:00
a9e136a61e Remove ci/no-build (#59889)
Summary:
This reverts https://github.com/pytorch/pytorch/issues/58778, since triggering our primary CircleCI workflow only via pytorch-probot has been causing more problems than it's worth.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59889

Reviewed By: walterddr, seemethere

Differential Revision: D29070418

Pulled By: samestep

fbshipit-source-id: 0b47121b190c2e9efa27f38000ca362e634876dc
2021-06-11 11:55:56 -07:00
f4fdc49957 [NNC] Add python bindings for loopnest.compress_buffer (#59681)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59681

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28981573

Pulled By: huiguoo

fbshipit-source-id: 003d66df576903c71bf46c95851fe6ccbba76f29
2021-06-11 11:28:39 -07:00
ee3025f734 Give clearer lint error messages (#59876)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59876

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D29067747

Pulled By: samestep

fbshipit-source-id: cce7195467b5f9286d55a9d0c1655b4f92d4fbaf
2021-06-11 11:25:42 -07:00
6ea6075002 torch/lib/c10d: Use torch_check instead of throwing runtime_error (#59684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59684

Same reasoning as in the below diff.
ghstack-source-id: 131167212

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D28981326

fbshipit-source-id: 264a7f787ea8be76f743a2eaca67ae1d3bd8073a
2021-06-11 11:16:58 -07:00
d433a55c94 Replace throw std::runtime_error with torch_check in torch/csrc/distributed (#59683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59683

Replaces usages of throw std::runtime_error("foo") with the better
torch_check(false, "foo") which allows C++ stacktraces to show up when
TORCH_SHOW_CPP_STACKTRACES=1. This will hopefully provide much better debugging
information when debugging crashes/flaky tests.
ghstack-source-id: 131167210

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D28981327

fbshipit-source-id: 677f569e28600263cab18759eb1b282e0391aa7b
2021-06-11 11:15:49 -07:00
9cdbddb3f7 Fix Vectorize<float>::trunc on ARM platform (#59858)
Summary:
Use `vrndq_f32`, which corresponds to the `VRINTZ` instruction that rounds a floating-point value towards zero, matching `std::trunc` behaviour.
This makes the trunc implementation correct even for values that fit into float32 but cannot be converted to int32, for example `-1.0e+20`; see the following [gist](https://gist.github.com/malfet/c612c9f4b3b5681ca1b2a69930825871):
```
inp= 3.1 2.7 -2.9 -1e+20
old_trunc= 3 2 -2 -2.14748e+09
new_trunc= 3 2 -2 -1e+20
```
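
A small self-contained sketch of the fix (ARMv8 NEON):
```
#include <arm_neon.h>

// vrndq_f32 lowers to the round-toward-zero instruction on each lane,
// matching std::trunc. Unlike the old int32 round-trip, lanes whose
// magnitude does not fit in int32 (e.g. -1.0e+20f) pass through unchanged.
float32x4_t trunc4(float32x4_t v) {
  return vrndq_f32(v);
}
```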

Fixes `test_reference_numerics_hard_trunc_cpu_float32` on M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59858

Reviewed By: kimishpatel

Differential Revision: D29052008

Pulled By: malfet

fbshipit-source-id: 6b567f39151538be1aa3890e3b4e1e978e598657
2021-06-11 10:55:45 -07:00
2ce21b2e61 [Pytorch backend delegation] Preprocess to accept (#58873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58873

BackendDebugInfoRecorder

Prior to this PR:
In order to generate debug handles corresponding to the graph being lowered, the backend's preprocess will call generate_debug_handles and get a map of Node* to debug handles. To facilitate this, to_backend will own a BackendDebugInfoRecorder and initialize a thread-local pointer to it. The generate_debug_handles function will query the thread-local pointer to see if there is a valid BackendDebugInfoRecorder for the context; if there is, it will generate debug handles.

After this PR:
The signature of preprocess is changed such that backends have to register a preprocess that accepts an instance of BackendDebugInfoRecorder by reference. generate_debug_handles is no longer a free function but becomes part of the API of BackendDebugInfoRecorder. Now the backend's preprocess function will call generate_debug_handles on the BackendDebugInfoRecorder instead of the free function.

Reason for this change:
RAII that initializes a thread-local pointer results in a loose contract with backends, which may result in backends not storing debug information. Making it part of the API means backends have to be aware of BackendDebugInfoRecorder and must explicitly choose not to generate/store debug information if they decide to do so.
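
A hedged sketch of the post-change contract; other than the recorder-by-reference idea and the generate_debug_handles call named in the summary, every type, header, and argument here is an assumption:
```
#include <torch/csrc/jit/api/module.h>

// Hypothetical backend preprocess after this PR: the recorder is passed in
// explicitly, and debug handles are requested from it rather than from a
// free function backed by a thread-local pointer.
c10::IValue my_preprocess(
    const torch::jit::Module& mod,
    const c10::Dict<c10::IValue, c10::IValue>& method_compile_spec,
    torch::jit::BackendDebugInfoRecorder& recorder) {  // type path assumed
  auto graph = mod.get_method("forward").graph();
  // Map of Node* -> debug handle, per the summary; exact return type assumed.
  auto handles = recorder.generate_debug_handles(graph);
  (void)method_compile_spec;
  (void)handles;
  // ... lower `graph` for the backend, tagging ops with their handles ...
  return mod._ivalue();
}
```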

Test Plan:
backend tests

Imported from OSS

Reviewed By: jbschlosser, raziel

Differential Revision: D28648613

fbshipit-source-id: c9b7e7bf0f78e87023ea7bc08612cf893b08cb98
2021-06-11 10:16:00 -07:00
23c232554b Implemented torch.cov (#58311)
Summary:
Based from https://github.com/pytorch/pytorch/pull/50466

Adds the initial implementation of `torch.cov` similar to `numpy.cov`. For simplicity, we removed support for many parameters in `numpy.cov` that are either redundant such as `bias`, or have simple workarounds such as `y` and `rowvar`.

cc PandaBoi

TODO

- [x] Improve documentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311

Reviewed By: mruberry

Differential Revision: D28994140

Pulled By: heitorschueroff

fbshipit-source-id: 1890166c0a9c01e0a536acd91571cd704d632f44
2021-06-11 09:40:50 -07:00
ba09355b12 Upgrade Windows CI Python to 3.8 (#59729)
Summary:
Python 3.6 reaches EOL at the end of this year; we should use a newer Python in CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59729

Reviewed By: bdhirsh

Differential Revision: D29006807

Pulled By: janeyx99

fbshipit-source-id: c79214b02a72656058ba5d199141f8838212b3b6
2021-06-11 09:09:24 -07:00
d75e99b709 fx quant: enable qconfig_dict to target function invocations by order (#59605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59605

Enables targeting of individual function invocations by execution order.
For example, given a module such as

```
class M1(torch.nn.Module):
  def forward(self, x):
    x = torch.add(x, x)
    x = torch.add(x, x)
    return x

class M2(torch.nn.Module):
  def __init__(self):
    self.m1 = M1()

  def forward(self, x):
    x = self.m1(x)
    return x
```

We can now target the first add of `m1` with

```
qconfig_dict = {
  "module_name_function_order": ("m1", torch.add, 0, custom_qconfig),
}
```

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_qconfig_module_name_function_order
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D28951077

fbshipit-source-id: 311d423724a31193d4fa4bbf3a712b46464b5a29
2021-06-11 08:53:40 -07:00
e6110d4d5d Fix input_buffer check if inplace update is valid (#59817)
Summary:
Fixes an issue introduced in  https://github.com/pytorch/pytorch/issues/17182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59817

Reviewed By: bdhirsh

Differential Revision: D29040738

Pulled By: albanD

fbshipit-source-id: 67fd4e9fa0dadf507ddd954d20e119d8781c4de0
2021-06-11 07:29:03 -07:00
c9e4d1372f Add guards for USE_C10D_FOO in relevant c10d files (#59697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59697

The c10d build process selectively adds files based on the `USE_C10D_FOO` flags (where `FOO` is one of `GLOO`, `NCCL` or `MPI`). Replicating this logic inside libtorch will be harder, since libtorch uses a simpler approach (i.e., it lists the files in `build_variables.bzl`). So instead we could always include all files, and "disable" each file as needed using `#ifdef`s. Note that this is not a new approach: we already do the same for all the files of the TensorPipe agent based on the flag `USE_TENSORPIPE`.
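
The pattern in question, sketched (file and flag pairing illustrative):
```
// ProcessGroupGloo.cpp (sketch): the translation unit is always listed in
// build_variables.bzl and always compiled, but its body compiles away unless
// the flag is set, mirroring what USE_TENSORPIPE already does.
#ifdef USE_C10D_GLOO

// ... entire implementation of the Gloo process group ...

#endif  // USE_C10D_GLOO
```
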
ghstack-source-id: 131169540

Test Plan: CI

Reviewed By: agolynski

Differential Revision: D28987577

fbshipit-source-id: 4c6195de4e9a58101dad9379537e8d055dfd38af
2021-06-11 05:06:42 -07:00
773b56e719 Fix Windows guards in c10d (#59696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59696

Some files in c10d refer to dist autograd. However, on Windows, dist autograd isn't built. Hence we need to "mask out" those references under Windows. This was already partly done, but when moving c10d to libtorch some issues came up, possibly due to the different way in which linking happens. Hence I masked out the remaining references.
ghstack-source-id: 131169541

Test Plan: CI

Reviewed By: agolynski

Differential Revision: D28987579

fbshipit-source-id: c29c5330f8429d699554972d30f99a89b2e3971d
2021-06-11 05:06:40 -07:00
cbcae46fa5 Remove USE_CUDA from c10d reducer/logger (#59562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59562

Needed to merge c10d into libtorch(_cuda).

ghstack-source-id: 131169542

Test Plan: CI

Reviewed By: agolynski

Differential Revision: D28931378

fbshipit-source-id: 71376b862ff6ef7dbfa7331ec8d269bd3fcc7e0d
2021-06-11 05:06:39 -07:00
b4c35d7ae7 Remove USE_CUDA from ProcessGroupGloo (#59561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59561

Needed to merge c10d into libtorch(_cuda).

ghstack-source-id: 131169544

Test Plan: CI

Reviewed By: agolynski

Differential Revision: D28931379

fbshipit-source-id: 9bd68477ae7bb870b6737a555edd5696149ff5d6
2021-06-11 05:05:31 -07:00
b5e832111e [nnc] Limit the number of inputs to a fusion group.
Summary:
nvrtc has a hard limit on the size of kernel parameters, and llvm has
a tendency to OOM with huge parameter lists, so let's limit the number of
inputs to something sensible.

Test Plan:
tested on pyper OOM test case:
```
flow-cli test-locally --mode=opt-split-dwarf f278102738 --name "PyPer OOM repro f277966799 f63b1f9c5c0c" --run-as-secure-group oncall_pytorch_jit --entitlement default
```

Reviewed By: ZolotukhinM

Differential Revision: D29019751

fbshipit-source-id: b27f2bb5000e31a7b49ea86a6928faa0ae2ead24
2021-06-11 02:25:16 -07:00
df759a3d9e [nnc] Do not fuse matmul/conv2d if inputs are discontiguous. (#59754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59754

Also, if inputs are contiguous, use their Placeholders
directly rather than generating contiguous Tensors from them.

The rationale for this change is that aten::matmul and aten::conv2d
support transposed inputs; if NNC generates a physical transpose to
perform an external call, performance will be strictly worse than not
fusing (sometimes dramatically so, as in the attached benchmark).
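For illustration, a minimal Python sketch of the discontiguous case in question (exposition only, not code from this PR): a transposed view is non-contiguous, and matmul consumes it directly.

```
import torch

a = torch.randn(64, 32)
b = torch.randn(48, 32).t()  # transposed view, shape (32, 48); not contiguous
print(b.is_contiguous())     # False
c = torch.matmul(a, b)       # matmul accepts the transposed input without a physical copy
print(c.shape)               # torch.Size([64, 48])
```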

Test Plan: benchmark

Reviewed By: ZolotukhinM

Differential Revision: D29010209

fbshipit-source-id: da6d71b155c83e8d6e306089042b6b0af8f80900
2021-06-11 02:23:47 -07:00
4b91355232 [ONNX] remove raw export type (#59160)
Summary:
[ONNX] remove raw export type

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59160

Reviewed By: tugsbayasgalan

Differential Revision: D28937039

Pulled By: SplitInfinity

fbshipit-source-id: 79bf91605526aa32a7304e75f50fe55d872bd4e8
2021-06-11 00:08:06 -07:00
2112074f25 [Static Runtime] Add schema check to several aten ops (#59603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59603

D28698997 (10345010f7) was reverted because I forgot to replace the
```
  VLOG(1) << "Found schema mismatch";
  n->schema().dump();
```
block in `aten::clamp_min` with `LogAndDumpSchema(n)`, and that led the bazel build to fail. I don't know why it breaks the bazel build, though.

Test Plan: OSS CI.

Reviewed By: ajyu

Differential Revision: D28950177

fbshipit-source-id: 9bb1c6619e6b68415a3349f04933c2fcd24cc9a2
2021-06-10 23:39:00 -07:00
6eabbea47c Disable cuDNN persistent RNN on A30 (#59830)
Summary:
https://github.com/pytorch/pytorch/issues/59829

cherry-picked from ptrblck 's change CC ngimel xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59830

Reviewed By: bdhirsh

Differential Revision: D29046145

Pulled By: ngimel

fbshipit-source-id: 270ab3bb6c1c7c759497a15eb38b20a177c94adb
2021-06-10 22:07:28 -07:00
455afdf974 Automated submodule update: FBGEMM (#59715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59715

This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 0520ad5f95

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59687

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jianyuh

Differential Revision: D28986238

Pulled By: jspark1105

fbshipit-source-id: 12f68830b5b7a858fbc301af50593281852af51f
2021-06-10 21:53:30 -07:00
c7890b4a8e [package] doc string cleanup extravaganza (#59843)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59843

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D29049342

Pulled By: Lilyjjo

fbshipit-source-id: 3330fb439f28dda0cafef5797ff61311f4afbf76
2021-06-10 21:21:48 -07:00
54bfd41a2e Fix torch.angle on aarch64 (#59832)
Summary:
angle should return 0 for positive values, pi for negative values, and keep NaNs in place, which can be accomplished using two blendv functions.

Fixes a number of unary test failures on M1/aarch64.
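A quick Python check of the behavior described above (a minimal sketch, not part of this PR):

```
import torch

x = torch.tensor([2.0, -3.0, float("nan")])
# 0 for positive, pi for negative, NaN stays in place:
print(torch.angle(x))  # tensor([0.0000, 3.1416, nan])
```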

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59832

Reviewed By: kimishpatel

Differential Revision: D29046402

Pulled By: malfet

fbshipit-source-id: cb93ad2de140f7a54796387fc11053c507a1d4e9
2021-06-10 20:48:41 -07:00
4025f95a20 [docs] Add table of contents to torch.package docs (#59842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59842

Test Plan:
Continuous integration.

<img width="544" alt="Captura de Pantalla 2021-06-10 a la(s) 5 13 07 p  m" src="https://user-images.githubusercontent.com/4392003/121612390-2ccec280-ca0f-11eb-87ad-fef632ba05ca.png">

Reviewed By: Lilyjjo

Differential Revision: D29050627

Pulled By: SplitInfinity

fbshipit-source-id: 76c25ed4002cbaf072036e2e14e7857c15077df7
2021-06-10 19:52:50 -07:00
0e222db087 [docs] Add explanation section to torch.package docs (#59833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59833

**Summary**
This commit adds an explanation section to the `torch.package`
documentation. This section clarifies and illuminates various aspects of
the internals of `torch.package` that might be of interest to users.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D29050626

Pulled By: SplitInfinity

fbshipit-source-id: 78e0cda00f69506ef2dfc52d6df63694b502269e
2021-06-10 19:52:48 -07:00
062dde7285 [docs] Add "how do I" section to torch.package docs (#59503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59503

**Summary**
This commit adds a "how do I..." section to the `torch.package`
documentation. This section contains short guides about how to solve
real-world problems that frequently recur while using `torch.package`.

**Test Plan**
Continuous integration.

<img width="877" alt="Captura de Pantalla 2021-06-04 a la(s) 9 19 54 p  m" src="https://user-images.githubusercontent.com/4392003/120879911-98321380-c57b-11eb-8664-c582c92b7837.png">

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D29050629

Pulled By: SplitInfinity

fbshipit-source-id: 2b7800732e0a3c1c947f110c05562aed5174a87f
2021-06-10 19:52:47 -07:00
6a18ca7a07 [docs] Add tutorials section to torch.package docs (#59499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59499

**Summary**
This commit adds a tutorials section to the torch.package docs.

**Test Plan**
Continuous integration.

<img width="870" alt="Captura de Pantalla 2021-06-04 a la(s) 5 10 31 p  m" src="https://user-images.githubusercontent.com/4392003/120874257-b9ced300-c55a-11eb-84dd-721cb7ac73ab.png">

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D29050628

Pulled By: SplitInfinity

fbshipit-source-id: c17ab0100a9d63e7af8da7a618143cedbd0a5872
2021-06-10 19:52:45 -07:00
a3db8e0a26 [docs] Add torch.package documentation preamble (#59491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59491

**Summary**
This commit adds a preamble to the `torch.package` documentation page
that explains briefly what `torch.package` is.

**Test Plan**
Continous integration.

<img width="881" alt="Captura de Pantalla 2021-06-04 a la(s) 3 57 01 p  m" src="https://user-images.githubusercontent.com/4392003/120872203-d535e000-c552-11eb-841d-b38df19bc992.png">

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D29050630

Pulled By: SplitInfinity

fbshipit-source-id: 70a3fd43f076751c6ea83be3ead291686c641158
2021-06-10 19:51:37 -07:00
a524ee00ca Forward AD formulas batch 3 (#59711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59711

This is the exact same PR as before.
This was reverted before the PR below was faulty.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28995762

Pulled By: albanD

fbshipit-source-id: 65940ad93bced9b5f97106709d603d1cd7260812
2021-06-10 19:30:02 -07:00
8a7c0d082f ger is an alias to outer, not the other way around (#59710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59710

This is the exact same PR as before.
The version that landed was actually outdated compared to the github PR and that's why it failed on master... Sorry for the noise.
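For reference, a minimal check of the alias relationship (illustrative only):

```
import torch

x = torch.tensor([1.0, 2.0])
y = torch.tensor([3.0, 4.0, 5.0])
# outer is the canonical op; ger is documented as its alias.
print(torch.equal(torch.ger(x, y), torch.outer(x, y)))  # True
```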

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28995764

Pulled By: albanD

fbshipit-source-id: 8f7ae3356a886d45787c5e6ca53a4e7b033e306e
2021-06-10 19:28:53 -07:00
c2c35c0170 [Binary] Link whole CuDNN for CUDA-11.1 (#59802)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50153

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59802

Reviewed By: driazati, seemethere

Differential Revision: D29033537

Pulled By: malfet

fbshipit-source-id: e816fc71f273ae0b4ba8a0621d5368a2078561a1
2021-06-10 16:54:53 -07:00
60ba451731 [torch] Remove using directive from header (#59728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59728

I noticed Sandcastle jobs failing with:

```
fbcode/caffe2/torch/csrc/api/include/torch/nn/modules/rnn.h:19:35: error: using namespace directive in global context in header [-Werror,-Wheader-hygiene]
using namespace torch::nn::utils::rnn;
```

(cf. V3 of D28939167 or https://www.internalfb.com/intern/sandcastle/job/36028797455955174/).

Removing `using namespace ...` fixes the problem.

~~... also applied code formatting ...~~

Test Plan: Sandcastle

Reviewed By: jbschlosser

Differential Revision: D29000888

fbshipit-source-id: 10917426828fc0c82b982da435ce891dc2bb6eec
2021-06-10 15:13:07 -07:00
e9e9291dc1 [After fix] Reuse constant and bump bytecode to v5 (#59722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59722

Reintroduce sharing constant between bytecode and torchscript (same as #58629) after the fix #59642

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29002345

Pulled By: cccclai

fbshipit-source-id: d9c8e474ff57d0509580183206df038a24ad27e3
2021-06-10 15:03:16 -07:00
ac6b5beade [torch][segment_reduce] Add support for mean reduction (cpu) (#59521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59521

This diff adds support for mean reduction on CPU (fwd + bwd).

Will add the CUDA implementation in a subsequent PR. We are using "cub::DeviceSegmentedReduce" for the other aggregations and are exploring how to support mean with it, or we will write a custom kernel for it.
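A minimal sketch of the segmented-mean semantics being added (plain Python for exposition; not the actual kernel or API):

```
def segment_mean(data, lengths):
    # Reduce each contiguous segment of `data` (sized by `lengths`) to its mean.
    out, i = [], 0
    for n in lengths:
        out.append(sum(data[i:i + n]) / n)
        i += n
    return out

print(segment_mean([1.0, 2.0, 3.0, 4.0, 6.0], [2, 3]))  # [1.5, 4.333...]
```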

Next Steps:
- cuda support for mean
- 2d data input support
- more testing
- benchmarking

Test Plan: Updated unit test. Still relying on manual data for ease of debugging. Will add more tests that cover edge cases once major features are complete.

Reviewed By: ngimel

Differential Revision: D28922547

fbshipit-source-id: 2fad53bbad2cce714808ff95759cbdbd45bb4ce6
2021-06-10 14:21:31 -07:00
e71db0bb82 .jenkins: Ignore exit code of nvidia-smi (#59826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59826

It's only informational and will run on Windows CPU executors as well

Fixes issues found in https://github.com/pytorch/pytorch/runs/2797531966

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D29042951

Pulled By: seemethere

fbshipit-source-id: 862094e53417c0a59d7728bf680be37b806b5a6f
2021-06-10 14:16:32 -07:00
e7ad82eb2f [DataLoader] Add option to refine type during runtime validation for DP instance (#56066)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56066

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D27776646

Pulled By: ejguan

fbshipit-source-id: 695ff7775177653d809c5917d938c706281e1298
2021-06-10 14:04:24 -07:00
e2c784d940 [reland] .github: Add Windows GPU workflow (#58782) (#59752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59752

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29009775

Pulled By: seemethere

fbshipit-source-id: 5be1b818b5653a4fdbfe4a79731317068dc1a5d1
2021-06-10 13:38:32 -07:00
54cc477ea3 .github: Ensure cleaner windows workspace (#59742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59742

It looks like Windows workers were failing due to some leftovers
from previous builds; this should hopefully remedy some of those errors.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D29009076

Pulled By: seemethere

fbshipit-source-id: 426d54df14ec580cb24b818c48e2f4bd36159181
2021-06-10 13:37:22 -07:00
0099c25b85 fx quant: remove some dead code in observer insertion (redo) (#59799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59799

This is a redo of #58574, easier to create a new PR than to fix rebase
conflicts, as there have been a large number of refactors to the
underlying code.

Removes some code which was incorrectly added by #57519 but never
actually used for anything.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29031955

fbshipit-source-id: f407d181070cb283382965952821e3647c705544
2021-06-10 12:57:09 -07:00
fb620a27d0 [WIP] Add slow gradcheck build for the ci/slow-gradcheck label (#59020)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59020

Reviewed By: bdhirsh

Differential Revision: D29036891

Pulled By: albanD

fbshipit-source-id: b1f87b2cb38642097ad4079d1e818fa5997bedb4
2021-06-10 12:29:57 -07:00
cc32dcadd9 Fix Error when run python setup.py install again on Windows (#59689)
Summary:
Fix https://github.com/pytorch/pytorch/issues/59688

For now, `build.ninja` should be removed before building the source code on Windows each time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59689

Reviewed By: bdhirsh

Differential Revision: D29032960

Pulled By: walterddr

fbshipit-source-id: 2b8162cd119820d3b6d8715745ec29b9c381e01f
2021-06-10 12:22:21 -07:00
1fc3576d97 Fixing and enabling tests that check fake_quant matches quant+dequant (#59095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59095

These tests were disabled; I'm unsure as to why. I've
re-enabled them and reworked them to expand testing to different devices
and dtypes.

Test Plan:
python test/test_quantization.py TestFakeQuantizeOps.test_numerical_consistency

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29018745

fbshipit-source-id: 28188f32bafd1f1704c00ba49d09ed719dd1aeb2
2021-06-10 12:16:54 -07:00
c90260905f [fix] torch.{lin, log}space(): properly examine passed dtype (#53685)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53171

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53685

Reviewed By: jbschlosser

Differential Revision: D28331863

Pulled By: anjali411

fbshipit-source-id: e89359b607d058158cfa1c9a82389d9a4a71185b
2021-06-10 11:59:54 -07:00
9bcef86d18 Split slow gradcheck periodic CI job so that it does not time out (#59736)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59736

Reviewed By: albanD

Differential Revision: D29008100

Pulled By: soulitzer

fbshipit-source-id: 76da971356fd985dfbfa56d3573f31ef04701773
2021-06-10 11:32:36 -07:00
f240624080 displays graph node's info (#59679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59679

Displays info about graph's nodes

Test Plan:
Expected view:

%wide_offset.1 : Tensor = aten::add(%wide.1, %self._mu, %4)
	i0: Tensor CPUFloatType {32, 50}
	i1: Tensor CPUFloatType {1, 50}
	i2: int {1}
	o0: Tensor CPUFloatType {32, 50}
%wide_normalized.1 : Tensor = aten::mul(%wide_offset.1, %self._sigma)
	i0: Tensor CPUFloatType {32, 50}
	i1: Tensor CPUFloatType {1, 50}
	o0: Tensor CPUFloatType {32, 50}
%wide_preproc.1 : Tensor = aten::clamp(%wide_normalized.1, %5, %6)
	i0: Tensor CPUFloatType {32, 50}
	i1: double {0}
	i2: double {10}
	o0: Tensor CPUFloatType {32, 50}
%user_emb_t.1 : Tensor = aten::transpose(%user_emb.1, %4, %7)
	i0: Tensor CPUFloatType {32, 1, 32}
	i1: int {1}
	i2: int {2}
	o0: Tensor CPUFloatType {32, 32, 1}
%dp_unflatten.1 : Tensor = aten::bmm(%ad_emb_packed.1, %user_emb_t.1)
	i0: Tensor CPUFloatType {32, 1, 32}
	i1: Tensor CPUFloatType {32, 32, 1}
	o0: Tensor CPUFloatType {32, 1, 1}
%31 : Tensor = static_runtime::flatten_copy(%dp_unflatten.1, %4, %8)
	i0: Tensor CPUFloatType {32, 1, 1}
	i1: int {1}
	i2: int {-1}
	o0: Tensor CPUFloatType {32, 1}
%19 : Tensor[] = prim::ListConstruct(%31, %wide_preproc.1)
	i0: Tensor CPUFloatType {32, 1}
	i1: Tensor CPUFloatType {32, 50}
	o0: TensorList {2}
%input.1 : Tensor = aten::cat(%19, %4)
	i0: TensorList {2}
	i1: int {1}
	o0: Tensor CPUFloatType {32, 51}
%fc1.1 : Tensor = aten::addmm(%self._fc_b, %input.1, %29, %4, %4)
	i0: Tensor CPUFloatType {1}
	i1: Tensor CPUFloatType {32, 51}
	i2: Tensor CPUFloatType {51, 1}
	i3: int {1}
	i4: int {1}
	o0: Tensor CPUFloatType {32, 1}
%23 : Tensor = aten::sigmoid(%fc1.1)
	i0: Tensor CPUFloatType {32, 1}
	o0: Tensor CPUFloatType {32, 1}
%24 : (Tensor) = prim::TupleConstruct(%23)
	i0: Tensor CPUFloatType {32, 1}
	o0: Tuple {1}

Reviewed By: hlu1

Differential Revision: D28592852

fbshipit-source-id: 09174014f7d0ce25c511025d2b376f14e16c8a4a
2021-06-10 10:33:30 -07:00
7af9252ed7 [skip ci] export_slow_tests.py - Add option to ignore small differences (#59759)
Summary:
This lowers the number of unnecessary commits to pytorch/test-infra by only exporting a different stats file when the stats vary enough. This way, if the slow test cases we gather from S3 are the same and their times differ only trivially, we do not bother exporting a different stats file when the --ignore-small-diffs option is enabled.

Instead, we export the stats already in test-infra, so that when it tries to commit, it sees that the commit would be empty and does not add to the git history.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59759

Test Plan: Run `python tools/export_slow_tests.py --ignore-small-diffs <threshold>`.

Reviewed By: walterddr

Differential Revision: D29032712

Pulled By: janeyx99

fbshipit-source-id: 41d522a4c5f710e776acd1512d41be9791d0cf63
2021-06-10 09:44:33 -07:00
51d954e8e4 Link ATEN tests with OpenMP runtime (#59733)
Summary:
Even if OpenMP extensions are supported by the compiler, the OpenMP runtime library is not always implicitly added as a dependency by the linker.
This fixes linker problems on Apple M1 when libomp.dylib is installed via conda and tests that directly use OpenMP pragmas fail to link with the following errors:
```
/Library/Developer/CommandLineTools/usr/bin/c++ -Wno-deprecated -fvisibility-inlines-hidden -Wno-deprecated-declarations -DUSE_PTHREADPOOL -Xpreprocessor -fopenmp -I/Users/nshulga/miniforge3/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-typedef-redefinition -Wno-unknown-warning-option -Wno-unused-private-field -Wno-inconsistent-missing-override -Wno-aligned-allocation-unavailable -Wno-c++14-extensions -Wno-constexpr-not-const -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-unused-private-field -Wno-missing-braces -Wno-c++14-extensions -Wno-constexpr-not-const -O3 -DNDEBUG -DNDEBUG -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk -Wl,-search_paths_first -Wl,-headerpad_max_install_names -rdynamic caffe2/CMakeFiles/test_parallel.dir/__/aten/src/ATen/test/test_parallel.cpp.o -o bin/test_parallel  -Wl,-rpath,/Users/nshulga/git/pytorch/build/lib  lib/libgtest_main.a  lib/libtorch.dylib  lib/libtorch_cpu.dylib  lib/libprotobuf.a  lib/libc10.dylib  lib/libgtest.a && :
Undefined symbols for architecture arm64:
  "___kmpc_fork_call", referenced from:
      TestParallel_NestedParallel_Test::TestBody() in test_parallel.cpp.o
      TestParallel_Exceptions_Test::TestBody() in test_parallel.cpp.o
  "_omp_get_max_threads", referenced from:
      TestParallel_NestedParallel_Test::TestBody() in test_parallel.cpp.o
      TestParallel_Exceptions_Test::TestBody() in test_parallel.cpp.o
  "_omp_get_num_threads", referenced from:
      _.omp_outlined. in test_parallel.cpp.o
      _.omp_outlined..31 in test_parallel.cpp.o
  "_omp_get_thread_num", referenced from:
      _.omp_outlined. in test_parallel.cpp.o
      _.omp_outlined..31 in test_parallel.cpp.o
  "_omp_in_parallel", referenced from:
      TestParallel_NestedParallel_Test::TestBody() in test_parallel.cpp.o
      TestParallel_Exceptions_Test::TestBody() in test_parallel.cpp.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59733

Reviewed By: walterddr, seemethere

Differential Revision: D29005511

Pulled By: malfet

fbshipit-source-id: daab5e1b0a58d9b60a8992ef40c743e4b619dac7
2021-06-10 09:41:24 -07:00
4f79270b89 [PyTorch ] Thread parallel bmm across batch dim (#59596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59596

Parallelize batch matmul across the batch dim. This was found to improve perf for
some use cases on mobile.
ghstack-source-id: 130989569

Test Plan: CI unit tests

Reviewed By: albanD

Differential Revision: D26833417

fbshipit-source-id: 9b84d89d29883a6c9d992d993844dd31a25f76b1
2021-06-10 08:25:40 -07:00
3176f16691 [Pytorch benchmark] Add BMM benchmark (#59595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59595

ghstack-source-id: 130946743

Test Plan: bmm_test

Reviewed By: mingzhe09088

Differential Revision: D28873228

fbshipit-source-id: 6e4cb04bb6c63f5f68d8f23c13738e2d58ab499c
2021-06-10 08:24:29 -07:00
58412740ae Added doc for torch.einsum sublist format (#57038)
Summary:
Adds documentation for the new sublist format for `torch.einsum`
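For example, the sublist format interleaves operands with lists of integer subscript labels, optionally followed by an output sublist (equivalent to the string form shown in the comment):

```
import torch

a, b = torch.randn(2, 3), torch.randn(3, 4)
# Operands interleaved with subscript lists; the final list gives the output subscripts.
c = torch.einsum(a, [0, 1], b, [1, 2], [0, 2])  # same as torch.einsum("ij,jk->ik", a, b)
print(c.shape)  # torch.Size([2, 4])
```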

closes https://github.com/pytorch/pytorch/issues/21412

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57038

Reviewed By: mruberry

Differential Revision: D28994431

Pulled By: heitorschueroff

fbshipit-source-id: 3dfb154fe6e4c440ac67c2dd92727bb5ecfe289e
2021-06-10 08:01:56 -07:00
5e3e504728 Update TensorPipe submodule (#59789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59789

The bot messed up in D28867855 (96651458eb) so I've got to do it manually.

Test Plan: CI

Reviewed By: beauby

Differential Revision: D29027901

fbshipit-source-id: 9438e0cfbe932fbbd1e252ab57e2b1b23f9e44cf
2021-06-10 06:36:46 -07:00
96651458eb Automated submodule update: tensorpipe (#59374)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: e942ea1513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59374

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D28867855

fbshipit-source-id: e1325046003f5c546f02024ff4c427c91721cd7e
2021-06-10 04:41:02 -07:00
0d7d316dc1 [fx ir] Support lists and dicts in FX IR GraphDrawer (#58775)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58775

Reviewed By: RoshanPAN

Differential Revision: D28613939

fbshipit-source-id: 4164e2dd772b59240ea3907001fe4ebddb003060
2021-06-10 01:55:53 -07:00
e7cccc23b9 Add query and synchronize to c10::Stream (#59560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59560

`at::cuda::CUDAStream` has the `query` and `synchronize` methods, but `c10::Stream` does not, and I couldn't find any generic way to accomplish this. Hence I added helpers to do this to the DeviceGuardImpl interface, and then defined these methods on `c10::Stream`. (I had to do it out-of-line to circumvent a circular dependency).
ghstack-source-id: 130932249

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D28931377

fbshipit-source-id: cd0c19cf021e305d0c0cf9af364afb445d010248
2021-06-10 01:42:40 -07:00
f11120967e Support EnumerableShardingSpec in ShardedTensor. (#59061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59061

Overall Design: https://github.com/pytorch/pytorch/issues/55207

This PR builds upon https://github.com/pytorch/pytorch/pull/58517 and
https://github.com/pytorch/pytorch/pull/57409 to support creating a
ShardedTensor using EnumerableShardingSpec.
ghstack-source-id: 130780376

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D28734551

fbshipit-source-id: 656f5f2b22041dae071bc475f19fe94c969716e8
2021-06-09 23:21:14 -07:00
48ea7c808d [C10d] Support subgroups (#59111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59111

Create a util function for initializing subgroups. By default, each subgroup contains all the ranks within a machine. This util function can be used by both local SGD and SyncBatchNorm optimization.

Additionally, clang format `distributed/__init__.py` after importing `_rank_not_in_group` which is used by the unit test, and also clang format `distributed_c10d.py`.

Note that this API does not accept another overall main group. Like the APEX API `create_syncbn_process_group` [here](https://nvidia.github.io/apex/_modules/apex/parallel.html), it always uses the global world size and should only be applied when CUDA is available.
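Based on the test names above, usage looks roughly like this (a hedged sketch; it assumes an already-initialized default process group):

```
import torch.distributed as dist

# Default: one subgroup per machine, covering all local ranks.
cur_subgroup, subgroups = dist.new_subgroups()

# Or enumerate subgroup ranks explicitly, e.g. 4 ranks split into pairs:
cur_subgroup, subgroups = dist.new_subgroups_by_enumeration([[0, 1], [2, 3]])
```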

#Closes: https://github.com/pytorch/pytorch/issues/53962
ghstack-source-id: 130975027

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups_group_size_exceeds_world_size
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups_world_size_not_divisible_by_group_size

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups_by_enumeration
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups_by_enumeration_input_rank_exceeds_world_size
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups_overlap_not_allowed

Reviewed By: rohan-varma

Differential Revision: D28495672

fbshipit-source-id: fdcc405411dd409634eb51806ee0a320d1ecd4e0
2021-06-09 22:35:11 -07:00
fc0582ee95 [c10d] Use TORCH_CHECK for monitored barrier error (#59667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59667

Use TORCH_CHECK instead of throwing std::runtime_error in monitored barrier so that it works with TORCH_SHOW_CPP_STACKTRACES to reveal the entire call stack where the monitored barrier failed, which can help determine where the particular rank encountered an issue.
ghstack-source-id: 130993689

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D28974510

fbshipit-source-id: 6a6958995c1066cddcd647ca88c74473079b69fc
2021-06-09 22:31:33 -07:00
12b9e99e0d Bump the bytecode reading version kMaxSupportedBytecodeVersion to 6 (#59714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59714

Bytecode v6 relies on implicit operator versioning through the number of specified arguments. Both the read and write code paths are available. This PR enables reading v6 models. The default writing format is not changed yet and will be bumped in a later PR.

Test: CI.
Local: change the writing version to 6 temporarily and run the unit tests in LiteInterpreterTest. There are a number of end-to-end tests that write v6 bytecode, then read and run it.

Test Plan: Imported from OSS

Reviewed By: raziel, cccclai

Differential Revision: D29007538

Pulled By: iseeyuan

fbshipit-source-id: cb089d5d4c5b26c5b5cd3a5e0954e8c7c4c69aac
2021-06-09 21:58:31 -07:00
3c6ae6f181 [OSS CI][iOS] Use LibTorch-Lite.h for nightly builds (#59762)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59762

Test Plan: Imported from OSS

Reviewed By: cccclai

Differential Revision: D29018267

Pulled By: xta0

fbshipit-source-id: 10213a6811b4e6b33bd13355a7a7af85d21d48d4
2021-06-09 21:38:32 -07:00
a62f6b6d04 ci: Add skipIfOnGHA util (#59748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59748

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D29008217

Pulled By: seemethere

fbshipit-source-id: ffc2f7935df722f26c1252e3833085430ada7433
2021-06-09 21:19:26 -07:00
1ea5c19c19 Add USE_WHOLE_CUDNN option (#59744)
Summary:
It is only enabled if USE_STATIC_CUDNN is enabled

Next step after https://github.com/pytorch/pytorch/pull/59721 towards resolving fast kernels stripping reported in https://github.com/pytorch/pytorch/issues/50153

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59744

Reviewed By: seemethere, ngimel

Differential Revision: D29007314

Pulled By: malfet

fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a
2021-06-09 21:12:42 -07:00
bb19dc14cc add channels last support for AvgPool2d on CPU (#58725)
Summary:
Replacement of https://github.com/pytorch/pytorch/pull/48918.

Enables the test case for AvgPool2d channels-last on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58725

Reviewed By: ngimel

Differential Revision: D28593169

Pulled By: VitalyFedyunin

fbshipit-source-id: 5de870fe1d9dd961fb0dab5f9d531ab14614a160
2021-06-09 21:06:45 -07:00
52b2ed65c0 Revert D29007258: Revert D28926135: [pytorch][PR] Refactor Foreach Tests: Unary Functions
Test Plan: revert-hammer

Differential Revision:
D29007258

Original commit changeset: c15f51661641

fbshipit-source-id: 98236153136a5c6b6c2911079b7bd214da6cb424
2021-06-09 21:02:56 -07:00
827e00c914 Update Kineto to fix fd leak (#59755)
Summary:
Update to commit containing pytorch/kineto#281
Fixes https://github.com/pytorch/pytorch/issues/59746

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59755

Reviewed By: seemethere, ngimel

Differential Revision: D29011069

Pulled By: malfet

fbshipit-source-id: 4c7b09ce5d497634f9927c330713c7404d892912
2021-06-09 20:47:04 -07:00
a4e0368c99 Comment on tests reliance on ZeRO's partitioning algo (#59713)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/59548

**Overview:**
Recently, we changed ZeRO's partitioning algorithm to first sort the parameters by decreasing size and then greedily allocate to shards. See [here](ea1de87f4b).

The current tests `test_sharding()` and `test_add_param_group()` check for a uniform partitioning, which is not achieved with the old naive greedy partitioning algorithm for general world sizes but is achieved with the new sorted-greedy algorithm. This reliance is not ideal, but for now, we opt to simply add comments to document the dependency.
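A minimal sketch of the sorted-greedy allocation described above (illustrative only; names are hypothetical, not ZeRO's actual implementation):

```
def partition(sizes, world_size):
    shards = [[] for _ in range(world_size)]
    totals = [0] * world_size
    # Sort parameters by decreasing size, then greedily assign each one
    # to the shard with the smallest running total.
    for size in sorted(sizes, reverse=True):
        i = totals.index(min(totals))
        shards[i].append(size)
        totals[i] += size
    return shards

# With the sizes from the tests, the partition comes out uniform:
print(partition([9, 9, 7, 7, 5, 5, 3, 3], 2))  # [[9, 7, 5, 3], [9, 7, 5, 3]]
```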

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59713

Test Plan:
I tested for world sizes of 1, 2, 3, and 4 via the AI AWS cluster:
```
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=1 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_sharding
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=2 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_sharding
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=3 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_sharding
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=4 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_sharding

srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=1 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_add_param_group
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=2 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_add_param_group
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=3 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_add_param_group
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=4 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_add_param_group
```
However, because the train queue (which offers instances with 8 GPUs) is not working at the moment, I was unable to test for world sizes of 5+. Nonetheless, I believe that they should still work.

First, consider `test_sharding()`. Given the sorted-greedy algorithm, each shard will be assigned one of the parameters with size `9`, then one of the parameters with size `7`, then `5`, and finally `3`. Hence, each will have a uniform partition. Now, consider `test_add_param_group()`. Similarly, the same allocation behavior occurs, only the last shard is not assigned the final parameter with size `3` to begin. However, after adding the new `param_group` with the parameter with size `3`, a re-partitioning occurs. The first `param_group` is partitioned as before, and the parameter with size `3` in the new `param_group` is assigned to the last shard since it has the minimal total size. Thus, in the end, all shards have a uniform partition.

Reviewed By: mrshenli

Differential Revision: D28996460

Pulled By: andwgu

fbshipit-source-id: 22bdc638d8569ed9a20836812eac046d628d6df2
2021-06-09 19:56:28 -07:00
25179ecb63 [caffe2] Fix verbose templated signed/unsigned comparison warning (#59578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59578

This is verbose warning formed from one `CAFFE_ENFORCE_GT()` check:
```
third-party\toolchains\vs2017_15.9\buildtools\vc\tools\msvc\14.16.27023\include\xstddef(271): warning C4018: '>': signed/unsigned mismatch
xplat\caffe2\c10\util\logging.h(208): note: see reference to function template instantiation 'bool std::greater<void>::operator ()<const T1&,const T2&>(_Ty1,_Ty2) const' being compiled
        with
        [
            T1=int,
            T2=unsigned int,
            _Ty1=const int &,
            _Ty2=const unsigned int &
        ]
xplat\caffe2\caffe2\operators\conv_pool_op_base.h(539): note: see reference to function template instantiation 'void c10::enforce_detail::enforceThatImpl<std::greater<void>,int,unsigned int,>(Pred,const T1 &,const T2 &,const char *,int,const char *,const void *)' being compiled
        with
        [
            Pred=std::greater<void>,
            T1=int,
            T2=unsigned int
        ]
xplat\caffe2\caffe2\operators\conv_pool_op_base.h(536): note: while compiling class template member function 'std::vector<caffe2::TensorShape,std::allocator<_Ty>> caffe2::ConvPoolOpBase<caffe2::CPUContext>::TensorInferenceForSchema(const caffe2::OperatorDef &,const std::vector<_Ty,std::allocator<_Ty>> &,int)'
        with
        [
            _Ty=caffe2::TensorShape
        ]
xplat\caffe2\caffe2\operators\conv_pool_op_base.h(631): note: see reference to function template instantiation 'std::vector<caffe2::TensorShape,std::allocator<_Ty>> caffe2::ConvPoolOpBase<caffe2::CPUContext>::TensorInferenceForSchema(const caffe2::OperatorDef &,const std::vector<_Ty,std::allocator<_Ty>> &,int)' being compiled
        with
        [
            _Ty=caffe2::TensorShape
        ]
xplat\caffe2\caffe2\operators\pool_op.cc(1053): note: see reference to class template instantiation 'caffe2::ConvPoolOpBase<caffe2::CPUContext>' being compiled
xplat\caffe2\c10\core\memoryformat.h(63): note: see reference to class template instantiation 'c10::ArrayRef<int64_t>' being compiled
```
Use a signed `0` because `.dims_size()` returns a signed integer.

Test Plan: Confirm warning no longer present in Windows build logs

Reviewed By: simpkins

Differential Revision: D28941905

fbshipit-source-id: acdc1281df2fe7f30b14cfad917cbbe8f3336d29
2021-06-09 19:48:29 -07:00
b0fd3ca542 [sparse] Add the AO namespace to torch (#58703)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58703

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28970962

Pulled By: z-a-f

fbshipit-source-id: 0d0f62111a0883af4143a933292dfaaf8fae220d
2021-06-09 19:47:21 -07:00
3dfb94c17c Construct a -Wall around Torch (#59668)
Summary:
Removes unused variables and functions and performs other minor mods sufficient to introduce `-Wall` as a default build flag. This should enhance code safety in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59668

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D28974453

fbshipit-source-id: 011c720dd6e65fdbbd87aa90bf57d67bfef32216
2021-06-09 19:42:43 -07:00
fa030d1213 [DataPipes] Add simple unbatch to DataPipe (#59610)
Summary:
Implements the simple unbatch feature for DataPipe https://github.com/pytorch/pytorch/issues/58148
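The semantics amount to flattening one level of batching; a plain-Python sketch of that behavior (not the DataPipe API itself):

```
def unbatch(batches):
    for batch in batches:
        yield from batch  # flatten one nesting level

print(list(unbatch([[0, 1], [2, 3], [4]])))  # [0, 1, 2, 3, 4]
```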

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59610

Reviewed By: VitalyFedyunin

Differential Revision: D28994180

Pulled By: NivekT

fbshipit-source-id: 4bafe6e26c4f95a808c489b147369413a196fa1c
2021-06-09 16:53:31 -07:00
2f395f3b54 [reland] Document debugability features in torch.distributed (#59726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59726

Reland of https://github.com/pytorch/pytorch/pull/59604 with indentation fix
ghstack-source-id: 130979356

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D29001923

fbshipit-source-id: 225d9dc5054c223b453f3b39749e2b62f61b9a2c
2021-06-09 16:40:11 -07:00
c5bee1ec4f [PyTorch] Parallelize gelu via tensoriterator (#58950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58950

Use tensor iterator's API to set grain size in order to parallelize gelu op.
ghstack-source-id: 130947174

Test Plan: test_gelu

Reviewed By: ezyang

Differential Revision: D28689819

fbshipit-source-id: 0a02066d47a4d9648323c5ec27d7e0e91f4c303a
2021-06-09 16:09:38 -07:00
8b63573c31 [PyTorch Operator Benchmark] gelu benchmark (#59334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59334

Add gelu op benchmark
ghstack-source-id: 130947172

Test Plan: gelu_test

Reviewed By: hl475

Differential Revision: D28842959

fbshipit-source-id: 93e23e027a488412488ecf22335d7d915f6cc3b4
2021-06-09 16:09:37 -07:00
874e7b889d [PyTorch] Expose interface to set grain size on tensor iterator (#58949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58949

To parallelize ops, the grain size setting is exposed at the for_each level. That is too deep in the stack: cpu_kernel_vec does not know what the op is, yet you would want to parallelize an op depending on its type. Non-trivial ops can benefit from threads even when the number of elements in the tensor is not high. This API exposes setting the grain size at the tensor iterator level so that the operator creating it can control it.
ghstack-source-id: 130947175

Test Plan: CI + will add more test

Reviewed By: ezyang

Differential Revision: D26857523

fbshipit-source-id: 09fc2953061069967caa9c78b010cb1b68fcc6c9
2021-06-09 16:08:25 -07:00
1735775662 [Torch] Cast timestamp type to int (#59712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59712

When a worker process fails in fb due to a signal failure, the TerminationHandler writes an error reply file. Recently the error reply file was changed for mast jobs. The JSON value of ``timestamp`` is a string, even though in the thrift struct it is an int: https://fburl.com/diffusion/upa228u5

This diff adds support for casting the str timestamp to int.
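A minimal sketch of the tolerant parse (illustrative only; the helper name is hypothetical):

```
def parse_timestamp(value):
    # Accept both int (per the thrift struct) and str (as emitted for mast jobs).
    return int(value) if isinstance(value, str) else value
```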

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed/elastic/multiprocessing/errors:api_test

Reviewed By: suphoff

Differential Revision: D28995827

fbshipit-source-id: 333448cfb4d062dc7fe751ef5839e66bfcb3ba00
2021-06-09 15:56:37 -07:00
44c442293f [torch/elastic] Fix the edge case when no node is alive (#59663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59663

This PR fixes an edge case bug in `DynamicRendezvousHandler` where the state of the rendezvous is not always entirely updated when one or more nodes are not alive anymore.

Test Plan: Run the existing and newly-introduced unit tests.

Reviewed By: tierex

Differential Revision: D28971809

fbshipit-source-id: ebbb6a5f2b04f045c3732d6cf0f8fdc7c2381a7c
2021-06-09 15:31:50 -07:00
0fa3db5594 Fix subgradient for element-wise max and min (#59669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59669

Fixes #56734

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28975531

fbshipit-source-id: 4e774dc8c6e095bc66962ce2411466de3880c2d3
2021-06-09 15:21:45 -07:00
e3d75b8475 irange for PyTorch sans jit (#59481)
Summary:
Switches most of the simple for loops outside of `jit` directories to use `c10::irange`.

Generated with D28874212.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59481

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D28909681

fbshipit-source-id: ec9ab1bd602933238d9d0f73d4d8d027b75d9d85
2021-06-09 14:46:11 -07:00
804f924504 Fix accuraccy failures when running test_nn on A100s (#59624)
Summary:
Make sure tests that are run explicitly without TF32 don't use TF32 operations.

Fixes https://github.com/pytorch/pytorch/issues/52278

After the tf32 accuracy tolerance was increased to 0.05, this is the only remaining change required to fix the above issue (for TestNN.test_Conv3d_1x1x1_no_bias_cuda).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59624

Reviewed By: heitorschueroff

Differential Revision: D28996279

Pulled By: ngimel

fbshipit-source-id: 7f1b165fd52cfa0898a89190055b7a4b0985573a
2021-06-09 14:38:34 -07:00
47e286d024 Merge c10d elastic agent tests into local_elastic_agent_test.py file (#59657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59657

Introduce tests that exercise the elastic agent with the c10d and etcd-v2 rendezvous backends.
Added a port allocation method that uses sockets to find an available port for the c10d backend. This way, agents that are created will all share the specified address/port and can communicate.
Added a method that abstracts the backend to use when running a test. This way, any tests can quickly be switched to run on the backend of choice (c10d, etcd, or etcd-v2)
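The usual socket-based trick for port allocation looks like the following (a sketch under the assumption that the method binds to port 0 and reads back the OS-assigned port; the helper name is hypothetical):

```
import socket

def get_free_port() -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("localhost", 0))   # port 0: let the OS pick a free port
        return s.getsockname()[1]  # read back the assigned port number
```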

Test Plan: Tests various components of the elastic agent with 3 different backends: etcd, etcd-v2, and c10d.

Reviewed By: tierex

Differential Revision: D28972604

fbshipit-source-id: fd4cff6417fefdf0de9d7a114820914b968006a8
2021-06-09 14:28:59 -07:00
13a2025469 Delete empty caffe2/quantization/CMakeLists.txt (#59717)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59717

Reviewed By: walterddr

Differential Revision: D28997598

Pulled By: malfet

fbshipit-source-id: ef2654577c0784254f3d74bc340cdabc76fa345c
2021-06-09 14:20:33 -07:00
171142f9cc Revert D28926135: [pytorch][PR] Refactor Foreach Tests: Unary Functions
Test Plan: revert-hammer

Differential Revision:
D28926135 (0897df18a3)

Original commit changeset: 4eb21dcebbff

fbshipit-source-id: c15f51661641f455ae265cdf048051a3c01198f9
2021-06-09 14:05:56 -07:00
9bb5663979 Use commit stats from viable/strict instead of nightlies for sharding (#59727)
Summary:
Currently, not all of CI runs on nightlies, so it's better to use viable/strict.

For example, current 11.1 test jobs do not get to use automatic sharding because of the lack of stats: https://app.circleci.com/jobs/github/pytorch/pytorch/14010983?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59727

Reviewed By: heitorschueroff

Differential Revision: D29004910

Pulled By: janeyx99

fbshipit-source-id: eb0c54a7e7947decba8134a1d67e4b0434151a06
2021-06-09 13:52:15 -07:00
8845cbabf0 [CMake] Split caffe2::cudnn into public and private (#59721)
Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR, PyTorch wheels often accidentally contained several partial copies of the cudnn_static library.
Splitting the interface into header-only (cudnn-public) and library+headers (cudnn-private) prevents that from happening.
Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in https://github.com/pytorch/pytorch/issues/50153

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59721

Reviewed By: ngimel

Differential Revision: D29000967

Pulled By: malfet

fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336
2021-06-09 13:18:48 -07:00
c738c13304 Fix typo in checkpoint docs (#59646)
Summary:
This small typo caused this valuable piece of information to be excluded from the docs.

<img width="876" alt="image" src="https://user-images.githubusercontent.com/8812459/121240517-47f2d400-c84f-11eb-9288-23c551c1591a.png">

The last "warning" is missing a second ":", so it doesn't render in the docs {emoji:1f447}

<img width="875" alt="image" src="https://user-images.githubusercontent.com/8812459/121240467-39a4b800-c84f-11eb-9dd6-ec26754c43d3.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59646

Reviewed By: mruberry

Differential Revision: D28972541

Pulled By: jbschlosser

fbshipit-source-id: d10c6688d8db4d4ec4b02858a4c7b352365219c0
2021-06-09 12:48:18 -07:00
51af772937 [jit] Set debug name for value coming out of GetAttr nodes. (#59123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59123

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D28766023

fbshipit-source-id: 0919f4318fb5a7b1d5adc8f976dfc9309e233d13
2021-06-09 12:24:55 -07:00
bbd58d5c32 fix :attr: rendering in F.kl_div (#59636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59636

Fixes #57538

Test Plan:
Rebuilt docs to verify the fix:

{F623235643}

Reviewed By: zou3519

Differential Revision: D28964825

fbshipit-source-id: 275c7f70e69eda15a807e1774fd852d94bf02864
2021-06-09 12:20:14 -07:00
e385be7611 .circleci: Disable pytorch_windows_test_multigpu (#59725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59725

These are failing on CircleCI with no apparent debug messages, see https://github.com/pytorch/pytorch/issues/59724

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29001353

Pulled By: seemethere

fbshipit-source-id: ce0f4fbcfc7918824f6bad47b922d914eeb2f5a6
2021-06-09 12:12:13 -07:00
f8bb7e2f7c Magma isn't needed in cpu build (#59619)
Summary:
Fix incorrect logic in the Windows CPU build script: VERSION_SUFFIX shouldn't be `cpu`.

https://github.com/pytorch/pytorch/pull/59618/checks?check_run_id=2771591019
![image](https://user-images.githubusercontent.com/16190118/121158840-3f18f700-c87d-11eb-9c03-277856afb1b2.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59619

Reviewed By: samestep

Differential Revision: D29000213

Pulled By: seemethere

fbshipit-source-id: fcc474967e281fbf9be69f14c0aedfd01820573f
2021-06-09 12:06:33 -07:00
ed3884c3e9 Fix timeout with ZeRO test_step() and test_step_with_closure() (#59648)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/59548

**Overview:**
This fixes the timeout issues with `test_step()` and `test_step_with_closure()` for the `ZeroRedundancyOptimizer`.

The existing tests partially assumed a `world_size` of `2` (which is why [this](https://github.com/pytorch/pytorch/pull/59622) seems to be a temporary fix). This change instead avoids baking in that assumption and allows `world_size` to be flexible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59648

Test Plan:
I tested with 2, 3, and 4 GPUs (and hence `world_size`s of 2, 3, and 4, respectively) via the AI AWS cluster.
```
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=2 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_step
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=3 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_step
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=4 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_step

srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=2 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_step_with_closure
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=3 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_step_with_closure
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=4 python test/distributed/optim/test_zero_redundancy_optimizer.py -- TestZeroRedundancyOptimizerDistributed.test_step_with_closure
```

Reviewed By: jbschlosser

Differential Revision: D28975035

Pulled By: andwgu

fbshipit-source-id: 2cbaf6a35e22a95e19fc97e1b64e585e452e774e
2021-06-09 12:03:05 -07:00
61965abad7 Move _PartialWrapper to module scope (#59660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59660

Context https://github.com/pytorch/pytorch/issues/57352

Test Plan: Pytorch CI tests

Reviewed By: vkuzo

Differential Revision: D28972991

fbshipit-source-id: efc9dd3e90e18e1cdf27d5ef0f168abd8169bc42
2021-06-09 11:55:04 -07:00
0f6bd550a4 Revert D28981443: reland D28645531: .github: Add Windows GPU workflow
Test Plan: revert-hammer

Differential Revision:
D28981443 (21121675b3)

Original commit changeset: 5d24cccfb8c8

fbshipit-source-id: 14e5b610978882bace2f834f61e5457f62b11290
2021-06-09 11:43:10 -07:00
167477329d [Reland] adding base commit to scribe report (#59677)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/59570.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59677

Reviewed By: janeyx99

Differential Revision: D28980356

Pulled By: walterddr

fbshipit-source-id: 9c4671d18ce00fda98d774d1b2aa556662ecddfe
2021-06-09 11:06:01 -07:00
d42e6c7f70 Clang format distributed_test.py (#59693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59693

ghstack-source-id: 130931133

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D28987619

fbshipit-source-id: 3681cc262b889653615ec64da8c23c96cc0d997b
2021-06-09 10:58:48 -07:00
68f74966fc [ttk] Store float64 in tensorboard instead of float32 (#59435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59435

Sometimes we need to compare 10+ digits. Currently tensorboard only saves float32. Provide an option to save float64.

Reviewed By: yuguo68

Differential Revision: D28856352

fbshipit-source-id: 05d12e6f79b6237b3497b376d6665c9c38e03cf7
2021-06-09 10:42:37 -07:00
3271853912 hold references to storages during TorchScript serialization (#59642)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59642

Test Plan: Imported from OSS

Reviewed By: jbschlosser, cccclai

Differential Revision: D28968947

Pulled By: Lilyjjo

fbshipit-source-id: 0046da8adb3a29fb108965a1d2201749fe2d0b41
2021-06-09 10:12:07 -07:00
21121675b3 reland D28645531: .github: Add Windows GPU workflow (#59678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59678

This reverts commit 2956bbaf2388d424ef986c22fac8287f7c345978.

Reland of https://github.com/pytorch/pytorch/pull/58782

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D28981443

Pulled By: seemethere

fbshipit-source-id: 5d24cccfb8c87832fa0233d0b524575dc04f8f05
2021-06-09 09:51:29 -07:00
0897df18a3 Refactor Foreach Tests: Unary Functions (#58960)
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/58833

__changes__
- slowpath tests: pass tensors of every dtype & device and compare the behavior with the regular functions, including inplace variants
- check the number of `cudaLaunchKernel` calls
- rename `ForeachUnaryFuncInfo` -> `ForeachFuncInfo`: This change is mainly for the future binary/pointwise test refactors

cc: ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58960

Reviewed By: ejguan

Differential Revision: D28926135

Pulled By: ngimel

fbshipit-source-id: 4eb21dcebbffffaf79259e31961626e0707fb8d1
2021-06-09 09:45:16 -07:00
62583e51a5 [reland] Add a ci/no-build label (#58778)
Summary:
Depends on https://github.com/pytorch/pytorch-probot/pull/22. Adds a new label called `ci/no-build` that disables the CircleCI `build` workflow on PRs. The current behavior should be the same in the absence of `ci/no-build`.

Specifically, after this PR lands, for anyone who isn't rebased onto the latest `master`, I believe this will happen:
- when they push to their PR, the CircleCI app triggers CI
- the `pytorch-probot` app sees that their PR doesn't have the `ci/no-build` tag, so it also triggers CI
- the latter should auto-cancel the former

After checking with https://github.com/pytorch/pytorch/issues/59087, it looks like this would cause the "errored" number to go up and then go down as Circle jobs are canceled (saying "Your CircleCI tests were canceled") and then restarted:

<img width="868" alt="Screen Shot 2021-05-27 at 12 39 20 PM" src="https://user-images.githubusercontent.com/8246041/119887123-9667b080-bee8-11eb-8acb-e1967899c9d5.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58778

Reviewed By: malfet

Differential Revision: D28995335

Pulled By: samestep

fbshipit-source-id: 8d7543b911e4bbbeef14639baf9d9108110b97c8
2021-06-09 09:05:44 -07:00
b844fd11ee Allow tools/test_history.py to be piped to head (#59676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59676

Test Plan:
```
tools/test_history.py --mode=columns --ref=3cf783 --test=test_set_dir --job pytorch_linux_xenial_py3_6_gcc5_4_test --job pytorch_linux_xenial_py3_6_gcc7_test | head -n10
```
Before this PR, the above command seems to just hang. After this PR, it nicely prints the following, line by line, and then exits:
```
2021-02-10 12:18:50Z 3cf78395cbc32fa9c83b585c9ec63f960b32d17f    0.644s    0.312s
2021-02-10 11:13:34Z 594a66d778a660faed0b0fbbe1dd8c2c318707ff    0.360s  errored
2021-02-10 10:13:25Z 9c0caf0384690cb67dcccb7066ece5184f72ca78    0.819s    0.449s
2021-02-10 10:09:14Z 602434bcbebb82c6f3741b2a3d5ebac7ee482268    0.361s    0.454s
2021-02-10 10:09:10Z 2e35fe953553247d8a22fc38b039374e426f13b8
2021-02-10 10:09:07Z ff73be7e45616fe106b9e5040bc091ca5cdbfc7f
2021-02-10 10:05:39Z 74082f0d6f8dfd60f28c0de0fe43bcb97b95ee5a
2021-02-10 07:42:29Z 0620c96fd6a140e68c49d68ed14721b1ee108ecc    0.414s    0.377s (2 job re-runs omitted)
2021-02-10 07:27:53Z 33afb5f19f4e427f099653139ae45b661b8bc596    0.381s    0.294s
2021-02-10 07:05:15Z 5f9fb93c1423814a20007faa506ceb8b4828c8d1    0.461s    0.361s
```

Reviewed By: seemethere

Differential Revision: D28978017

Pulled By: samestep

fbshipit-source-id: 021e634bbf40eb1d3b131fac574343dd5cef5deb
2021-06-09 08:42:05 -07:00
26beda8ed5 [BE] unsupported backward failing on single sample (#59455)
Summary:
Echoing https://github.com/pytorch/pytorch/pull/58260#discussion_r637467625

Similar to `test_unsupported_dtype`, which only checks that an exception is raised on the first sample, we should do the same for unsupported_backward. The goal of both tests is to remind developers to
1. add a new dtype to the support list if it is fully runnable without failure (over all samples)
2. replace the skip mechanism, which will indefinitely ignore tests without warning

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59455

Test Plan: CI.

Reviewed By: mruberry

Differential Revision: D28927169

Pulled By: walterddr

fbshipit-source-id: 2993649fc17a925fa331e27c8ccdd9b24dd22c20
2021-06-09 08:17:03 -07:00
12b4e8996f [DataLoader] Add nesting_level argument to map and filter (#59498)
Summary:
This PR makes the `.map` and `.filter` APIs of IterDataPipe sensitive to the `nesting_level` argument (illustrated below).

[DataPipes] Make .map of DataPipe sensitive to nested_level argument https://github.com/pytorch/pytorch/issues/58145
[DataPipes] Make .filter of DataPipe sensitive to nested_level argument https://github.com/pytorch/pytorch/issues/58147
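
To make the intended semantics concrete, here is a hypothetical, framework-free illustration of applying a function at a given nesting depth (the helper and argument names are mine, not the DataPipe API):
```python
def apply_at_level(fn, item, nesting_level):
    # nesting_level=0 applies fn to the item itself;
    # each extra level descends one layer of containers
    if nesting_level == 0:
        return fn(item)
    return [apply_at_level(fn, sub, nesting_level - 1) for sub in item]

print(apply_at_level(lambda x: x * 2, [[1, 2], [3]], 2))  # [[2, 4], [6]]
```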

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59498

Reviewed By: ejguan

Differential Revision: D28964280

Pulled By: NivekT

fbshipit-source-id: b1ee6cafa3953093ebd7bf30eacc80c3ef7cd190
2021-06-09 07:40:53 -07:00
2693b0bef3 Fix compile error when debugging (#59616)
Summary:
Signed-off-by: caozhong <zhong.z.cao@intel.com>

Triggered this probably because of my full-debug build of Python. ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59616

Reviewed By: jbschlosser

Differential Revision: D28958685

Pulled By: albanD

fbshipit-source-id: fdab622c4d1be93eb27e9006dcf3db7c5b44a04b
2021-06-09 06:34:06 -07:00
f1786b293d Revert D28972444: [pytorch][PR] Document debugability features in torch.distributed
Test Plan: revert-hammer

Differential Revision:
D28972444 (a9d2810817)

Original commit changeset: da5e8ee84f0d

fbshipit-source-id: 94d3b3b75ddec74ea5b2b76f6a7519dc921ee2a7
2021-06-09 03:04:36 -07:00
a56c89a160 Revert D28918331: [pytorch][PR] Automated submodule update: FBGEMM
Test Plan: revert-hammer

Differential Revision:
D28918331 (cc840cf544)

Original commit changeset: def60efe5584

fbshipit-source-id: 88101feb87ebfbd38cf10b45d09af309e9759852
2021-06-09 01:36:06 -07:00
a9d2810817 Document debugability features in torch.distributed (#59604)
Summary:
Adds comprehensive documentation around debuggability features added to `torch.distributed` recently, including `monitored_barrier` and the TORCH_DISTRIBUTED_DEBUG env variable.

![dist_one](https://user-images.githubusercontent.com/8039770/121102672-0f052180-c7b3-11eb-974c-81dbbe102cb6.png)
![dist_two](https://user-images.githubusercontent.com/8039770/121102734-39ef7580-c7b3-11eb-94f7-c75469351440.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59604

Reviewed By: jbschlosser, SciPioneer

Differential Revision: D28972444

Pulled By: rohan-varma

fbshipit-source-id: da5e8ee84f0d6f252c703c4d70ff2a0d5817cc4e
2021-06-08 23:52:19 -07:00
daa35141e8 Reland: "[TensorExpr] Fix handling of 0-dim tensors." (#59508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508

An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28918342

Pulled By: ZolotukhinM

fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
2021-06-08 22:48:17 -07:00
9f9904969f Reland: "[TensorExpr] Fix printing of Bool dtype." (#59507)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59507

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28918344

Pulled By: ZolotukhinM

fbshipit-source-id: b75aa9f316e4f3f648130a3171a35bfbbf1f397d
2021-06-08 22:48:16 -07:00
0b6ec32004 Reland: "[TensorExpr] Improve debug messages." (#59506)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59506

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28918343

Pulled By: ZolotukhinM

fbshipit-source-id: 168703f6368f5182cf9762600d7f0f6ea5b20280
2021-06-08 22:47:06 -07:00
04986b909f [package] Add docstring for PackageExporter.intern (#59602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59602

**Summary**
This commit adds a docstring for `PackageExporter.intern`.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28972939

Pulled By: SplitInfinity

fbshipit-source-id: 1765541aa2ed88e01beb48c08b90f56df3a591b7
2021-06-08 19:53:36 -07:00
f52e202840 Add warning when accessing Tensor::grad() in the C++ API (#59362)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35379

 - Adds the `retains_grad` attribute, backed by C++ as a native function. The Python bindings for the function are skipped to be consistent with `is_leaf`.
   - Tried writing it without a native function, but the jit test `test_tensor_properties` seems to require that it be a native function (or alternatively maybe it could also work if we manually add a prim implementation?).
 - The Python API now uses the `retain_grad` implementation from C++ (example below)
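
A minimal sketch of how the attribute pairs with `retain_grad()` from Python:
```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                # non-leaf tensor
y.retain_grad()          # ask autograd to keep y.grad after backward
y.sum().backward()

print(y.retains_grad)    # True: the attribute exposed by this PR
print(y.grad)            # populated thanks to retain_grad()
```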

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59362

Reviewed By: jbschlosser

Differential Revision: D28969298

Pulled By: soulitzer

fbshipit-source-id: 335f2be50b9fb870cd35dc72f7dadd6c8666cc02
2021-06-08 19:43:21 -07:00
90303157ab Enable complex dtypes for coo_sparse-coo_sparse matmul [CPU] (#59554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59554

This PR enables complex numbers supports for matrix-matrix
multiplication of COO sparse matrices.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28968309

Pulled By: anjali411

fbshipit-source-id: 4fd471e76a5584366aabc86c08b4564667ee54ca
2021-06-08 19:34:41 -07:00
b386ed6f9b Fix some compiler warnings (#59643)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59643

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D28916206

fbshipit-source-id: 4f6c8e0faeb76848f5951ff85db7c9da7fe9bf54
2021-06-08 18:22:57 -07:00
02d380450d [FX][docs][EZ] Fix link to fuser example (#59670)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59670

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D28975704

Pulled By: jamesr66a

fbshipit-source-id: 2fb759224b5b1ecc62c0ab26563d2a35ed422794
2021-06-08 17:32:55 -07:00
1733d10399 Warn when backward() is called with create_graph=True (#59412)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4661
- Add warnings in the engine's `execute` function so they can be triggered through both the C++ and Python codepaths (example below)
- Adds an RAII-guard version of `c10::Warning::set_warnAlways` and replaces all prior usages of `set_warnAlways` with the new one
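
A minimal sketch of the Python codepath that now warns, plus the commonly recommended `torch.autograd.grad` alternative for higher-order gradients:
```python
import torch

x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()
loss.backward(create_graph=True)  # now emits the new warning

# alternative that avoids the parameter/grad reference cycle:
(gx,) = torch.autograd.grad((x ** 2).sum(), x, create_graph=True)
(gxx,) = torch.autograd.grad(gx.sum(), x)  # second-order gradient
```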

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59412

Reviewed By: jbschlosser

Differential Revision: D28969294

Pulled By: soulitzer

fbshipit-source-id: b03369c926a3be18ce1cf363b39edd82a14245f0
2021-06-08 17:19:04 -07:00
82466e0605 Revert D28900487: ger is an alias to outer, not the other way around
Test Plan: revert-hammer

Differential Revision:
D28900487 (4512d75063)

Original commit changeset: e9065c5b2907

fbshipit-source-id: 712c05d2fba28c83958ef760290e1e08c147a907
2021-06-08 17:09:15 -07:00
cc840cf544 Automated submodule update: FBGEMM (#59505)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 77a4792062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59505

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D28918331

fbshipit-source-id: def60efe55843023e70b94726cde1faf6857be0b
2021-06-08 17:03:26 -07:00
2956bbaf23 Revert D28645531: .github: Add Windows GPU workflow
Test Plan: revert-hammer

Differential Revision:
D28645531 (51884c6479)

Original commit changeset: 6ed1a2dead9c

fbshipit-source-id: e082d7d50de77d0572596111e95a3da3a350a319
2021-06-08 16:59:56 -07:00
97dfc7e300 [Reland] Adding run specified tests option to run_test.py (#59649)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/59487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59649

Reviewed By: samestep

Differential Revision: D28970751

Pulled By: janeyx99

fbshipit-source-id: 6e28d4dcfdab8a49da4b6a02c57516b08bacd7b5
2021-06-08 16:04:46 -07:00
51884c6479 .github: Add Windows GPU workflow (#58782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58782

[skip ci]

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D28645531

Pulled By: seemethere

fbshipit-source-id: 6ed1a2dead9cca29e26e613afdbcf46ba7cee88c
2021-06-08 16:00:21 -07:00
6104ac5aaf [libkineto] Refactor trace activities (#59360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59360

Pull Request resolved: https://github.com/pytorch/kineto/pull/206

Replace ClientTraceActivity with GenericActivity.
In addition:
* Add a couple of new activity types for user annotations
* Simplify code for GPU-side user annotations
* Add accessor to containing trace span object in activities. Later we can replace this with a trace context / trace session object.
* Simplified MemoryTraceLogger
* Added early exit for cupti push/pop correlation ID

Reviewed By: ilia-cher

Differential Revision: D28231675

fbshipit-source-id: 7129f2493016efb4d3697094f24475e2c39e6e65
2021-06-08 15:49:35 -07:00
acc47357b5 Fix torch.conj for zero-dimensional sparse coo matrix (#59553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59553

Added a test for 0x0 sparse coo input for sparse_unary_ufuncs.
This test fails for `conj` on master.

Modified `unsupportedTypes` for test_sparse_consistency: complex dtypes now pass, but float16 doesn't pass for `conj` because `to_dense()` doesn't work with float16.

Fixes https://github.com/pytorch/pytorch/issues/59549
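
A minimal reproducer of the kind the new test covers (shapes and dtype are illustrative):
```python
import torch

i = torch.empty((2, 0), dtype=torch.int64)   # no nonzero entries
v = torch.empty((0,), dtype=torch.complex64)
s = torch.sparse_coo_tensor(i, v, (0, 0))    # 0x0 sparse COO matrix
print(torch.conj(s))                         # failed on master before this fix
```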

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28968215

Pulled By: anjali411

fbshipit-source-id: 44e99f0ce4aa45b760d79995a021e6139f064fea
2021-06-08 15:46:49 -07:00
894aaa3997 Revert D28943928: [pytorch][PR] adding base commit to scribe report
Test Plan: revert-hammer

Differential Revision:
D28943928 (92ed70a048)

Original commit changeset: ae3d279005f5

fbshipit-source-id: fda98b6c54425bba2f937a1cb921027531d61842
2021-06-08 15:43:57 -07:00
6ca141fe6c Make detach return an alias even under inference mode (#59633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59633

Fixes #59614

This fix isn't 100% correct, but it appears to stem the bleeding. A better fix would be to understand how to detect when function implementations don't uphold the required invariants, leading to refcount disaster.
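
A minimal illustration of the expected aliasing behavior (assuming "alias" means shared storage, checked via `data_ptr`):
```python
import torch

with torch.inference_mode():
    x = torch.ones(3)
    y = x.detach()

# after this fix, detach() should alias rather than copy
print(y.data_ptr() == x.data_ptr())  # expected: True
```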

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28962183

Pulled By: ezyang

fbshipit-source-id: 6ec71994666289dadef47bac363e6902df90b094
2021-06-08 15:31:29 -07:00
14f4c8d333 Revert D28387762: Forward AD formulas batch 3
Test Plan: revert-hammer

Differential Revision:
D28387762 (58348bea06)

Original commit changeset: fc395c92af7e

fbshipit-source-id: 608d704ff5bc560714790a576eaf9ed7f1f44e13
2021-06-08 15:19:26 -07:00
528d82d6a6 [torch] Add debug name to assert message for useOf
Summary:
Make an assert message in Pytorch's JIT provide better information by
printing the debug name of a value in `PythonPrintImpl::useOf` if it's not
found in any tables.

Test Plan:
Tested printing a `module.code` where the module had an invalid value used
as an operand. Before it asserted without any more details, afterwards it
printed the debug name which made it easy to track down the offending value.

Reviewed By: SplitInfinity

Differential Revision: D28856026

fbshipit-source-id: 479f66c458a0a2d9a161ade09f20382e7b19d60e
2021-06-08 15:03:58 -07:00
9d533ef3ac Renorm fix (#59615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59584
albanD, soulitzer, `renorm` grad was completely busted. Fast gradcheck is definitely not doing its job.
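
A minimal check of the kind that should now pass (full/slow gradcheck, not the fast variant):
```python
import torch
from torch.autograd import gradcheck

x = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
print(gradcheck(lambda t: t.renorm(p=2, dim=0, maxnorm=1.0), (x,)))
```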

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59615

Reviewed By: jbschlosser

Differential Revision: D28964271

Pulled By: ngimel

fbshipit-source-id: b6878cd24db9189b64b67eb58bd2cd8956cda78a
2021-06-08 14:59:24 -07:00
67b8e6410d [OSS] Add podspec for libtorch-lite (#59638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59638

ghstack-source-id: 130847775

Test Plan: .

Reviewed By: husthyc, cccclai

Differential Revision: D28966693

fbshipit-source-id: 1b82623279709d0118c0967e2ba730d5dec040cc
2021-06-08 14:46:23 -07:00
1bb1a9e22b [ROCm] enable test_cufft_plan_cache test (#57520)
Summary:
This pr enables the test_cufft_plan_cache in test_spectral suite.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57520

Reviewed By: ejguan

Differential Revision: D28936128

Pulled By: ngimel

fbshipit-source-id: c843ab31c50855b624a986155c17c8d24e89a2ac
2021-06-08 14:42:01 -07:00
43274ca145 test_store multiworker remove multiprocessing (#59599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59599

This will fix the flakiness for these tests internally when running under TSAN. We don't need multiprocessing since we should restrict the testing to the `wait_for_workers` and `world_size` parameters of the tcp store master store.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D28947838

Pulled By: H-Huang

fbshipit-source-id: d3e3904aa7ac81ae4c744a193a3b7167c2227bc8
2021-06-08 14:38:42 -07:00
40cbf342d3 Fix vectorized calculations on POWER (#59382)
Summary:
This fixes multiple bugs introduced by the VSX optimized code in https://github.com/pytorch/pytorch/pull/41541

- min/max/clamp now consistently return NaN when any input is NaN, as on other architectures
- The non-complex angle functions now return PI for negative values
- The complex angle functions have been corrected and optimized
- The float32 log implementation returned a wrong result when inf was passed (and possibly other inputs); it has been replaced by the Sleef function, just as for float64 (see the snippet below)
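
A quick illustration of the now-consistent NaN propagation (the same snippet should behave identically on POWER and x86):
```python
import torch

t = torch.tensor([1.0, float("nan"), 3.0])
print(torch.clamp(t, min=0.0, max=2.0))  # tensor([1., nan, 2.])
print(torch.maximum(t, torch.zeros(3)))  # tensor([1., nan, 3.])
```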

Fixes https://github.com/pytorch/pytorch/issues/59248
Fixes https://github.com/pytorch/pytorch/issues/57537

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59382

Reviewed By: jbschlosser

Differential Revision: D28944626

Pulled By: ezyang

fbshipit-source-id: 1ae2782b9e34e458a19cec90617037654279e0e0
2021-06-08 14:18:47 -07:00
ea3b2fd0fa Throw RuntimeError using TORCH_CHECK (#59485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59485

... when a variable is not allowed to require grad

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28933808

fbshipit-source-id: ef3536049d3a4a2f6e2f4b1787f0c17763f5828c
2021-06-08 14:03:21 -07:00
5fc105b323 Raise NotImplementedError on forward passes (#59483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59483

... for functions that are not implemented

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28933806

fbshipit-source-id: dadae1af6609f15419cf0f47a98361dc87dff849
2021-06-08 14:03:19 -07:00
c268eefe96 Use TORCH_CHECK_NOT_IMPLEMENTED for AD not implemented (#59482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59482

Fixes #53398

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28933809

fbshipit-source-id: 53387ec9690fc235b0622b50800feced706ea1ee
2021-06-08 14:02:04 -07:00
84061dadad Add reduce variants for scatter operation. (#57015)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56463 #56464

- Add reduce variants for `scatter` in both _native_functions.yaml_ and _TensorAdvancedIndexing.cpp_
- Add `OpInfo` tests and reduce tests in _test_torch.py_
- Fix default reduce argument for `scatter_` in __tensor_docs.py_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57015

Reviewed By: mrshenli

Differential Revision: D28162657

Pulled By: ezyang

fbshipit-source-id: 4d37ed1569ce8560aca1085c9cf5349f11427c4f
2021-06-08 13:37:26 -07:00
9de0c214bd [quant] Fix dimension for output of batchnorm 1d (#59264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59264

Previously, batchnorm 1d unsqueezed twice but only squeezed once before returning when the input Tensor is 2-dimensional; this PR adds the missing squeeze (sketched below).
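
A sketch of the shape bookkeeping the fix restores (plain tensor ops standing in for the batchnorm kernel):
```python
import torch

x = torch.randn(4, 8)               # 2-D input: (N, C)
y = x.unsqueeze(-1).unsqueeze(-1)   # -> (N, C, 1, 1) for the 2d kernel
out = y.squeeze(-1).squeeze(-1)     # both trailing dims must be squeezed
assert out.shape == x.shape         # a single squeeze would leave (N, C, 1)
```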

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D28810597

fbshipit-source-id: 879873bbf39ed3607762684694f6e81b423740c2
2021-06-08 13:07:00 -07:00
58348bea06 Forward AD formulas batch 3 (#58094)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58094

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28387762

Pulled By: albanD

fbshipit-source-id: fc395c92af7ebb5ebae95c40f6c76273047f4097
2021-06-08 13:00:21 -07:00
4512d75063 ger is an alias to outer, not the other way around (#59448)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59448

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28900487

Pulled By: albanD

fbshipit-source-id: e9065c5b29078d92ea9b746e188ebc1e62a407a0
2021-06-08 12:59:06 -07:00
d0e84c2f23 Revert D28961233: [pytorch][PR] Adding run-specified-test-cases option in run_test.py
Test Plan: revert-hammer

Differential Revision:
D28961233 (a6c9483c2f)

Original commit changeset: 6b7ddc6e6185

fbshipit-source-id: 4f8471df987a03d5c928a04f989d5d43f9cc47e9
2021-06-08 12:04:15 -07:00
0208e604e3 seems os.environ.get() not working well on windows (#59634)
Summary:
Replace it with os.getenv() instead.

For some reason this was intermittently failing Azure pipelines. I can't log in to the pipeline itself for debugging, but here are 2 examples: [successful](https://app.circleci.com/pipelines/github/pytorch/pytorch/332405/workflows/944609ad-5dcf-49da-984f-26c381d1f16c/jobs/13969059) vs [failed](https://app.circleci.com/pipelines/github/pytorch/pytorch/332518/workflows/21f8a5a6-3b95-432e-be42-ac98008c671b/jobs/13975637)

However, given that the other constants exposed by common_utils.py use `os.getenv()` and were working, I am making them consistent.
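
For reference, `os.getenv` is documented as equivalent to `os.environ.get`, so the swap is behavior-preserving; the variable name below is hypothetical:
```python
import os

value = os.getenv("SOME_TEST_FLAG", "0")            # hypothetical variable
assert value == os.environ.get("SOME_TEST_FLAG", "0")
```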

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59634

Test Plan: CI/master

Reviewed By: jbschlosser

Differential Revision: D28966412

Pulled By: walterddr

fbshipit-source-id: 7bcb9adf06df0acabd9574459eb6637c3e6a2947
2021-06-08 11:59:39 -07:00
1242dd1357 Remove cancel_redundant_workflows job (#59608)
Summary:
After https://github.com/pytorch/pytorch/issues/59019 this workflow itself is redundant, so we don't need it anymore

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59608

Reviewed By: jbschlosser, seemethere

Differential Revision: D28952314

Pulled By: driazati

fbshipit-source-id: 41aa33164be8271210ec23b9641e74596114416d
2021-06-08 11:38:29 -07:00
7949fdd2b6 ninja 1.9.0 couldn't be installed, CI might be broken (#59625)
Summary:
I suddenly found that `pip install ninja==1.9.0` fails in CI. I tested locally and on another colleague's machine as well. It looks like it conflicts with the cmake installed in conda.

https://app.circleci.com/pipelines/github/pytorch/pytorch/332470/workflows/d8b6ed30-1c7e-4863-898a-7f067c6202e1/jobs/13972409
![image](https://user-images.githubusercontent.com/16190118/121175743-02a1c700-c88e-11eb-9596-97b903b727f9.png)

1.10.0 couldn't be installed either.
![image](https://user-images.githubusercontent.com/16190118/121176606-fbc78400-c88e-11eb-931c-aa65bad080f8.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59625

Reviewed By: jbschlosser

Differential Revision: D28966699

Pulled By: seemethere

fbshipit-source-id: a1150e411ba3b4ab65448a087aa65f4ebe6c3596
2021-06-08 11:07:14 -07:00
13917bab7f [Torch] Correct launcher tests (#59635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59635

The diff corrects the launcher tests. The follow-up would be to determine why the tests succeeded during the ``use_env`` removal diff.

Test Plan: buck test mode/dev-tsan //caffe2/test/distributed/launcher:run_test -- --exact 'caffe2/test/distributed/launcher:run_test - test_launch_user_script_python_caffe2_bc (run_test.ElasticLaunchTest)' --run-disabled

Reviewed By: cbalioglu

Differential Revision: D28963813

fbshipit-source-id: a9f9b80787fb5c2f40a69ce31c8c2f3138654cad
2021-06-08 11:05:57 -07:00
3b0c6a7b50 fix AddPadding tensor shape inference (#59572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59572

fix AddPadding tensor shape inference

Test Plan: sandcastle

Reviewed By: dehuacheng

Differential Revision: D28686983

fbshipit-source-id: 03f70335fcfd94a1241562f8fbf12043a0deac2b
2021-06-08 11:02:33 -07:00
7dac2987ce [quant][eager][fix] Fix a typo in convert function in eager mode quantization (#59571)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59571

Test Plan:
python test/test_quantization.py TestPostTrainingStatic.test_custom_module_class

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28938355

fbshipit-source-id: 566daeb07d616ae40e52754d3d4581f75f248f04
2021-06-08 10:24:22 -07:00
31d136c81f [DDP] Rename the member divFactor_ as div_factor for naming consistency in reducer (#59523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59523

Should use snake case instead of camel case for consistency.
ghstack-source-id: 130759655

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs

Reviewed By: cbalioglu

Differential Revision: D28922896

fbshipit-source-id: e04298284a78b2e71b562f790a878731962f873a
2021-06-08 10:04:20 -07:00
b7ee164456 [DDP] Remove the duplicate parseHookResult in reducer (#59510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59510

Address the comment in https://github.com/pytorch/pytorch/pull/58937#discussion_r645822768

#Closes: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130758758

Test Plan: waitforbuildbot

Reviewed By: cbalioglu

Differential Revision: D28918694

fbshipit-source-id: 7ac4e4e6268e220adefed230bdb377ab3b25e302
2021-06-08 10:04:18 -07:00
2b398d0537 [Reland][Gradient Compression] Apply division first to avoid overflow (#59576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59576

If the gradients before allreduce are large, then the sum after allreduce may overflow, especially for FP16. Therefore, apply the division before allreduce.

This fix is applied to both C++ and Python comm hooks.
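
A small numeric sketch of why division must come first for FP16 (max finite value ~65504):
```python
import torch

world_size = 64
grad = torch.full((4,), 2048.0, dtype=torch.float16)

# summing first overflows: 2048 * 64 = 131072 -> inf in fp16
print(grad * world_size)                 # stand-in for allreduce(SUM)

# dividing first keeps values in range: (2048 / 64) * 64 = 2048
print((grad / world_size) * world_size)
```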
ghstack-source-id: 130754510

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl_grad_is_view

Reviewed By: rohan-varma

Differential Revision: D28941327

fbshipit-source-id: 932e8ddbdb2bfd609a78943f6dc390d3d6ca333f
2021-06-08 10:03:21 -07:00
92ed70a048 adding base commit to scribe report (#59570)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59408

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59570

Test Plan: CI

Reviewed By: samestep

Differential Revision: D28943928

Pulled By: walterddr

fbshipit-source-id: ae3d279005f54d83d7a3acae508d3ccdf1cd46b8
2021-06-08 09:58:38 -07:00
a6c9483c2f Adding run-specified-test-cases option in run_test.py (#59487)
Summary:
The run-specified-test-cases option would allow us to specify a list of test cases to run by providing a CSV with at least two columns: test_filename and test_case_name.

This PR also adds .json to some files we use for better clarity.

Usage:
`python test/run_test.py --run-specified-test-cases <csv_file>` where the csv file can look like:
```
test_filename,test_case_name,test_total_time,windows_only_failure_sha_count,total_sha_count,windows_failure_count,linux_failure_count,windows_total_count,linux_total_count
test_cuda,test_cudnn_multiple_threads_same_device,8068.8409659525,46,3768,53,0,2181,6750
test_utils,test_load_standalone,8308.8062920459,14,4630,65,0,2718,8729
test_ops,test_forward_mode_AD_acosh_cuda_complex128,91.652619369806,11,1971,26,1,1197,3825
test_ops,test_forward_mode_AD_acos_cuda_complex128,91.825633094915,11,1971,26,1,1197,3825
test_profiler,test_source,60.93786725749,9,4656,21,3,2742,8805
test_profiler,test_profiler_tracing,203.09352795241,9,4662,21,3,2737,8807
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59487

Test Plan:
Without specifying the option, everything should be as they were before.

Running `python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv` resulted in this paste P420276949 (you can see internally). A snippet looks like:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv
Loading specified test cases to run from windows_smoke_tests.csv.
Processed 28 test cases.
Running test_cpp_extensions_jit ... [2021-06-04 17:24:41.213644]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_cpp_extensions_jit.py', '-k', 'test_jit_cuda_archflags'] ... [2021-06-04 17:24:41.213781]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK (skipped=1)
...
```
With pytest, an example executable would be:
`Running test_dataloader ... [2021-06-04 17:37:57.643039]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', '-m', 'pytest', 'test_dataloader.py', '-v', '-k', 'test_segfault or test_timeout'] ... [2021-06-04 17:37:57.643327]`

Reviewed By: jbschlosser

Differential Revision: D28961233

Pulled By: janeyx99

fbshipit-source-id: 6b7ddc6e61856aa0002e1a0afc845770e4f8400b
2021-06-08 09:49:10 -07:00
ea1de87f4b Sort params by size (decreasing)
Summary:
Pull Request: https://github.com/pytorch/pytorch/pull/59586
Task: https://www.internalfb.com/tasks/?t=90847711

**Overview:**
Suppose we have `n` items with positive integer sizes and `k` buckets. We want to assign items to buckets with the goal of uniformity. The precise criteria for uniformity can vary: e.g. minimize the maximum size, maximize the minimum size, etc. This is known as [multiway number partitioning](https://en.wikipedia.org/wiki/Multiway_number_partitioning). ZeRO's partitioning task reduces to solving this problem. In particular, this is the subproblem to be solved for each `param_group` in `self.param_groups`, where the parameters are the items and the ranks give the buckets.

The existing implementation uses the linear-time [greedy number partitioning algorithm](https://en.wikipedia.org/wiki/Greedy_number_partitioning#Linear-time_algorithm), which assigns the next tensor-parameter to the process with the smallest total parameter size so far. In this task, I explore the [extension](https://en.wikipedia.org/wiki/Greedy_number_partitioning#Improved_algorithm) where each parameter group is sorted by decreasing size before applying the greedy algorithm, requiring linearithmic time (as dominated by the sort).
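
A minimal sketch of both strategies (generic names, not the actual `partition_parameters()` code):
```python
import heapq

def partition_greedy(sizes, k, sort_desc=True):
    # always place the next item into the currently lightest bucket;
    # sort_desc=True gives the "greedy-sorted" (LPT) variant
    order = sorted(range(len(sizes)), key=lambda i: -sizes[i]) \
        if sort_desc else range(len(sizes))
    heap = [(0, b) for b in range(k)]   # (bucket_total, bucket_id)
    heapq.heapify(heap)
    buckets = [[] for _ in range(k)]
    for i in order:
        total, b = heapq.heappop(heap)
        buckets[b].append(i)
        heapq.heappush(heap, (total + sizes[i], b))
    return buckets

print(partition_greedy([5, 9, 3, 7, 1], k=2))  # item indices per rank
```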

**Experiments**
The mean number of parameters represents a perfectly uniform allocation and hence the ideal allocation (which may be even better than the optimal partition). In the following tables, I present the maximum number of parameters for any one process and the difference from the mean in parentheses for ResNet-50, ResNet-152, and BERT (the bare BERT model). The best-performing partitioning strategy for each model is bolded.

Two processes:
| Model | Max Num Params - Greedy (Diff) | Max Num Params - Greedy-Sorted (Diff) | Mean Num Params |
| --- | --- | --- | --- |
| ResNet-50 | 13,249,600 (471,084) | **12,794,816 (16,300)** | 12,778,516 |
| ResNet-152 | 30,567,488 (471,084) | **30,111,424 (15,020)** | 30,096,404 |
| BERT | **54,749,184 (8,064)** | 55,327,488 (586,368) | 54,741,120 |

Four processes:
| Model | Max Num Params - Greedy (Diff) | Max Num Params - Greedy-Sorted (Diff) | Mean Num Params |
| --- | --- | --- | --- |
| ResNet-50 | 7,524,864 (1,135,606) |  **6,436,864 (47,606)** | 6,389,258 |
| ResNet-152 | 16,232,192 (1,183,990) | **15,090,152 (41,950)** | 15,048,202 |
| BERT | **28,151,040 (780,480)** | 28,352,256 (981,696)  | 27,370,560 |

 ---

I also investigated the latency of `optimizer.step()` for the different partitioning algorithms. I measured the latency for 30 iterations and took the mean latency per process (excluding the first iteration due to cache coldness). In the following tables, I present the maximum of those mean latencies over all processes and the standard deviation of the latencies contributing to that maximum. Again, the best-performing partitioning strategy for each model is bolded. All entries are presented in seconds and used `gloo` backend.

Two processes:
| Model | Max `optimizer.step()` Time - Greedy (Std.) | Max `optimizer.step()` Time - Greedy-Sorted (Std.) |
| --- | --- | --- |
| ResNet-50 | **0.060 (0.002)** | 0.061 (0.002) |
| ResNet-152 | 0.166 (0.003) | **0.160 (0.004)** |
| BERT | 0.220 (0.009) | **0.199 (0.006)** |

Four processes:
| Model | Max `optimizer.step()` Time - Greedy | Max `optimizer.step()` Time - Greedy-Sorted |
| --- | --- | --- |
| ResNet-50 | 0.094 (0.004) | **0.093 (0.004)** |
| ResNet-152 | **0.228 (0.011)** | 0.231 (0.009) |
| BERT | **0.328 (0.015)** | 0.329 (0.021) |

Based on the standard deviations, the differences in the latency measurements across the different algorithms appear to be within the uncertainty in the measurement itself. Hence, it is difficult to argue that one algorithm is clearly the fastest.

 ---

`zero.py` is my experiment script, and I use the AI AWS cluster. The run command looks like:
```
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=4 python zero.py -b nccl greedy 2 4
```
This runs the experiment script on an instance with 4 GPUs using `nccl` backend, outputting to a directory named `greedy/`, and using world sizes of 2 and 4. An analogous command can be used after modifying `partition_parameters()`, e.g. replacing `greedy` with `greedy_sorted` as the output directory name. Then, to run the analysis script:
```
python analyze.py greedy greedy_sorted
```
For more details on the experiment code, refer to: https://www.internalfb.com/diff/D28946756

**Notes:**
There exists an optimal solution to this partitioning problem. An algorithm that finds such a solution is the [complete greedy algorithm (CGA)](https://en.wikipedia.org/wiki/Greedy_number_partitioning#An_exact_algorithm), which reduces to the brute-force combinatorial search in the worst case. There exist heuristics to improve the `k = 2` case (i.e. when there are two processes); however, given that `n` in typical use cases is very large, any algorithm that is quadratic or slower is unrealistic. Other exact algorithms are similarly exponential in the worst case, rendering them intractable. Given this, I do not currently see a need for future proofing the partitioning algorithm against the introduction of algorithms beyond the naive greedy and the sorted greedy algorithms.

 ---

In the current ZeRO implementation, the core `partition_parameters()` computation happens twice upon initialization (i.e. call to `__init__()`): first from a call to `_param_to_rank()` (i.e. an access to `_param_to_rank`) and then from a call to `_update_trainable()`. `_update_trainable()` sees that no optimizer has been constructed yet, so it clears the cache, eliminating the first `partition_parameters()` computation and performing a redundant re-computation.

Here is a typical trace:
- [The ZeRO optimizer object is initialized, calling `__init__()`.](d125694d0b/torch/distributed/optim/zero_redundancy_optimizer.py (L142))
- [In `__init__()`, `self._device` is set, so it accesses `self._per_device_params`.](d125694d0b/torch/distributed/optim/zero_redundancy_optimizer.py (L182))
- [`self._per_device_params` is not cached, so it accesses `self._param_to_rank`.](d125694d0b/torch/distributed/optim/zero_redundancy_optimizer.py (L340))
- [`self._param_to_rank` is not cached, so it calls `partition_parameters()`.](d125694d0b/torch/distributed/optim/zero_redundancy_optimizer.py (L353)) (first call to `partition_parameters()`)
- [`__init__()` later calls `_update_trainable()`.](d125694d0b/torch/distributed/optim/zero_redundancy_optimizer.py (L185))
- [In `_update_trainable()`, `self` does not have `attr` `"optim"`, so it clears the cached objects (notably, `self._partition_parameters_cache`).](d125694d0b/torch/distributed/optim/zero_redundancy_optimizer.py (L591))
- [`_update_trainable()` calls `self.partition_parameters()`.](d125694d0b/torch/distributed/optim/zero_redundancy_optimizer.py (L593)) (second call to `partition_parameters()`)

Based on the discussion [here](https://github.com/pytorch/pytorch/pull/59410), this recomputation is unintentional and should be addressed in a future diff.

Test Plan: I verified that the total number of parameters across the processes was consistent after the partitioning algorithm change. Otherwise, no additional modifications were made to existing tests.

Reviewed By: mrshenli

Differential Revision: D28946755

fbshipit-source-id: 7ad66a21a963555b3b2e693ba8069d2dddc94c60
2021-06-08 09:47:35 -07:00
935057fc74 [package] turn MockZipReader into DirectoryReader and add test coverage (#59107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59107

Adding documentation, test coverage, and a missing method to the `DirectoryReader` class. `DirectoryReader` was previously named `MockZipReader`, and is used for operating on opened package archives via a `PackageImporter`.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D28760410

Pulled By: Lilyjjo

fbshipit-source-id: aa9d0a68e19738a6d5555bb04ce33af6a53f1268
2021-06-08 08:02:34 -07:00
693b2696f8 add dispatch for bitwise_and (#59388)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59388

Reviewed By: agolynski

Differential Revision: D28891985

Pulled By: ezyang

fbshipit-source-id: 4f8b301ba615f1e21a920f02166d64c978204adb
2021-06-08 07:51:47 -07:00
4920d5a05a Temporarily add skip to fix slow gradcheck failure on master (#59585)
Summary:
Related https://github.com/pytorch/pytorch/issues/59584

Failure https://app.circleci.com/pipelines/github/pytorch/pytorch/331771/workflows/fed7923c-3490-490f-8769-81a71beae558/jobs/13940286

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59585

Reviewed By: albanD

Differential Revision: D28945267

Pulled By: soulitzer

fbshipit-source-id: 72ae4b6c9a04fe9fdfb89888e12bae25c78be23c
2021-06-08 07:21:30 -07:00
5c7e14d2bc [DataLoader] Switch NotImplementedError to TypeError for len (#59464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59464

Fixes #59378

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28944447

Pulled By: ejguan

fbshipit-source-id: 8b3d53a1863b41e578d56f219e452d18d7eae0d8
2021-06-08 07:16:18 -07:00
1b578c4bf5 [DataLoader] Close byte stream explicitly (#58938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58938

When running `test_datapipe.py`, Python `gc` reports lots of `ResourceWarning`s due to unclosed streams. This is not only annoying; there are two potential problems:
- Performance regression, because `gc` requires additional memory and computation to track references
- Python `gc` runs periodically, so we may encounter a "too many open files" error due to OS limits
To reduce the warnings:
- Explicitly close byte streams
- Modify `test_datapipe.py` to use context managers (illustrated below)

Small fix:
- Reorder imports in `test_datapipe.py`

Further investigation:
Can we directly use a context manager in `LoadFileFromDisk` and `ReadFileFromTar` to eliminate this error?
- Probably not. It's feasible only if the pipeline is synchronous and without prefetching. When we enable these two features, the scope guard of the context manager doesn't work.
- We may need to attach a reference counter to these file byte streams so they close themselves.
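
A minimal illustration of the failure mode and the fix:
```python
def leaky(path):
    f = open(path, "rb")
    return f.read()               # f never closed: gc later emits
                                  # "ResourceWarning: unclosed file ..."

def tidy(path):
    with open(path, "rb") as f:   # closed deterministically on scope exit
        return f.read()
```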

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28689862

Pulled By: ejguan

fbshipit-source-id: bb2a85defb8a4ab5384db902ef6ad062185c2653
2021-06-08 07:15:08 -07:00
90c5b74e47 Back out "[PyTorch Edge] bytecode version bump to v5 and enable share constant table" (#59432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59432

Original commit changeset: 6f5cf4296eaa
ghstack-source-id: 130805860

Test Plan: CI

Reviewed By: raziel, iseeyuan

Differential Revision: D28892955

fbshipit-source-id: ce414a4c7a18001bdd27333cea03c6403b39d146
2021-06-08 07:11:26 -07:00
5d6a10a765 Revert D28913223: [pytorch][PR] Adding run-specified-test-cases option in run_test.py
Test Plan: revert-hammer

Differential Revision:
D28913223 (24432eaa29)

Original commit changeset: 0d1f99109734

fbshipit-source-id: 47c073720cff23a5d4cb64556381c46025e90937
2021-06-08 02:18:16 -07:00
010bcb4c2d Fix xnnpack hardswish memory issue (#59577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59577

Collapse all dimensions of the tensor into the batch dimension and use 1 for channels. This fixes the 1D over-calculation case.

Test Plan:
buck test fbandroid/mode/server fbandroid/mode/asan_ubsan fbsource//xplat/caffe2:pt_xnnpack_test

buck test fbsource//xplat/caffe2:pt_xnnpack_test

Reviewed By: kimishpatel

Differential Revision: D28942141

fbshipit-source-id: b36f820a900b6a2ed649d6b9bac79d3392d3537c
2021-06-07 21:56:05 -07:00
1faba1e4cc [Pytorch Edge] Make RegisterBackendSelect Selective (#59096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59096

RegisterBackendSelect brings ~100 extra ops into the runtime. This interferes with the compatibility API and also adds a nontrivial amount of binary size.

Test Plan: Model Unittests/CI

Reviewed By: iseeyuan

Differential Revision: D28588100

fbshipit-source-id: ffd0b5b9cbe20f27dbf3be418a6c1f80c7396fdb
2021-06-07 19:48:46 -07:00
501320ed81 [pytorch] deprecate default_op_deps.yaml (#59573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59573

To do mobile selective build, we have several options:
1. static dispatch;
2. dynamic dispatch + static analysis (to create the dependency graph);
3. dynamic dispatch + tracing;

We are developing 3. For open source, we used to only support 1, and
currently we support both 1 and 2.

This file is only used for 2. It was introduced when we deprecated
the static dispatch (1). The motivation was to make sure we have a
low-friction selective build workflow for dynamic dispatch (2).
As the name indicates, it is the *default* dependency graph that users
can try if they don't bother to run the static analyzer themselves.
We have a CI to run the full workflow of 2 on every PR, which creates
the dependency graph on-the-fly instead of using the committed file.

Since the workflow to automatically update the file has been broken
for a while, it started to confuse other pytorch developers as people
are already manually editing it, and it might be broken for some models
already.

We reintroduced the static dispatch recently, so we decide to deprecate
this file now and automatically turn on static dispatch if users run
selective build without providing the static analysis graph.

The tracing-based selective build will be the ultimate solution we'd
like to provide for OSS, but it will take some more effort to polish
and release.

Differential Revision:
D28941020
D28941020

Test Plan: Imported from OSS

Reviewed By: dhruvbird

Pulled By: ljk53

fbshipit-source-id: 9977ab8568e2cc1bdcdecd3d22e29547ef63889e
2021-06-07 19:37:37 -07:00
c436426be8 [fbgemm] fix gconv + acc16 (#59541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59541

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/621

Fixing 2 issues. These are actually 2 independent issues, one in Caffe2 and another in FBGEMM, so there is no need to wait until FBGEMM is synchronized with PyTorch:

1) conv 16-bit accumulation doesn't support the fast gconv path, so TakeGConvFastPath_ should honor that
2) packed_index_ generates indices up to (G/GTogether_) * F * R * S * OC_per_G * GTogether_ * paddedICPerG, which can exceed the G * kernel_prod * OC_per_G * paddedICPerG allocated in PackWeightMatrixForGConv (kernel_prod = F * R * S): e.g., when G=3, GTogether_=2, we allocate 3 * F * R * S * OC_per_G * paddedICPerG but we access up to 2 * F * R * S * OC_per_G * 2 * paddedICPerG (worked through below)

BTW, not sure how this issue went unnoticed for so long. Any ideas would be appreciated.
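
Working through the example's index arithmetic (reading `G/GTogether_` as a ceiling division, as the example implies; all sizes in units of F * R * S * OC_per_G * paddedICPerG):
```python
import math

G, GTogether = 3, 2
allocated = G                                    # 3 units
accessed = math.ceil(G / GTogether) * GTogether  # 2 * 2 = 4 units
print(accessed > allocated)                      # True -> out-of-bounds access
```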

Test Plan:
In a BDW machine,
buck test //caffe2/caffe2/quantization/server:conv_groupwise_dnnlowp_acc16_op_test -- --run-disabled

Reviewed By: dskhudia

Differential Revision: D28927214

fbshipit-source-id: 3ec98ea2fc177545392a0148daca592d80f40ad3
2021-06-07 19:20:59 -07:00
57d8bccd00 only reorder tests based on git diff if IN_CI (#59565)
Summary:
Do not reorder tests unless IN_CI is set; reordering makes local development test ordering nondeterministic. Most of us branch out from viable/strict, not the head of master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59565

Reviewed By: ejguan

Differential Revision: D28943906

Pulled By: walterddr

fbshipit-source-id: e742e7ce4b3fc017d7563b01e93c4cd774d0a537
2021-06-07 17:54:19 -07:00
dafa4b3517 quantization: improve documentation on natively supported backends (#58925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58925

Cleans up documentation on natively supported backends.  In particular:
* adds a section title
* deduplicates information about fbgemm/qnnpack
* clarifies what `torch.backends.quantized.engine` does
* adds code samples with default settings for `fbgemm` and `qnnpack`

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28681840

Pulled By: vkuzo

fbshipit-source-id: 51a6ab66934f657553351f6c84a638fd5f7b4e12
2021-06-07 17:29:03 -07:00
6575975da9 [Reland2][DDP] Merge work and future_work in reducer (#59574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59574

Remove `work` attribute from Reducer class in favor of `future_work`.

Additionally, remove `copy_grad_to_bucket` method since now it's only one-line implementation, and created a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also support handling uneven input.

1) Compared with the reverted https://github.com/pytorch/pytorch/pull/58937, updated `_AllReduceCommHookWithDivFactor` in `default_comm_hooks.cpp` to apply division first and hence avoid FP16 overflow.

2) Compared with the reverted https://github.com/pytorch/pytorch/pull/59520, disabled `test_DistributedDataParallel_non_default_stream` on AMD, because now applying division first hurts the gradient averaging accuracy on AMD.
See [07:48:26]:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.2-py3.6-test1/1129/console

#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130752393

Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork --  test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork --  test_DistributedDataParallel_non_default_stream

Reviewed By: rohan-varma

Differential Revision: D28940800

fbshipit-source-id: 1ba727ac951ebc1e7875dc1a1be8108a2c8d9462
2021-06-07 16:52:20 -07:00
fbe65b16ae Use irange in torch/csrc/jit (#55716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55716

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27690245

fbshipit-source-id: 6052b0acd792a9527d131822453a17cdb7ae3ba5
2021-06-07 16:48:08 -07:00
ff553e5b09 enable upload test stats on PR (#59567)
Summary:
Enable test stats upload on PR.

Uses the PR number as part of the key so that it can be properly indexed and later parsed if the PR has been merged/closed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59567

Reviewed By: ejguan

Differential Revision: D28943654

Pulled By: walterddr

fbshipit-source-id: f3a7a25ae14c6877067e1b347e3a8658d80d1544
2021-06-07 16:45:10 -07:00
24432eaa29 Adding run-specified-test-cases option in run_test.py (#59487)
Summary:
The run-specified-test-cases option would allow us to specify a list of test cases to run by providing a CSV with at least two columns: test_filename and test_case_name.

This PR also adds .json to some files we use for better clarity.

Usage:
`python test/run_test.py --run-specified-test-cases <csv_file>` where the csv file can look like:
```
test_filename,test_case_name,test_total_time,windows_only_failure_sha_count,total_sha_count,windows_failure_count,linux_failure_count,windows_total_count,linux_total_count
test_cuda,test_cudnn_multiple_threads_same_device,8068.8409659525,46,3768,53,0,2181,6750
test_utils,test_load_standalone,8308.8062920459,14,4630,65,0,2718,8729
test_ops,test_forward_mode_AD_acosh_cuda_complex128,91.652619369806,11,1971,26,1,1197,3825
test_ops,test_forward_mode_AD_acos_cuda_complex128,91.825633094915,11,1971,26,1,1197,3825
test_profiler,test_source,60.93786725749,9,4656,21,3,2742,8805
test_profiler,test_profiler_tracing,203.09352795241,9,4662,21,3,2737,8807
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59487

Test Plan:
Without specifying the option, everything should be as they were before.

Running `python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv` resulted in this paste P420276949 (you can see internally). A snippet looks like:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv
Loading specified test cases to run from windows_smoke_tests.csv.
Processed 28 test cases.
Running test_cpp_extensions_jit ... [2021-06-04 17:24:41.213644]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_cpp_extensions_jit.py', '-k', 'test_jit_cuda_archflags'] ... [2021-06-04 17:24:41.213781]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK (skipped=1)
...
```
With pytest, an example executable would be:
`Running test_dataloader ... [2021-06-04 17:37:57.643039]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', '-m', 'pytest', 'test_dataloader.py', '-v', '-k', 'test_segfault or test_timeout'] ... [2021-06-04 17:37:57.643327]`

Reviewed By: samestep

Differential Revision: D28913223

Pulled By: janeyx99

fbshipit-source-id: 0d1f9910973426b8756815c697b483160517b127
2021-06-07 16:27:43 -07:00
caf76c2445 Move sharding to after all tests have been excluded (#59583)
Summary:
It would be most accurate if sharding occurred after all other changes to selected_tests were complete.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59583

Reviewed By: ejguan

Differential Revision: D28944737

Pulled By: janeyx99

fbshipit-source-id: a851473948a5ec942ffeeedeefdc645536a3d9f7
2021-06-07 15:04:36 -07:00
93140a31e2 Use irange in a few places (#55325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55325

Test Plan: Sandcastle

Reviewed By: SciPioneer

Differential Revision: D27573006

fbshipit-source-id: 647b5da3901e92c23e95b2fe5e833e9081d72837
2021-06-07 14:53:41 -07:00
737d920b21 Strictly type everything in .github and tools (#59117)
Summary:
This PR greatly simplifies `mypy-strict.ini` by strictly typing everything in `.github` and `tools`, rather than picking and choosing only specific files in those two dirs. It also removes `warn_unused_ignores` from `mypy-strict.ini`, for reasons described in https://github.com/pytorch/pytorch/pull/56402#issuecomment-822743795: basically, that setting makes life more difficult depending on what libraries you have installed locally vs in CI (e.g. `ruamel`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59117

Test Plan:
```
flake8
mypy --config mypy-strict.ini
```

Reviewed By: malfet

Differential Revision: D28765386

Pulled By: samestep

fbshipit-source-id: 3e744e301c7a464f8a2a2428fcdbad534e231f2e
2021-06-07 14:49:36 -07:00
6ff001c125 DOC Improve documentation for LayerNorm (#59178)
Summary:
Closes https://github.com/pytorch/pytorch/issues/51455

I think the current implementation is aggregating over the correct dimensions. The shape of `normalized_shape` is only used to determine the dimensions to aggregate over. The actual values of `normalized_shape` are used when `elementwise_affine=True` to initialize the weights and biases.

This PR updates the docstring to clarify how `normalized_shape` is used. Here is a short script comparing the implementations for tensorflow and pytorch:

```python
import numpy as np
import torch
import torch.nn as nn

import tensorflow as tf
from tensorflow.keras.layers import LayerNormalization

rng = np.random.RandomState()
x = rng.randn(10, 20, 64, 64).astype(np.float32)
# slightly non-trival
x[:, :10, ...] = x[:, :10, ...] * 10 + 20
x[:, 10:, ...] = x[:, 10:, ...] * 30 - 100

# Tensorflow Layer norm
x_tf = tf.convert_to_tensor(x)
layer_norm_tf = LayerNormalization(axis=[-3, -2, -1], epsilon=1e-5)
output_tf = layer_norm_tf(x_tf)
output_tf_np = output_tf.numpy()

# PyTorch Layer norm
x_torch = torch.as_tensor(x)
layer_norm_torch = nn.LayerNorm([20, 64, 64], elementwise_affine=False)
output_torch = layer_norm_torch(x_torch)
output_torch_np = output_torch.detach().numpy()

# check tensorflow and pytorch
torch.testing.assert_allclose(output_tf_np, output_torch_np)

# manual comutation
manual_output = ((x_torch - x_torch.mean(dim=(-3, -2, -1), keepdims=True)) /
                 (x_torch.var(dim=(-3, -2, -1), keepdims=True, unbiased=False) + 1e-5).sqrt())

torch.testing.assert_allclose(output_torch, manual_output)
```

To get to the layer normalization as shown here:

<img width="157" alt="Screen Shot 2021-05-29 at 2 13 52 PM" src="https://user-images.githubusercontent.com/5402633/120080691-1e37f100-c088-11eb-9060-4f263e4cd093.png">

One needs to pass in `normalized_shape` of length `x.dim() - 1`, containing the sizes of the channel and all spatial dimensions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59178

Reviewed By: ejguan

Differential Revision: D28931877

Pulled By: jbschlosser

fbshipit-source-id: 193e05205b9085bb190c221428c96d2ca29f2a70
2021-06-07 14:34:10 -07:00
a30b359590 fix double backward for binary_cross_entropy loss function when reduction=sum. (#59479)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59477.

```python
In [1]: import torch

In [2]: x = torch.rand(3, 3, dtype=torch.double, requires_grad=True)

In [3]: y = torch.rand(3, 3, dtype=torch.double)

In [4]: torch.autograd.gradgradcheck(lambda x, y: torch.nn.functional.binary_cross_entropy(x, y, reduction='sum'), [x, y])
Out[4]: True

In [5]: torch.autograd.gradgradcheck(lambda x, y: torch.nn.functional.binary_cross_entropy(x, y, reduction='mean'), [x, y])
Out[5]: True

In [6]: torch.autograd.gradcheck(lambda x, y: torch.nn.functional.binary_cross_entropy(x, y, reduction='sum'), [x, y])
Out[6]: True

```

More comprehensive testing could be added in https://github.com/pytorch/pytorch/pull/59447 where explicit `gradcheck` and `gradgradcheck` tests are added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59479

Reviewed By: ejguan

Differential Revision: D28934354

Pulled By: albanD

fbshipit-source-id: 12ce68e3c5c499b2531f7cdba3c22548d67e07e9
2021-06-07 14:14:08 -07:00
77dde35f1a Fix error message formatting in _make_grads (#59532)
Summary:
- TORCH_CHECK doesn't handle printf style format and it will output like: `got %ld tensors and %ld gradients21`
- `got 2 tensors and 1 gradients` should be the expected message for this

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59532

Reviewed By: ejguan

Differential Revision: D28934680

Pulled By: albanD

fbshipit-source-id: 2d27a754ae81310b9571ae2a2ea09d0f8d8a3d81
2021-06-07 14:05:24 -07:00
24e27af683 [ROCm] enable kernel asserts (#49624)
Summary:
Addresses missing ROCm feature indicated in https://github.com/pytorch/pytorch/issues/38943.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49624

Reviewed By: agolynski

Differential Revision: D28902459

Pulled By: malfet

fbshipit-source-id: 29c9b552770241a0ec52cd057ea45efc4389d838
2021-06-07 13:43:07 -07:00
05b571ee8e fix name of 'dims' kwarg in torch.tile docs (#59471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59471

Fixes #59150

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28908569

Pulled By: saketh-are

fbshipit-source-id: 57d0e75d899a1d9979e8bdb20dfd2b136dd63d1b
2021-06-07 13:18:19 -07:00
b0ac9bfb2b Add warning about should_drop for JIT coverage plug-in (#57961)
Summary:
This adds a comment above `should_drop` to prevent someone from inadvertently breaking JIT coverage by renaming the function without updating the correct references.

The current JIT plug-in uses `should_drop` to figure out which code is going to be JIT'd. If the function is named differently, the plug-in would also need to be updated.

Question: I understand this may not be the cleanest solution. Would a cleaner solution be to create a dummy function that would simply exist for the JIT plug-in? I did not immediately do that as that may be adding unnecessary code complexity in torch.jit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57961

Reviewed By: samestep

Differential Revision: D28933587

Pulled By: janeyx99

fbshipit-source-id: 260aaf7b11f07de84a81d6c3554c4a5ce479d623
2021-06-07 12:48:01 -07:00
8693e288d7 DOC Small rewrite of interpolate recompute_scale_factor docstring (#58989)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55909

This PR looks to improve the documentation to describe the following behavior:

8130f2f67a/torch/nn/functional.py (L3673-L3685)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58989

Reviewed By: ejguan

Differential Revision: D28931879

Pulled By: jbschlosser

fbshipit-source-id: d1140ebe1631c5ec75f135c2907daea19499f21a
2021-06-07 12:40:05 -07:00
1798ff02e4 [PyTorch] Optimize c10::optional<ArrayRef<T>> for size (#59333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59333

Code comment should explain this in sufficient detail. In brief, making it 16 bytes should get it to be passed in registers.
ghstack-source-id: 130631329

Test Plan: Updated optional_test and added static_assert in Optional.cpp.

Reviewed By: ezyang

Differential Revision: D28843027

fbshipit-source-id: 3029f05e03a9f04ca7337962e7770cdeb9a608d9
2021-06-07 11:35:17 -07:00
cc03ea2c47 [quant] Implemented InputWeightObserver for Linear inputs
Summary: Implemented two observers (InputEqualObserver and WeightEqualObserver) which will be inserted into the graph during prepare_fx().

Test Plan: python test/test_quantization.py TestEqualizeFx

Reviewed By: supriyar

Differential Revision: D28836954

fbshipit-source-id: 25517dc82ae67698ed8b2dc334e3323286976104
2021-06-07 11:19:43 -07:00
c51abf8fca Make binary_cross_entropy differentiable wrt target (#59447)
Summary:
As per title. Resolves https://github.com/pytorch/pytorch/issues/56683.
`gradgradcheck` will fail once `target.requires_grad() == True` because of the limitations of the current double backward implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59447

Reviewed By: agolynski

Differential Revision: D28910140

Pulled By: albanD

fbshipit-source-id: 20934880eb4d22bec34446a6d1be0a38ef95edc7
2021-06-07 09:20:17 -07:00
94cc681fc2 Revert D28922305: [Reland][DDP] Merge work and future_work in reducer
Test Plan: revert-hammer

Differential Revision:
D28922305 (3137bbeb1a)

Original commit changeset: 6388a96eda7a

fbshipit-source-id: bc150672e857286eeb129ea683b1cfd2034f0564
2021-06-07 03:58:20 -07:00
f998e63dca Revert D28922548: [Gradient Compression] Apply division first to avoid overflow
Test Plan: revert-hammer

Differential Revision:
D28922548 (459270ac01)

Original commit changeset: 442bd3cc7a35

fbshipit-source-id: 7e4361b4eb283cdb21f15a36d6eebf558dd7386f
2021-06-07 03:57:10 -07:00
459270ac01 [Gradient Compression] Apply division first to avoid overflow (#59522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59522

If the gradients before allreduce are large, then the sum after allreduce may overflow, especially for FP16. Therefore, apply the division before allreduce.

This fix is applied to both C++ and Python comm hooks.
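
A minimal Python sketch of the idea, written as a hypothetical DDP comm hook (assuming the `GradBucket.buffer()` accessor; this is not the actual `_AllReduceCommHookWithDivFactor` implementation):

```python
import torch.distributed as dist

def allreduce_div_first_hook(state, bucket):
    # Divide by the world size *before* the allreduce so that large
    # FP16 gradients do not overflow when summed across ranks.
    tensor = bucket.buffer().div_(dist.get_world_size())
    fut = dist.all_reduce(tensor, async_op=True).get_future()
    return fut.then(lambda f: f.value()[0])
```

Such a hook would be registered via `model.register_comm_hook(None, allreduce_div_first_hook)`.
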
ghstack-source-id: 130686229

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl_grad_is_view

Reviewed By: rohan-varma

Differential Revision: D28922548

fbshipit-source-id: 442bd3cc7a35a8b948f626062fa7ad2e3704c5be
2021-06-07 01:43:10 -07:00
a2e56fa0dc Adding users of a node to the serialized JSON. (#59357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59357

Adding users of a node to the serialized JSON. Illustrated in the example:

JSON:
P419734894

Examples:
    {
      "shape": "[7]",
      "dtype": "torch.float16",
      "stride": "[1]",
      "is_quantized": false,
      "target": "conv.bias",
      "op_code": "get_attr",
      "name": "conv_bias",
      "args": [],
      "kwargs": {},
      "users": [
        {
          "is_node": true,
          "name": "to_dtype"
        }
      ]
    }

    {
      "target": "output",
      "op_code": "output",
      "name": "output",
      "args": [
        {
          "is_node": true,
          "name": "fba_layout_transform_1",
          "shape": "[3, 7, 12, 12]",
          "dtype": "torch.float16",
          "stride": "[1008, 144, 12, 1]",
          "is_quantized": false
        }
      ],
      "kwargs": {},
      "users": []
    }

Test Plan: buck test //caffe2/test:test_fx_experimental

Reviewed By: gcatron, jfix71

Differential Revision: D28857487

fbshipit-source-id: a3bac6bdb21ce10ba4a0d170c809aef13e6174a6
2021-06-06 23:15:32 -07:00
de40c8e495 Adds remaining OpInfos and removes redundant test generators (#55558)
Summary:
Per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55558

Reviewed By: ngimel

Differential Revision: D28922522

Pulled By: mruberry

fbshipit-source-id: 89cefd93788bc8aa0683f4583cf5caa81aa2dc93
2021-06-06 14:52:26 -07:00
8c852de54d [PyTorch Edge] Remove legacy and kineto profilers from mobile build (#58730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58730

The sources for the profilers are not needed in the mobile build, and unnecessarily add weight to the build. Remove them from the lite-interpreter build.

ghstack-source-id: 130684568

Test Plan: Build + BSB

Reviewed By: kimishpatel, raziel

Differential Revision: D28563725

fbshipit-source-id: 9d6f76176c2d2bbc25703281af1a076b1f2b4f19
2021-06-06 13:16:07 -07:00
3137bbeb1a [Reland][DDP] Merge work and future_work in reducer (#59520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59520

Remove `work` attribute from Reducer class in favor of `future_work`.

Additionally, remove the `copy_grad_to_bucket` method, since it is now a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also handle uneven input.

Compared with the reverted https://github.com/pytorch/pytorch/pull/58937, updated `_AllReduceCommHookWithDivFactor` in `default_comm_hooks.cpp` to apply division first and hence avoid FP16 overflow.

#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130685351

Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork --  test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view

Reviewed By: walterddr

Differential Revision: D28922305

fbshipit-source-id: 6388a96eda7a06f292873afed6d1362096c13e1c
2021-06-06 09:49:08 -07:00
390fe74944 Migrate torch.lstsq to ATen (#59400)
Summary:
Closes  https://github.com/pytorch/pytorch/issues/24726, closes https://github.com/pytorch/pytorch/issues/44011

This builds on the port from https://github.com/pytorch/pytorch/issues/44011. I've rebased on master and addressed mruberry's comments. There were also some unnecessary copies of `B` taking place that I've cleaned up. This function is already deprecated, but since it's the last lapack routine in TH, it's still worth porting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59400

Reviewed By: mruberry

Differential Revision: D28922060

Pulled By: ngimel

fbshipit-source-id: cfd7ec8b50d2ab886f0e04a2a557e4e410ee8184
2021-06-06 02:18:17 -07:00
da972afdcd OpInfo: to_sparse (#59445)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59445

Reviewed By: ngimel

Differential Revision: D28920866

Pulled By: mruberry

fbshipit-source-id: ba8d3071d9937096288b69511000eeb007f53434
2021-06-05 19:13:58 -07:00
96ac0e0340 OpInfo: t (#59442)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59442

Reviewed By: agolynski

Differential Revision: D28898946

Pulled By: mruberry

fbshipit-source-id: be32429fa7306554e4912fdcc382593d00c9f4ad
2021-06-05 18:59:38 -07:00
0a5bfa9919 Support __rmod__ (#58476)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58035.

This PR implements `torch.Tensor.__rmod__` and `torch.remainder(scalar, tensor)` for compatibility with NumPy’s interface.
(cc: mruberry, rgommers, emcastillo, kmaehashi)
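
A quick sketch of the behavior this enables:

```python
import torch

t = torch.tensor([2, 3, 4])
print(5 % t)                  # tensor([1, 2, 1]); dispatches to Tensor.__rmod__
print(torch.remainder(5, t))  # same values via the scalar-first functional form
```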

TODO:
  - [x] Update `tensor_binary_op` in test/test_binary_ufuncs.py after https://github.com/pytorch/pytorch/issues/58216 is merged.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58476

Reviewed By: ngimel

Differential Revision: D28776810

Pulled By: mruberry

fbshipit-source-id: 74f8aea80f439ef2cc370333524e39971eeb7bf4
2021-06-05 16:19:24 -07:00
344ecb2e71 flip via TI (#59509)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59509

Reviewed By: mruberry

Differential Revision: D28918665

Pulled By: ngimel

fbshipit-source-id: b045c7b35eaf22e53b1bc359ffbe5a4fda05dcda
2021-06-05 15:43:29 -07:00
1be7ca71ee OpInfo: log_softmax (#59336)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59336

Reviewed By: agolynski

Differential Revision: D28899052

Pulled By: mruberry

fbshipit-source-id: 60a9a4ffbca5a0f2c899d4d83500dcab4555ffb0
2021-06-05 13:51:50 -07:00
1dcc034fba [caffe2] Avoid attempt to use undefined preprocessor directive
Summary:
This is somewhat more verbose, but it's more correct and addresses this warning on Visual Studio 2017:
```
xplat\caffe2\caffe2\core\common.h(76): warning C4067: unexpected tokens following preprocessor directive - expected a newline
```

Test Plan: Built locally with fix

Reviewed By: simpkins

Differential Revision: D28868632

fbshipit-source-id: f6a583e8275162adedb2a4bc5ed0f64847020871
2021-06-05 09:22:52 -07:00
1d9c1cc00a [4/n] [c10d] Introduce the multi-tenancy feature in TCPStore (#58331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58331

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class, allowing two server stores to be instantiated with the same host:port pair.
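
A hypothetical Python-side sketch (assuming the binding exposes `multi_tenant` and `wait_for_workers` keyword arguments):

```python
from datetime import timedelta

import torch.distributed as dist

# Two *server* stores sharing the same host:port pair.
s1 = dist.TCPStore("127.0.0.1", 29500, 2, True,
                   timeout=timedelta(seconds=30),
                   wait_for_workers=False, multi_tenant=True)
s2 = dist.TCPStore("127.0.0.1", 29500, 2, True,
                   timeout=timedelta(seconds=30),
                   wait_for_workers=False, multi_tenant=True)
```
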
ghstack-source-id: 130676394

Test Plan:
- Run the existing and newly-introduced tests.
- Run several smoke tests including the short code snippet referred in GitHub issue #41614.

Reviewed By: H-Huang

Differential Revision: D28453850

fbshipit-source-id: f9066b164305de0f8c257e9d5736e93fd7e21ec6
2021-06-05 07:50:07 -07:00
844a98758a [3/n] [c10d] Revise the implementation of TCPStore (#58330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58330

This PR is part of a stack that addresses the GitHub issue #41614; it introduces a major refactoring of the `TCPStore` class in preparation of the multi-tenancy feature.

- All TCP sockets are wrapped with a new `TCPSocket` RAII type.
- `BackgroundThread` and daemon types are moved from header to cpp file.
- Server, client, and callback sockets are refactored into their own internal types `TCPServer`, `TCPClient` and `TCPCallbackClient`.
- Calls to `tcputil::send*` and `tcputil::recv*` are wrapped in `TCPClient` for easier readability and maintenance purposes.
- Two `TODO` statements are put to reference future improvements. Based on feedback, I will either create separate GitHub issues for them or address them as part of this stack.
ghstack-source-id: 130676392

Test Plan: Run the existing tests since there are no user-facing behavioral changes.

Reviewed By: H-Huang

Differential Revision: D28448981

fbshipit-source-id: 415b21e74b3cd51d673c1d5c349c6a2cb21dd667
2021-06-05 07:50:06 -07:00
4ee761c2c5 [2/n] [c10d] Introduce the 'multiTenant' constructor parameter in TCPStore (#58329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58329

This PR is part of a stack that addresses the GitHub issue #41614; it introduces:

- A new `multiTenant` constructor option for the `TCPStore` class indicating whether multiple store instances can be initialized with the same host:port pair.

- Updates to the C10d distributed (elastic) rendezvous and the `init_process_group` method to leverage the new `multiTenant` feature.

Note that the multi-tenancy feature itself is implemented in the fourth PR of this stack. In this PR, passing `true` to `multiTenant` results only in a warning output.
ghstack-source-id: 130676389

Test Plan: Run the existing tests since there are no behavioral changes.

Reviewed By: rohan-varma

Differential Revision: D28424978

fbshipit-source-id: fb1d1d81b8b5884cc5b54486700a8182a69c1f29
2021-06-05 07:50:04 -07:00
cf408c3743 [1/n] [c10d] Introduce a new TCPStore constructor (#58328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58328

This PR is part of a stack that addresses the GitHub issue #41614; it introduces a new `TCPStore` constructor that takes its optional parameters via a newly introduced `TCPStoreOptions` structure. This gives the API callers the flexibility to specify only the desired options while skipping the rest.

The main motivation behind this change is the introduction of the `multiTenant` constructor option in the second PR of this stack.
ghstack-source-id: 130676384

Test Plan: Run the existing tests since there are no behavioral changes.

Reviewed By: H-Huang

Differential Revision: D28417742

fbshipit-source-id: e6ac2a057f7ad1908581176ee6d2c2554c3c74a9
2021-06-05 07:50:02 -07:00
91eb831422 Revert D28698997: [Static Runtime] Add schema check to aten ops
Test Plan: revert-hammer

Differential Revision:
D28698997 (10345010f7)

Original commit changeset: 232fc60c0321

fbshipit-source-id: e351df62779fea85b7afe5160d3c40c4e7cee4ed
2021-06-05 07:48:49 -07:00
c88a0b55b3 Revert D28677383: [DDP] Merge work and future_work in reducer
Test Plan: revert-hammer

Differential Revision:
D28677383 (f8bebade47)

Original commit changeset: 85e0620378b7

fbshipit-source-id: ef3c65b88c375aa9a6befe2ab004ec37ae7eb587
2021-06-05 07:25:44 -07:00
f8bebade47 [DDP] Merge work and future_work in reducer (#58937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58937

Remove `work` attribute from Reducer class in favor of `future_work`.

Additionally, remove the `copy_grad_to_bucket` method, since it is now a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also handle uneven input.

#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130673249

Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork --  test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs

Reviewed By: agolynski

Differential Revision: D28677383

fbshipit-source-id: 85e0620378b7e9d837e436e94b9d807631d7d752
2021-06-05 01:18:30 -07:00
5117ac3bb4 Revert D28877076: [pytorch][PR] torch.flip via TI
Test Plan: revert-hammer

Differential Revision:
D28877076 (d82bc3feb8)

Original commit changeset: 4fa6eb519085

fbshipit-source-id: c81e7d3283ff6822db913bf9f49a1533268755d0
2021-06-04 23:03:53 -07:00
10345010f7 [Static Runtime] Add schema check to aten ops (#59426)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59426

Reviewed By: ajyu

Differential Revision: D28698997

fbshipit-source-id: 232fc60c0321b8e68e4f1b6705233485260c281d
2021-06-04 21:38:45 -07:00
d82bc3feb8 torch.flip via TI (#58747)
Summary:
Implements an idea by ngimel to improve the performance of `torch.flip` via a clever hack into TI to bypass the fact that TI is not designed to work with negative indices.

Something that might be added is vectorisation support on CPU, given how simple the implementation is now.

Some low-hanging fruits that I did not implement:
- Write it as a structured kernel
- Migrate the tests to opinfos
- Have a look at `cumsum_backward` and `cumprod_backward`,  as I think that they could be implemented faster with `flip`, now that `flip` is fast.

**Edit**
This operation already has OpInfos and it cannot be migrated to a structured kernel because it implements quantisation

Summary of the PR:
- x1.5-3 performance boost on CPU
- x1.5-2 performance boost on CUDA
- Comparable performance across dimensions, regardless of the strides (thanks TI)
- Simpler code

<details>
<summary>
Test Script
</summary>

```python
from itertools import product

import torch
from torch.utils.benchmark import Compare, Timer

def get_timer(size, dims, num_threads, device):
    x = torch.rand(*size, device=device)

    timer = Timer(
        "torch.flip(x, dims=dims)",
        globals={"x": x, "dims": dims},
        label=f"Flip {device}",
        description=f"dims: {dims}",
        sub_label=f"size: {size}",
        num_threads=num_threads,
    )

    return timer.blocked_autorange(min_run_time=5)

def get_params():
    sizes = ((1000,)*2, (1000,)*3, (10000,)*2)
    for size, device in product(sizes, ("cpu", "cuda")):
        threads = (1, 2, 4) if device == "cpu" else (1,)
        list_dims = [(0,), (1,), (0, 1)]
        if len(size) == 3:
            list_dims.append((0, 2))
        for num_threads, dims in product(threads, list_dims):
            yield size, dims, num_threads, device

def compare():
    compare = Compare([get_timer(*params) for params in get_params()])
    compare.trim_significant_figures()
    compare.colorize()
    compare.print()

compare()
```
</details>

<details>
<summary>
Benchmark PR
</summary>

![image](https://user-images.githubusercontent.com/3291265/119139954-81e46d80-ba3b-11eb-9aad-e825e515d41b.png)

</details>

<details>
<summary>
Benchmark master
</summary>

![image](https://user-images.githubusercontent.com/3291265/119139915-76914200-ba3b-11eb-9aa8-84b3ca220c93.png)

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58747

Reviewed By: agolynski

Differential Revision: D28877076

Pulled By: ngimel

fbshipit-source-id: 4fa6eb519085950176cb3a9161eeb3b6289ec575
2021-06-04 20:13:38 -07:00
bca25d97ad [itemwise-dropout][1/x][low-level module] Implement Itemwise Sparse Feature Dropout in Dper3 (#59322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59322

Implement sparse feature dropout (with replacement) that can drop out individual items in each sparse feature. By contrast, the existing sparse feature dropout with replacement drops out the whole feature (e.g., a list of page ids) when the feature is selected for dropout. This itemwise dropout instead assigns a dropout probability to, and drops out, individual items in sparse features.

Test Plan:
```
buck test mode/dev caffe2/torch/fb/sparsenn:test
```

https://www.internalfb.com/intern/testinfra/testrun/281475166777899/

```
buck test mode/dev //dper3/dper3/modules/tests:sparse_itemwise_dropout_with_replacement_test
```
https://www.internalfb.com/intern/testinfra/testrun/6473924504443423

```
buck test mode/opt caffe2/caffe2/python:layers_test
```
https://www.internalfb.com/intern/testinfra/testrun/2533274848456607

```
buck test mode/opt caffe2/caffe2/python/operator_test:sparse_itemwise_dropout_with_replacement_op_test
```
https://www.internalfb.com/intern/testinfra/testrun/8725724318782701

Reviewed By: Wakeupbuddy

Differential Revision: D27867213

fbshipit-source-id: 8e173c7b3294abbc8bf8a3b04f723cb170446b96
2021-06-04 19:59:17 -07:00
68df4d40d2 show_pickle/model_dump: Handle invalid UTF-8 in pickles (#57661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57661

The Pickle "specification" (pickletools.py) states that the argument to
a BINUNICODE opcode must be UTF-8 encoded. However, if a PyTorch custom
class returns a non-UTF-8 std::string from its pickle method, the
libtorch Pickler will write it to the output pickle without complaining.
Python's _Unpickler (the Python implementation of Unpickler) always
throws an exception when trying to deserialize these invalid pickles.

We still want to be able to dump these pickle files.  Update
DumpUnpickler to create its own opcode dispatch table (initialized as a
clone of the _Unpickler dispatch table) and patch in a custom function
for the BINUNICODE op.  We try to emulate the default behavior, but any
UnicodeDecodeError is caught and replaced with a dummy object.  This
could violate the assumptions of a user that expects a str in that
position, so we disable this behavior by default.

Update model_dump to recognize this special object and allow it to be
rendered.
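
A minimal sketch of the dispatch-table-patching idea (not the actual DumpUnpickler code; `FakeString` is a hypothetical placeholder type):

```python
import pickle
import struct

class FakeString:
    """Dummy stand-in for a BINUNICODE payload that wasn't valid UTF-8."""
    def __init__(self, raw_bytes):
        self.raw_bytes = raw_bytes

class LenientUnpickler(pickle._Unpickler):
    # Clone the pure-Python unpickler's dispatch table so only this
    # class sees the patched opcode handler.
    dispatch = dict(pickle._Unpickler.dispatch)

    def load_binunicode(self):
        length, = struct.unpack("<I", self.read(4))
        data = self.read(length)
        try:
            obj = data.decode("utf-8")
        except UnicodeDecodeError:
            obj = FakeString(data)  # dummy object instead of raising
        self.append(obj)

    dispatch[pickle.BINUNICODE[0]] = load_binunicode
```

`LenientUnpickler(f).load()` then behaves like the stock `_Unpickler`, except invalid strings come back as `FakeString` instances.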

Test Plan: Dumped and viewed a model with an invalid string in an object state.

Reviewed By: malfet

Differential Revision: D28531392

Pulled By: dreiss

fbshipit-source-id: ab5aea20975a0ef53ef52a880deaa2c5a626e4a2
2021-06-04 19:42:25 -07:00
ba3a90b55e Revert D28819780: [TensorExpr] Fix handling of 0-dim tensors.
Test Plan: revert-hammer

Differential Revision:
D28819780

Original commit changeset: f3feff35a1ce

fbshipit-source-id: 1dca4ac9cea0b67e9f02800f6d5b3c7e4ae1d81a
2021-06-04 19:25:30 -07:00
88fb5ee84c Revert D28819779: [TensorExpr] Improve debug messages.
Test Plan: revert-hammer

Differential Revision:
D28819779

Original commit changeset: 2eaa0b78fb30

fbshipit-source-id: babc22f75d87b1ba25f78ffe59266560413778ce
2021-06-04 19:20:31 -07:00
aa66990ef1 Automated submodule update: kineto (#54604)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto).

New submodule commit: 88e3332ab9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54604

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: malfet

Differential Revision: D27297755

fbshipit-source-id: 5f5dd2429fb561530e6a59285c6ae708e5818ce9
2021-06-04 18:54:32 -07:00
18848d55b7 Do not use gold linker for CUDA builds (#59490)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59490

Reviewed By: agolynski, seemethere

Differential Revision: D28913160

Pulled By: malfet

fbshipit-source-id: d27092c252fc86424028abe146cf5f33a2f74544
2021-06-04 18:12:26 -07:00
a682ff7ef1 Add kMaxSupportedBytecodeVersion for Lite Interpreter (#59472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59472

Previously, the lite interpreter would refuse to load any model
with a version greater than kProducedBytecodeVersion.  Now, we're
able to independently advance the loading and saving code, so we
can roll out changes without breaking forward compatibility.

Test Plan:
CI.
Loaded a bytecode v5 model even with setting kProducedBytecodeVersion
to v4.

Reviewed By: raziel

Differential Revision: D28904350

fbshipit-source-id: 598c22f0adf47d4ed3e976bcbebdf3959dacb1df
2021-06-04 17:55:02 -07:00
d125694d0b Move CUDA async warning to suffix (#59467)
Summary:
After the change async error warnings look as follows:
```
$ python -c "import torch;torch.eye(3,3,device='cuda:777')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59467

Reviewed By: ngimel

Differential Revision: D28904360

Pulled By: malfet

fbshipit-source-id: 2a8fa5affed5b4ffcaa602c8ab2669061cde7db0
2021-06-04 17:26:28 -07:00
f23c45bd04 Revert D28841011: [TensorExpr] Fix printing of Bool dtype.
Test Plan: revert-hammer

Differential Revision:
D28841011 (19985d6f84)

Original commit changeset: 9f68dd47e14a

fbshipit-source-id: ff517cfff49e46ed513e79eabbe9e9fd246ccce8
2021-06-04 16:27:14 -07:00
6309b342c3 [nnc] Enable CPU fuser inside FB, take 5 (#59461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59461

long-tail test failures
ghstack-source-id: 130607578

Test Plan: fixed T92123560

Reviewed By: navahgar

Differential Revision: D28892885

fbshipit-source-id: 762a275b5aa14af0847e46cbf4036d3342b82189
2021-06-04 16:26:46 -07:00
f5e3eae82a [nnc] Infer device type from nodes if inputs are all scalars (#59430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59430

With constant support added, we can now have fusion groups with only
scalar inputs.  So, we need to get the device type from the nodes in the graph
rather than just the inputs.
ghstack-source-id: 130613871

Test Plan: new unit test; also see test_tracer test_trace_of_script

Reviewed By: navahgar

Differential Revision: D28891989

fbshipit-source-id: f9e824acbd4856216b85a135c8cb60a2eac3c628
2021-06-04 16:25:33 -07:00
a776072de6 .github: Switch windows instance types (#59473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59473

Switches windows instance types to prep for usage of the AWS built
Windows AMI with pre-installed Tesla Driver.

Unfortunately, neither c5.4xlarge nor g3.8xlarge is supported by this
AMI, but luckily we can swap those out for pretty comparable
alternatives like c5d.4xlarge and p3.2xlarge.

For CPU workflows this shouldn't make any real difference, since the
CPU / memory are the same on c5d.4xlarge. For GPU workflows, the GPU
on the p3.2xlarge is a Tesla V100, which should suit our needs.

<details>
<summary> nvidia-smi.exe (p3.2xlarge) </summary>

```
PS C:\Users\Administrator> & 'C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe'
Fri Jun  4 18:53:10 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 462.31       Driver Version: 462.31       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  TCC  | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0    23W / 300W |      0MiB / 16258MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

</details>

It might eventually make sense to also switch linux to these instance types but do bear in mind that p3.2xlarge for linux is ~$0.75 more expensive than g3.8xlarge

* [Price comparison for g3.8xlarge vs. p3.2xlarge](https://instances.vantage.sh/?compare_on=true&selected=p3.2xlarge,g3.8xlarge)
* [Price comparison for c5.4xlarge vs. c5d.4xlarge](https://instances.vantage.sh/?compare_on=true&selected=c5.4xlarge,c5d.4xlarge)

AMI that I'm planning on using as the new base AMI with included Tesla driver: https://aws.amazon.com/marketplace/pp/prodview-jrxucanuabmfm?qid=1622830809415&sr=0-2&ref_=srh_res_product_title#pdp-pricing

Info about c5 instances can be found here: https://aws.amazon.com/ec2/instance-types/c5/

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: agolynski

Differential Revision: D28913659

Pulled By: seemethere

fbshipit-source-id: 11b4d332e82b078a6801b312dc4ace2928838fc8
2021-06-04 16:22:05 -07:00
bbf7eceaf0 Refactor c10d and dist aliases for torch.distributed (#59456)
Summary:
**Overview:**
This consolidates `c10d` and `dist` to only `dist` as the alias for `torch.distributed` in `test_store.py`. Both aliases were most likely used due to incremental additions to the test file rather than by intention.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59456

Test Plan:
```
python test/distributed/test_store.py
```

Reviewed By: agolynski

Differential Revision: D28910169

Pulled By: andwgu

fbshipit-source-id: f830dead29e9de48aaf2845dfa5861c9cccec15d
2021-06-04 16:07:44 -07:00
1183fa3817 Switch PG::Work to Future in default_comm_hooks.cpp (#59398)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59398

Test Plan: Imported from OSS

Reviewed By: SciPioneer

Differential Revision: D28876182

Pulled By: agolynski

fbshipit-source-id: 9d8f09ffa2f40bb0fb25c626b52678a1597a797e
2021-06-04 15:27:13 -07:00
aa27136e3c Fix test_randperm_device_compatibility for 1 GPU (#59484)
Summary:
Do not try to create tensors on 2nd device if device_count() == 1

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59484

Reviewed By: ngimel

Differential Revision: D28910673

Pulled By: malfet

fbshipit-source-id: e3517f31a463dd049ce8a5155409b7b716c8df18
2021-06-04 14:41:06 -07:00
a7c8c56b7f torchdeploy allow embedded cuda interp use without cuda (#59459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59459

For any binary that can be used both with and without cuda, it's better to allow just including the cuda flavor of the interpreter.  The previous logic would fail in this case, as it only allows using the cuda flavor if torch::cuda::is_available() reports true.  Now, we unconditionally allow the cuda flavor to be used if it's present.

Test Plan: Added new unit test to exercise this scenario, ran locally on devvm without cuda.

Reviewed By: dzhulgakov

Differential Revision: D28902176

fbshipit-source-id: 5c7c90d84987848471bb6dd5318db15314e0b442
2021-06-04 14:37:39 -07:00
aeb55225e0 [caffe2] add a basic implementation of run-time feature rollout checks (#59355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59355

Add a `CheckKnob()` function for doing run-time checks of feature roll-out
knobs.  This provides an API for safely controlling the roll-out of new
functionality in the code.

Test Plan: Included some basic unit tests.

Reviewed By: voznesenskym

Differential Revision: D26536430

fbshipit-source-id: 2e53234c6d9ce624848fc8b2c76f6833f344f48b
2021-06-04 14:34:41 -07:00
90ad0f316f try fixing checkout dirty issue (#59450)
Summary:
Testing to see if checking out with submodules during the build phase will help.

This tentatively addresses https://github.com/pytorch/pytorch/issues/58867, but since the repro is not reliable, we can't be sure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59450

Reviewed By: malfet

Differential Revision: D28908537

Pulled By: walterddr

fbshipit-source-id: 21ad1392a5066554b5c633f31616ab3e6541c54d
2021-06-04 14:31:43 -07:00
c4349bfa84 [GHA] add upload binary size step (#58341)
Summary:
GHA upload working.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58341

Test Plan: Internal table pytorch_binary_size row for this PR: https://github.com/pytorch/pytorch/issues/58341

Reviewed By: agolynski

Differential Revision: D28908549

Pulled By: walterddr

fbshipit-source-id: 313e5b2c5ce2a47af3c37652612af922a68fd246
2021-06-04 14:17:13 -07:00
3607478ecd Conjugate View (#54987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54987

Based off of ezyang (https://github.com/pytorch/pytorch/pull/44799) and bdhirsh (https://github.com/pytorch/pytorch/pull/43702) 's prototype:

Here's a summary of the changes in this PR:
This PR adds a new dispatch key called Conjugate. This enables us to make conjugate operation a view and leverage the specialized library functions that fast path with the hermitian operation (conj + transpose).

1. Conjugate operation will now return a view with conj bit (1) for complex tensors and returns self for non-complex tensors as before. This also means `torch.view_as_real` will no longer be a view on conjugated complex tensors and is hence disabled. To fill the gap, we have added `torch.view_as_real_physical` which would return the real tensor agnostic of the conjugate bit on the input complex tensor. The information about conjugation on the old tensor can be obtained by calling `.is_conj()` on the new tensor.
2. NEW API:
    a) `.conj()` -- now returning a view.
    b) `.conj_physical()` -- does the physical conjugate operation. If the conj bit for input was set, you'd get `self.clone()`, else you'll get a new tensor with conjugated value in its memory.
    c) `.conj_physical_()`, and `out=` variant
    d) `.resolve_conj()`  -- materializes the conjugation. returns self if the conj bit is unset, else returns a new tensor with conjugated values and conj bit set to 0.
    e) `.resolve_conj_()` in-place version of (d)
    f) `view_as_real_physical` -- as described in (1), it's functionally same as `view_as_real`, just that it doesn't error out on conjugated tensors.
    g) `view_as_real` -- existing function, but now errors out on conjugated tensors.
3. Conjugate Fallback
    a) Vast majority of PyTorch functions would currently use this fallback when they are called on a conjugated tensor.
    b) This fallback is well equipped to handle the following cases:
        - functional operation e.g., `torch.sin(input)`
        - Mutable inputs and in-place operations e.g., `tensor.add_(2)`
        - out-of-place operation e.g., `torch.sin(input, out=out)`
        - Tensorlist input args
        - NOTE: Meta tensors don't work with conjugate fallback.
4. Autograd
    a) `resolve_conj()` is an identity function w.r.t. autograd
    b) Everything else works as expected.
5. Testing:
    a) All method_tests run with conjugate view tensors.
    b) OpInfo tests that run with conjugate views
        - test_variant_consistency_eager/jit
        - gradcheck, gradgradcheck
        - test_conj_views (that only run for `torch.cfloat` dtype)

NOTE: functions like `empty_like`, `zero_like`, `randn_like`, `clone` don't propagate the conjugate bit.
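
A short sketch of the view semantics described above:

```python
import torch

x = torch.tensor([1 + 2j, 3 - 4j])

y = x.conj()              # a view; only the conj bit is set
print(y.is_conj())        # True
print(y.resolve_conj())   # materializes: tensor([1.-2.j, 3.+4.j])
print(x.conj_physical())  # eager conjugation into fresh memory
```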

Follow up work:
1. conjugate view RFC
2. Add neg bit to re-enable view operation on conjugated tensors
3. Update linalg functions to call into specialized functions that fast path with the hermitian operation.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28227315

Pulled By: anjali411

fbshipit-source-id: acab9402b9d6a970c6d512809b627a290c8def5f
2021-06-04 14:12:41 -07:00
19985d6f84 [TensorExpr] Fix printing of Bool dtype. (#59328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59328

Before the change we printed:
```
aten_eq[0] = decltype(::c10::impl::ScalarTypeToCPPType< ::c10::ScalarType::Bool>::t)((targ_0[0])==(targ_1[0]) ? 1 : 0);
```
After the change we print:
```
aten_eq[0] = bool((targ_0[0])==(targ_1[0]) ? 1 : 0);
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28841011

Pulled By: ZolotukhinM

fbshipit-source-id: 9f68dd47e14a7bc28156b56414c2d5c0aad6b2d4
2021-06-04 13:59:38 -07:00
285b8a5252 [TensorExpr] Improve debug messages. (#59280)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59280

Differential Revision:
D28819779

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 2eaa0b78fb309cccb0efe9025a5c3b039e717027
2021-06-04 13:59:36 -07:00
d60efd8207 [TensorExpr] Fix handling of 0-dim tensors. (#59279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279

There were some issues with how we handle 0-dim cases in lowerings and
also in how we generate reductions in that special case. This PR fixes
those issues and reenables a bunch of tests.

Differential Revision:
D28819780

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736
2021-06-04 13:58:15 -07:00
dce8697aea [PyTorch][vulkan] Unify convert as vTensor& convert(const Tensor&) (#59268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59268

There's no reason we can't give `convert` this signature: `Tensor::unsafeGetTensorImpl() const` returns a non-const TensorImpl pointer. (See https://github.com/zdevito/ATen/issues/27#issuecomment-330717839)
ghstack-source-id: 130548716

Test Plan: CI

Reviewed By: SS-JIA

Differential Revision: D28811477

fbshipit-source-id: 269f58980c1f68b29d4be3cba4cd340299ce39af
2021-06-04 13:16:14 -07:00
c99d6254fb remove THCReduce.cuh (#59431)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59431

Reviewed By: malfet

Differential Revision: D28904504

Pulled By: ngimel

fbshipit-source-id: 25d98b736d74d64fd20a40e0d9c773332f56cc30
2021-06-04 12:57:07 -07:00
780faf52ca [profile] Clarify record_shapes=True docstring (#59469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59469

Clarify that using record_shapes=True may cause extra tensor copies.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28905089

Pulled By: ilia-cher

fbshipit-source-id: 7642cb16f6697b6d255a2b82348d4c17486680d0
2021-06-04 12:01:35 -07:00
b3ee645cbf Migrate _th_std_var to ATen (#59258)
Summary:
Ref https://github.com/pytorch/pytorch/issues/49421

This migrates `std`/`var`'s special case all-reduction from TH to ATen. Using the benchmark from gh-43858 that was used to justify keeping the TH version, I find this PR has similar (slightly better) performance when single-threaded. And unlike the TH version, this is multi-threaded and so much faster for large tensors.

TH Results:
```
[----------------------------- Index ------------------------------]
               |  torch_var  |  torch_var0  |  stdfn   |  torch_sum0
1 threads: ---------------------------------------------------------
      8        |       3.6   |       3.8    |     8.2  |      1.2
      80       |       3.7   |       3.8    |     8.4  |      1.2
      800      |       4.2   |       4.3    |     8.7  |      1.2
      8000     |       9.0   |       9.1    |    11.2  |      1.5
      80000    |      58.3   |      59.0    |    30.6  |      4.2
      800000   |     546.9   |     546.9    |   183.4  |     31.3
      8000000  |    5729.7   |    5701.0    |  6165.4  |    484.1
```

ATen results:
```
[----------------------------- Index ------------------------------]
               |  torch_var  |  torch_var0  |  stdfn   |  torch_sum0
1 threads: ---------------------------------------------------------
      8        |       4.0   |       4.0    |     8.7  |      1.2
      80       |       3.6   |       3.8    |     9.0  |      1.2
      800      |       4.1   |       4.3    |     8.9  |      1.2
      8000     |       8.9   |       9.2    |    10.6  |      1.5
      80000    |      57.0   |      57.4    |    28.8  |      4.3
      800000   |     526.9   |     526.9    |   178.3  |     30.2
      8000000  |    5568.1   |    5560.6    |  6042.1  |    453.2

[----------------------------- Index ------------------------------]
               |  torch_var  |  torch_var0  |  stdfn   |  torch_sum0
8 threads: ---------------------------------------------------------
      8        |      3.9    |      3.8     |     9.1  |      1.2
      80       |      3.8    |      3.9     |     8.8  |      1.2
      800      |      4.2    |      4.3     |     8.9  |      1.3
      8000     |      9.0    |      9.2     |    10.4  |      1.5
      80000    |     26.0    |     26.8     |    26.4  |      4.4
      800000   |     92.9    |     87.3     |    72.1  |     22.4
      8000000  |    793.5    |    791.8     |  5334.8  |    115.1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59258

Reviewed By: jbschlosser

Differential Revision: D28860353

Pulled By: ngimel

fbshipit-source-id: 80c9fe1db84dbc864eeb1a319076c7aaff0a04e5
2021-06-04 11:58:12 -07:00
689a5edd0a Revert D28326365: [pytorch][PR] Add torch.cuda.streams.ExternalStream
Test Plan: revert-hammer

Differential Revision:
D28326365 (d7ef9b73fb)

Original commit changeset: b67858c80339

fbshipit-source-id: 337588d40b96cf04e46e554fa481ae7fd4254478
2021-06-04 11:19:36 -07:00
3472f0c94d Enable torch::deploy GPU tests in sandcastle (#59460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59460

Original commit changeset: 6e01a96d3746

Test Plan: Verify new tests run in sandcastle and existing CI is OK

Reviewed By: H-Huang

Differential Revision: D28900869

fbshipit-source-id: a8962ec48c66bba3b4b8f001ece7231953b29e82
2021-06-04 11:13:43 -07:00
ed993f3243 [CODEOWNERS] spandantiwari -> shubhambhokare1 (#59427)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59427

Reviewed By: agolynski

Differential Revision: D28902131

Pulled By: SplitInfinity

fbshipit-source-id: 6a583c5087caf147f9033b73765b1dd3f59a405c
2021-06-04 11:06:55 -07:00
e90caac676 Port gelu_backward to structured (#58665)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58665

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D28572527

Pulled By: ezyang

fbshipit-source-id: 0cb286f20c5f91453594a7dfe39ae4e4d24a13a1
2021-06-04 11:06:54 -07:00
153a96054b Port gelu to structured (#58664)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58664

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28572533

Pulled By: ezyang

fbshipit-source-id: 8be00ecdcc224b516de28bf5f43ec308174053db
2021-06-04 11:06:52 -07:00
5f824ef437 Port hardshrink to structured (#58663)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58663

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28572531

Pulled By: ezyang

fbshipit-source-id: 3fc8c33445adeae1789774fb6d8099278b93f8f8
2021-06-04 11:06:50 -07:00
b4fa4c86f7 Port hardshrink_backward and softshrink_backward to structured (#58662)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58662

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28572532

Pulled By: ezyang

fbshipit-source-id: 8ecebc1090d884ee579f5d04a46f1e60a2dd978e
2021-06-04 11:05:44 -07:00
2119efd234 reflection_pad1d_backward: Port to structured (#59103)
Summary:
Tracking Issue: https://github.com/pytorch/pytorch/issues/55070
Port `reflection_pad1d_backward` to structured kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59103

Test Plan: Pre-existing tests

Reviewed By: jbschlosser

Differential Revision: D28836043

Pulled By: ezyang

fbshipit-source-id: 4c3b0880edf305896f540113dcab70c8af24253b
2021-06-04 10:23:53 -07:00
a6bd6b9ca5 [NNC] Fix the uninitialized pointer in loopnest.fuse_loops (#59411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59411

Bug: the uninitialized For* caused a casting error in pybind11.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28882635

Pulled By: huiguoo

fbshipit-source-id: e3f2b659bae94e9617936b1b2368157bed73c2fe
2021-06-04 10:04:34 -07:00
aa06bc0731 OpInfo: minor fix in sample_inputs_diff (#59181)
Summary:
sample_inputs_diff constructs all five positional arguments for [diff](https://pytorch.org/docs/stable/generated/torch.diff.html) but uses only the first three. This doesn't seem to be intentional.
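
For reference, all five arguments of `torch.diff`:

```python
import torch

x = torch.tensor([1, 3, 6])
out = torch.diff(x, n=1, dim=-1,
                 prepend=torch.tensor([0]), append=torch.tensor([10]))
print(out)  # tensor([1, 2, 3, 4])
```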

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59181

Test Plan: This change expands coverage of diff's OpInfo sample inputs. Related tests still pass.

Reviewed By: mruberry

Differential Revision: D28878359

Pulled By: saketh-are

fbshipit-source-id: 1466f6c6c341490885c85bc6271ad8b3bcdf3a3e
2021-06-04 09:53:31 -07:00
b99523832b Remove use_env from torch.distributed.run, clarify bc around that parameter in comment. (#59409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59409

Remove use_env from torch.distributed.run, and clarify bc around that parameter in comment.

Test Plan: n/a

Reviewed By: cbalioglu

Differential Revision: D28876485

fbshipit-source-id: 5f10365968d204985ce517b83c392c688995d76e
2021-06-04 09:02:47 -07:00
4ae5764d47 Add is_inference to native functions (#58729)
Summary:
Adds `is_inference` as a native function w/ manual cpp bindings.
Also changes instances of `is_inference_tensor` to `is_inference` to be consistent with other properties such as `is_complex`.
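
A quick sketch of the property:

```python
import torch

with torch.inference_mode():
    t = torch.ones(3)

print(t.is_inference())              # True
print(torch.ones(3).is_inference())  # False
```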

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58729

Reviewed By: mruberry

Differential Revision: D28874507

Pulled By: soulitzer

fbshipit-source-id: 0fa6bcdc72a4ae444705e2e0f3c416c1b28dadc7
2021-06-04 08:59:11 -07:00
fa597ee17f Fix torch.randperm for CUDA (#59352)
Summary:
Context https://github.com/pytorch/pytorch/issues/58545

The logic keeps the behavior consistent between torch.randperm and torch.randint:

1. Generators can have either a fully specified or a non-fully specified device
2. As long as the generator's device type matches the result's, we don't error out (see the sketch below)
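
A sketch of the resulting rules:

```python
import torch

g = torch.Generator(device="cuda")               # index not fully specified
torch.randperm(4, generator=g, device="cuda:0")  # OK: device types match

cpu_g = torch.Generator(device="cpu")
# torch.randperm(4, generator=cpu_g, device="cuda")  # raises: type mismatch
```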

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59352

Test Plan:
```
python test/test_tensor_creation_ops.py -k TestRandomTensorCreation
```

Reviewed By: ngimel

Differential Revision: D28855920

Pulled By: zhouzhuojie

fbshipit-source-id: f8141a2c4b2f177e1aa7baec6999b65916cba02c
2021-06-04 08:56:18 -07:00
202b2c9fc2 Remove many unnecessary constructor calls of Vectorized<T> (#58875)
Summary:
Refresh https://github.com/pytorch/pytorch/issues/56241

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58875

Reviewed By: mruberry

Differential Revision: D28892034

Pulled By: ezyang

fbshipit-source-id: 21074e45f29a780168852be5305420a3cc1148fc
2021-06-04 08:50:53 -07:00
d7ef9b73fb Add torch.cuda.streams.ExternalStream (#57781)
Summary:
This is required in https://github.com/pytorch/pytorch/pull/57110#issuecomment-828357947

We need to provide means to synchronize on externally allocated streams for dlpack support in python array data api.
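
A hypothetical usage sketch, assuming CuPy as the external stream producer:

```python
import cupy
import torch

s = cupy.cuda.Stream()                      # stream allocated outside PyTorch
ext = torch.cuda.ExternalStream(s.ptr)      # wrap its raw cudaStream_t pointer
with torch.cuda.stream(ext):
    out = torch.ones(4, device="cuda") * 2  # queued on the external stream
```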

cc mruberry rgommers leofang asi1024 kmaehashi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57781

Reviewed By: mrshenli

Differential Revision: D28326365

Pulled By: ezyang

fbshipit-source-id: b67858c8033949951b49a3d319f649884dfd0a91
2021-06-04 08:47:09 -07:00
c769300301 Fix MaxPool default pad documentation (#59404)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59404

Reviewed By: albanD

Differential Revision: D28879049

Pulled By: Varal7

fbshipit-source-id: 03a86cd347d53ac2d06028b3f213c5b5d5ab7e91
2021-06-04 08:32:03 -07:00
6d51a89778 Fix broken hyperlinks (#59425)
Summary:
**Overview:**
A number of the hyperlinks in the [`CONTRIBUTING.md` file](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) are broken since they include an extraneous `/torch/`. This PR fixes those links.

The files whose links are broken are
- `ProcessGroupNCCL.hpp`
- `Store.hpp`
- `FileStore.hpp`
- `TCPStore.hpp`
- `PrefixStore.hpp`
- `rref_impl.h`
- `rref_context.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59425

Test Plan:
The `CONTRIBUTING.md` file is at https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md.

`ProcessGroupNCCL.hpp` should have link https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupGloo.hpp, which is equivalent to `../lib/c10d/ProcessGroupGloo.hpp`.

`Store.hpp` should have link https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/Store.hpp, which is equivalent to `../lib/c10d/Store.hpp`.

`FileStore.hpp` should have link https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/FileStore.hpp, which is equivalent to `../lib/c10d/FileStore.hpp`.

`PrefixStore.hpp` should have link https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/PrefixStore.hpp, which is equivalent to `../lib/c10d/PrefixStore.hpp`.

`rref_interface.h` should have link https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/rref_interface.h, which is equivalent to `../../aten/src/ATen/core/rref_interface.h`.

`rref_context.h` should have link https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/rpc/rref_context.h, which is equivalent to `../csrc/distributed/rpc/rref_context.h`.

Reviewed By: mruberry

Differential Revision: D28888188

Pulled By: andwgu

fbshipit-source-id: 023219184d42284ea1cbfcf519c1b4277dd5a02b
2021-06-04 08:27:26 -07:00
63956610a7 Search for static OpenBLAS compiled with OpenMP (#59428)
Summary:
Previously, only a dynamically linked OpenBLAS compiled with OpenMP could
be found.

Also get rid of the hardcoded path to libgfortran.a in FindLAPACK.cmake.

Only affects aarch64 Linux builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59428

Reviewed By: agolynski

Differential Revision: D28891314

Pulled By: malfet

fbshipit-source-id: 5af55a14c85ac66551ad2805c5716bbefe8d55b2
2021-06-04 08:09:21 -07:00
c7a3a13bab .circleci: Disable USE_GOLD_LINKER for CUDA 10.2 (#59413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59413

For CUDA 10.2 builds linked with the gold linker we were observing
crashes when exceptions were being raised

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D28888054

Pulled By: seemethere

fbshipit-source-id: f9b38147591721803ed3cac607510fe5bbc49d6d
2021-06-04 07:02:54 -07:00
06ed658358 Merge TensorPipe's CPU and CUDA channel registry (#59375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59375

The CPU and CUDA channels used to be separate classes in TensorPipe, but they recently got merged in order to support cross-device-type channels. We used to need two separate registries in PyTorch, but now we can merge them. This simplifies some registration logic, and will help in future PRs.
ghstack-source-id: 130583770

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28796427

fbshipit-source-id: b7db983293cbbddd1aedec6428de08d8944b0344
2021-06-04 06:53:49 -07:00
c09beaaf4a Remove LazyStreamContext (2 out of 2) (#59299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59299

After recent changes, LazyStreamContext had in fact always become eager, making it equivalent to a vector of streams. So it makes more sense now to remove this abstraction and use a more self-descriptive type.

This PR migrates the TensorPipe agent. The previous PR migrated the RequestCallback internals.
ghstack-source-id: 130583773

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28789174

fbshipit-source-id: a27d2b1f40ab3cf2ac0dd946232fd0eecda6d450
2021-06-04 06:53:47 -07:00
03a5c6ea99 Remove LazyStreamContext (1 out of 2) (#59298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59298

After recent changes, LazyStreamContext had in fact always become eager, making it equivalent to a vector of streams. So it makes more sense now to remove this abstraction and use a more self-descriptive type.

This PR migrates the RequestCallback internals. The next PR migrates the TensorPipe agent.
ghstack-source-id: 130583774

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28789175

fbshipit-source-id: fa581a50f9a6a1e42c2ad8c808a9b099bea7433e
2021-06-04 06:53:46 -07:00
3e7396f99d Fix CUDA sync when switching streams in RPC tests (#59297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59297

PyTorch requires users to manually record tensors with the CUDA caching allocator when switching streams. We weren't doing it.

Also, the usage of an Event can be simplified by using `s1.wait(s2)`.
ghstack-source-id: 130583777

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28832902

fbshipit-source-id: cd4f40ff811fa1b0042deedda2456e22f33b92bd
2021-06-04 06:53:44 -07:00
8f4cfaa9db Fix race condition in TP agent (#58753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58753

TSAN was (rightfully!) detecting and complaining about a race due to the fact that upon init the TP agent exchanges the device maps between nodes using RPC requests (and by doing so it accesses the device maps) and then sets the reverse device maps (thus possibly modifying the set of devices). This resulted in a data race, i.e., simultaneously reading and writing the set of devices without synchronizing.

One solution is to add a mutex around the devices, which works, but is "annoying". An alternative solution is to make the set of devices immutable (i.e., `const`). For that to work, we need to exchange the device maps without using RPC calls. We can do so using the process group that we need to create anyways.

Since now there's a lot more logic in Python, I've moved (and restructured) all safety checks over there, and removed them from C++.
ghstack-source-id: 130583775

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D28603754

fbshipit-source-id: 88533e65d72d1eb806dc41bec8d55def5082e290
2021-06-04 06:53:42 -07:00
c0acffa6ef Ensure async_execution works with CUDAFuture (#56863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56863

ghstack-source-id: 130583772

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D27985908

fbshipit-source-id: 09469ee0eb70b8e3b61f6278f2c881ce7f5244d6
2021-06-04 06:53:40 -07:00
7bcd8f94a5 Avoid re-doing CUDA stream sync in OwnerRRef (#57355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57355

We had started fixing OwnerRRef to make it CUDA-compatible, by properly synchronizing CUDA streams/events where appropriate. However, since we started using CUDAFuture (or, well, ivalue::Future nowadays, after they got merged) this is all done automatically for us, hence we can undo these "fixes" as they're now duplicated.
ghstack-source-id: 130583771

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28118182

fbshipit-source-id: 4b1dd9fe88c23802b1df573941d1b73af48bb67b
2021-06-04 06:52:33 -07:00
d009c9c129 [RPC Framework] Separate initialize_from_module_rref method out of RemoteModule constructor (#59292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59292

#Closes: https://github.com/pytorch/pytorch/issues/58274

Create an alternate initialization method, and also create a few util functions to avoid duplicate code.
ghstack-source-id: 130575373

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_create_remote_module_from_module_rref

Reviewed By: vipannalla

Differential Revision: D28825895

fbshipit-source-id: 87803e94d9b50f94e1b7b2c99b9bf1634e20d065
2021-06-04 03:43:36 -07:00
c3bf42e0d8 Fix symbolic derivative of hardswish (#59405)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59405

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28879698

Pulled By: bertmaher

fbshipit-source-id: 2f2d9836bf592b18ed9a19aab4f5967e653b5898
2021-06-03 23:12:18 -07:00
9ac954789d [nnc] Add hardsigmoid (#59069)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59069

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D28738166

Pulled By: bertmaher

fbshipit-source-id: d9f5b87ef1f2323a3631add79c2670ce794f911e
2021-06-03 23:10:36 -07:00
c717ce6771 [NNC] Add python bindings for Compute2 (#59350)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59350

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D28854806

Pulled By: huiguoo

fbshipit-source-id: b9091f9183249257aedc1eafb1992e0faf5dea82
2021-06-03 22:37:08 -07:00
db90533b9e Make JIT not assume that the device is CUDA. (#54238)
Summary:
Decouple the JIT argument spec and shape analysis from CUDA.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54238

Reviewed By: ngimel

Differential Revision: D28802085

Pulled By: Krovatkin

fbshipit-source-id: 4068c9460cdec2d80733f001ca90ea3f5e6d3a7e
2021-06-03 22:21:27 -07:00
7c4ac9e3ee [NNC] Fix loopnest.cache_accesses for reduce ops (fixed #59002) (#59136)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59136

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28768598

Pulled By: huiguoo

fbshipit-source-id: 99ab8430bc0ba395e2a041b03a7761de335ddda5
2021-06-03 21:04:14 -07:00
d9d7d5e24a [torch] Remove migration warning for ScriptDict
Summary:
This commit removes the warning that suggests that users script their
dictionaries before passing them into TorchScript code. The ScriptDict feature
is not fully ready, so it does not make sense to recommend this yet.

Test Plan:
Sandcastle.

In addition, the PyPER test broken by the original diff passes:

```
buck test mode/opt //caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_lwt -- --exact 'caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_lwt - caffe2.torch.fb.training_toolkit.backend.tests.test_model_materializer_full_sync_lwt.ModelMaterializerFullSyncLwtTest: test_materialization_determinism_cpu' --run-disabled
```

Differential Revision: D28891351

fbshipit-source-id: 2a3a00cde935d670fb1dc7fd8c709ae9c2ad8cdc
2021-06-03 20:55:40 -07:00
6627c00e63 [Static Runtime] Fix bug in quantized::linear wrapper (#59407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59407

Reviewed By: ajyu

Differential Revision: D28881307

fbshipit-source-id: 46c169f783cf05c585871c2e074d52255116b9c3
2021-06-03 19:18:04 -07:00
7d38901e7c [NNC] Fix BufHandle arguments in loopnest python API (#59348)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59348

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28854233

Pulled By: huiguoo

fbshipit-source-id: 2484249992903ed7af0de504ac27f96f30e993d1
2021-06-03 17:34:17 -07:00
77de640f4b [torch distributed] Implementing reduce_scatter_base (#57567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57567

Support flattened reduce_scatter.
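
A minimal per-rank sketch of the flattened call (a sketch, assuming an initialized NCCL process group; `_reduce_scatter_base` is the private name from this era and may differ in later releases):

```python
import torch
import torch.distributed as dist

# Flattened variant: one contiguous input of numel world_size * out.numel()
# instead of a list of per-rank tensors.
world_size = dist.get_world_size()
inp = torch.ones(world_size * 4, device="cuda") * (dist.get_rank() + 1)
out = torch.empty(4, device="cuda")
dist._reduce_scatter_base(out, inp)  # out = this rank's chunk, summed over ranks
```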

Test Plan:
buck test mode/opt -c fbcode.enable_gpu_sections=true //caffe2/torch/lib/c10d:ProcessGroupNCCLTest
buck test mode/opt -c fbcode.enable_gpu_sections=true //caffe2/test/distributed:c10d

Reviewed By: zhaojuanmao

Differential Revision: D27876281

fbshipit-source-id: 58e2edfb1baff5cdc083dbaaba9f19502ef0b298
2021-06-03 17:17:53 -07:00
46d724c919 Revert D28859795: [nnc] Enable CPU fusion inside Facebook, take 4
Test Plan: revert-hammer

Differential Revision:
D28859795 (6baa66ece9)

Original commit changeset: 826801db24e8

fbshipit-source-id: c85a0fc7e88c95af939d5c0f50c0c8878e1174d3
2021-06-03 16:29:51 -07:00
526445dfa8 Update reviewer list for the distributed package (#59417)
Summary:
Added cbalioglu to the default reviewer list of the distributed package.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59417

Reviewed By: mruberry

Differential Revision: D28883997

Pulled By: cbalioglu

fbshipit-source-id: 0ed9638f25bd914b71d96203579507af3b830df4
2021-06-03 15:38:07 -07:00
aa4f27c12a Prefer accurate reciprocal on ARMv8 (#59361)
Summary:
The default NEON-accelerated implementation of reciprocal uses vrecpeq_f32, which yields a Newton-Raphson approximation rather than the actual value.
Use regular NEON-accelerated division for the reciprocal and reciprocal square root operations instead.

This fixes `test_reference_numerics_hard_frac_cpu_float32`, `test_reference_numerics_normal_rsqrt_cpu_float32`, etc.
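
A minimal check of the intended behavior (a sketch assuming a build with this fix; exact bitwise equality for `reciprocal` is an assumption, based on both sides lowering to the same division):

```python
import torch

x = torch.rand(4096, dtype=torch.float32) + 0.5
# With true division, reciprocal matches 1/x instead of the low-precision
# vrecpeq_f32 estimate; rsqrt is checked loosely to stay conservative.
assert torch.equal(torch.reciprocal(x), 1.0 / x)
assert torch.allclose(torch.rsqrt(x), 1.0 / torch.sqrt(x))
```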

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59361

Reviewed By: mruberry

Differential Revision: D28870456

Pulled By: malfet

fbshipit-source-id: e634b0887cce7efb046ea1fd9b74424e0eceb164
2021-06-03 15:28:36 -07:00
3416b8dd70 Automated submodule update: FBGEMM (#59337)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9cb33bcfe5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59337

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: caogao

Differential Revision: D28846199

fbshipit-source-id: b78f087129edef97247d4ceea77cfede0c6800fe
2021-06-03 14:45:32 -07:00
1aa14fcb14 Fix the "tensors to be on the same device" error in HistogramObserver (#59234)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59075

This PR fixes the "tensors to be on the same device" error in `HistogramObserver`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59234

Reviewed By: jbschlosser

Differential Revision: D28837572

Pulled By: vkuzo

fbshipit-source-id: ff7c3229ced7de2cdd8f76d526f0fd33ac643216
2021-06-03 13:30:56 -07:00
2aa463d931 Support switching RemoteModule between train/eval (#59026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59026

#Closes: https://github.com/pytorch/pytorch/issues/51480

Enabled the train and eval methods in RemoteModule to call the underlying train/eval methods on the actual nn.Module.
ghstack-source-id: 130421137
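
A hedged usage sketch (assumes `rpc.init_rpc` has already been called on this worker and that a worker named "worker1" exists):

```python
import torch
from torch.distributed.nn import RemoteModule

remote_linear = RemoteModule("worker1/cpu", torch.nn.Linear, args=(4, 4))
remote_linear.train()  # now forwards to the remote nn.Module: training -> True
remote_linear.eval()   # now forwards to the remote nn.Module: training -> False
```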

Test Plan:
Call these two updated methods in test_send_remote_module_over_the_wire in remote_module_test.py. To verify correctness: after calling train, the training mode should be set to True; after calling eval, the training mode of the remote module should be set to False.

Related test output:

    ✓ Pass: caffe2/test/distributed/rpc:process_group_agent - test_send_remote_module_over_the_wire (fb.test_process_group_agent.ProcessGroupThreeWorkersRemoteModuleTestWithFork) (23.059)
    ✓ Pass: caffe2/test/distributed/rpc:thrift_agent - test_send_remote_module_over_the_wire (fb.test_thrift_agent.ThriftThreeWorkersRemoteModuleTestWithFork) (27.965)
    ✓ Pass: caffe2/test/distributed/rpc:process_group_agent - test_send_remote_module_over_the_wire (test_process_group_agent.ProcessGroupThreeWorkersRemoteModuleTestWithSpawn) (74.481)
    ✓ Pass: caffe2/test/distributed/rpc:thrift_agent - test_send_remote_module_over_the_wire (fb.test_thrift_agent.ThriftThreeWorkersRemoteModuleTestWithSpawn) (77.243)
    ✓ Pass: caffe2/test/distributed/rpc:tensorpipe_agent - test_send_remote_module_over_the_wire (fb.test_tensorpipe_agent.TensorPipeThreeWorkersRemoteModuleTestWithFork) (58.644)
    ✓ Pass: caffe2/test/distributed/rpc:tensorpipe_agent - test_send_remote_module_over_the_wire (test_tensorpipe_agent.TensorPipeThreeWorkersRemoteModuleTestWithSpawn) (90.229)

Reviewed By: pritamdamania87, SciPioneer

Differential Revision: D28721078

fbshipit-source-id: aa45c1e5755f583200144ecfec3704f28221972c
2021-06-03 13:13:58 -07:00
c1c9774acb Revert D28538996: Enable torch::deploy GPU tests in sandcastle
Test Plan: revert-hammer

Differential Revision:
D28538996 (4b74c848aa)

Original commit changeset: 1a6ccea07cfe

fbshipit-source-id: 6e01a96d3746d3ca3e4e792a7b623ef960c9d2d6
2021-06-03 13:00:25 -07:00
e66015dadf Add build support for kineto + rocm (#58401)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58399

CMake changes to allow kineto to build with rocm support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58401

Reviewed By: mruberry

Differential Revision: D28479807

Pulled By: walterddr

fbshipit-source-id: fc01f05b2a5592ee1d1dbd71d2d4f7aec1bd74f7
2021-06-03 12:15:20 -07:00
332b01e93f [DDP] log usage of torch_distributed_debug (#59351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59351

Logging PT distributed debug level to track usage internally.
ghstack-source-id: 130443122

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28854914

fbshipit-source-id: a8e85ca4a3c9ac2f18d13190e87c0ebc4a8e7ea2
2021-06-03 11:49:23 -07:00
6408cbd918 Migrate renorm to ATen (CPU and CUDA) (#59250)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/59108, closes https://github.com/pytorch/pytorch/issues/24754, closes https://github.com/pytorch/pytorch/issues/24616

This reuses `linalg_vector_norm` to calculate the norms. I just add a new kernel that turns the norm into a normalization factor, then multiply the original tensor by it with a normal broadcasted `mul` operator. The result is less code, and better performance to boot.
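
A minimal Python sketch of that decomposition (the `eps` mirrors the small constant `renorm` has historically added to the norm; treat the exact value as an assumption):

```python
import torch

def renorm_sketch(t, p, dim, maxnorm, eps=1e-7):
    # p-norm of each sub-tensor along `dim` (reduce over every other dim).
    reduce_dims = [d for d in range(t.dim()) if d != dim]
    norms = torch.linalg.vector_norm(t, ord=p, dim=reduce_dims, keepdim=True)
    # Normalization factor: shrink only sub-tensors whose norm exceeds maxnorm.
    factor = torch.where(norms > maxnorm, maxnorm / (norms + eps),
                         torch.ones_like(norms))
    return t * factor  # the broadcasted mul described above

t = torch.randn(50, 50, 50)
assert torch.allclose(renorm_sketch(t, 2, 0, 1.0), torch.renorm(t, 2, 0, 1.0))
```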

#### Benchmarks (CPU):
|     Shape    | Dim |  Before | After (1 thread) | After (8 threads) |
|:------------:|:---:|--------:|-----------------:|------------------:|
| (10, 10, 10) | 0   | 11.6 us |           4.2 us |            4.2 us |
|              | 1   | 14.3 us |           5.2 us |            5.2 us |
|              | 2   | 12.7 us |           4.6 us |            4.6 us |
| (50, 50, 50) | 0   |  330 us |           120 us |           24.4 us |
|              | 1   |  350 us |           135 us |           28.2 us |
|              | 2   |  417 us |           130 us |           24.4 us |

#### Benchmarks (CUDA)
|     Shape    | Dim |  Before |   After |
|:------------:|:---:|--------:|--------:|
| (10, 10, 10) | 0   | 12.5 us | 12.1 us |
|              | 1   | 13.1 us | 12.2 us |
|              | 2   | 13.1 us | 11.8 us |
| (50, 50, 50) | 0   | 33.7 us | 11.6 us |
|              | 1   | 36.5 us | 15.8 us |
|              | 2   | 41.1 us |   15 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59250

Reviewed By: mruberry

Differential Revision: D28820359

Pulled By: ngimel

fbshipit-source-id: 572486adabac8135d52a9b8700f9d145c2a4ed45
2021-06-03 11:43:27 -07:00
2ad4b8e58c Extract c10d Store tests to dedicated test file (#59271)
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/55340

**Overview**
This factors out `FileStoreTest`, `HashStoreTest`, `PrefixFileStoreTest`, `TCPStoreTest`, `PrefixTCPStoreTest`, `PythonStoreTest`, `RendezvousTest`, `RendezvousEnvTest`, `RendezvousFileTest`, and `RendezvousTCPTest` from `test_c10d_common.py` to a new file `test_store.py`.

Additionally, unused import/initialization statements are removed from `test_c10d_common.py`, and the minimal set of import/initialization statements are used for `test_store.py`.

Also, this changes `.jenkins/pytorch/multigpu-test.sh`, `.jenkins/pytorch/win-test-helpers/test_distributed.bat`, and `test/run_test.py` to include the new `test_store.py`.

**Testing**
All commands shown are run on an AI AWS cluster.

I check the Store tests:
```
python test/distributed/test_store.py
```

I also check `test_c10d_common.py` since it is the source of the refactored code. In addition, I check `test_c10d_nccl.py` and `test_c10d_gloo.py` since they import from `test_c10d_common.py`; those two should be the only test files depending on `test_c10d_common.py`.
```
python test/distributed/test_c10d_common.py
python test/distributed/test_c10d_nccl.py
python test/distributed/test_c10d_gloo.py
```
`test_c10d_gloo.py` produces warnings about how using sparse tensors in TorchScript is experimental, but the warnings do not result from this PR's changes.

**Testing Issues** (To Be Revisited)
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py
```
Running the above command fails three tests (written as `[Test]`: `[Error]`):
- `ProcessGroupGlooWrapperTest.test_collective_hang`: `RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [10.200.24.101]:15580`
- `CommTest.test_broadcast_coalesced_gloo_cuda`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
- `CommTest.test_sequence_num_incremented_gloo_default`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
However, running each of the following yields no errors:
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_collective_hang
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_broadcast_coalesced_gloo_cuda
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_sequence_num_incremented_gloo_default
```
This suggests the existence of some inadvertent state dependency between tests (e.g. improper cleanup). I have not explored this further yet. In particular, I do not have a solid understanding of the tests to be able to explain why using `pytest` and `gpurun` induces the failure (since notably, running the `.py` directly shows no issue).

Similarly, running the following yields 47 errors:
```
WORLD_SIZE=4 BACKEND=nccl gpurun pytest test/distributed/test_c10d_nccl.py
```
The errors seem to all be simply complaining about the usage of `fork()` instead of `spawn()` for CUDA multiprocessing. Though, most of the tests in `test_c10d_nccl.py` ask for at least 2 CUDA devices, so I think that the `gpurun` is warranted (assuming that the test file does not need to be run partially on different machines).

Both `test_c10d_common.py` and `test_store.py` work fine with `pytest`.

**Other Notes**
I noticed that `torch.distributed` is imported both as `dist` and as `c10d` and that `c10d` is used throughout the Store tests. I was curious if this is intentional (as opposed to using `dist` to refer to `torch.distributed`). Also, the original [issue](https://github.com/pytorch/pytorch/issues/55340) suggests that the Store tests do not use multiprocessing, but I saw that `torch.multiprocessing` is still used in `TCPStoreTest`.

The links for the Store files in the `CONTRIBUTING.md` [file](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) are broken. This can be fixed in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59271

Reviewed By: jbschlosser, mrshenli

Differential Revision: D28856920

Pulled By: andwgu

fbshipit-source-id: 630950cba18d34e6b5de661f5a748f2cddc1b446
2021-06-03 10:53:33 -07:00
f05d5bec48 Preserve PyObject even when it goes dead (#56017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56017

Fixes #55686

This patch is seemingly straightforward but some of the changes are very
subtle.  For the general algorithmic approach, please first read the
quoted issue.  Based on the algorithm, there are some fairly
straightforward changes:

- New boolean on TensorImpl tracking if we own the pyobj or not
- PythonHooks virtual interface for requesting deallocation of pyobj
  when TensorImpl is being released and we own its pyobj, and
  implementation of the hooks in python_tensor.cpp
- Modification of THPVariable to MaybeOwned its C++ tensor, directly
  using swolchok's nice new class

And then, there is python_variable.cpp.  Some of the changes follow the
general algorithmic approach:

- THPVariable_NewWithVar is simply adjusted to handle MaybeOwned and
  initializes as owned (like before)
- THPVariable_Wrap adds the logic for reverting ownership back to
  PyObject when we take out an owning reference to the Python object
- THPVariable_dealloc attempts to resurrect the Python object if
  the C++ tensor is live, and otherwise does the same old implementation
  as before
- THPVariable_tryResurrect implements the resurrection logic.  It is
  modeled after CPython code so read the cited logic and see if
  it is faithfully replicated
- THPVariable_clear is slightly updated for MaybeOwned and also to
  preserve the invariant that if owns_pyobj, then pyobj_ is not null.
  This change is slightly dodgy: the previous implementation has a
  comment mentioning that the pyobj nulling is required to ensure we
  don't try to reuse the dead pyobj.  I don't think, in this new world,
  this is possible, because the invariant says that the pyobj only
  dies if the C++ object is dead too.  But I still unset the field
  for safety.

And then... there is THPVariableMetaType.  colesbury explained in the
issue why this is necessary: when destructing an object in Python, you
start off by running the tp_dealloc of the subclass before moving up
to the parent class (much in the same way C++ destructors work).  The
deallocation process for a vanilla Python-defined class does irreparable
harm to the PyObject instance (e.g., the finalizers get run), making it
no longer valid to attempt resurrection later in the tp_dealloc chain.
(BTW, the fact that objects can resurrect but in an invalid state is
one of the reasons why it's so frickin' hard to write correct __del__
implementations).  So we need to make sure that we actually override
the tp_dealloc of the bottom most *subclass* of Tensor to make sure
we attempt a resurrection before we start finalizing.  To do this,
we need to define a metaclass for Tensor that can override tp_dealloc
whenever we create a new subclass of Tensor.  By the way, it was totally
not documented how to create metaclasses in the C++ API, and it took
a good bit of trial error to figure it out (and the answer is now
immortalized in https://stackoverflow.com/q/67077317/23845 -- the things
that I got wrong in earlier versions of the PR included setting
tp_basicsize incorrectly, incorrectly setting Py_TPFLAGS_HAVE_GC on
the metaclass--you want to leave it unset so that it inherits, and
determining that tp_init is what actually gets called when you construct
a class, not tp_call as another not-to-be-named StackOverflow question
suggests).
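
The user-visible footprint of the metaclass can be sketched like this (the exact metaclass name is version-dependent and assumed here; the point is that `type(torch.Tensor)` is no longer plain `type`):

```python
import torch

print(type(torch.Tensor))  # a Tensor metaclass, e.g. torch._C._TensorMeta

class MyTensor(torch.Tensor):
    pass

# Subclasses inherit the metaclass, which is what lets their tp_dealloc be
# overridden to attempt resurrection before Python finalization runs.
assert type(MyTensor) is type(torch.Tensor)
```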

Aside: Ordinarily, adding a metaclass to a class is a user visible
change, as it means that it is no longer valid to mixin another class
with a different metaclass.  However, because _C._TensorBase is a C
extension object, it will typically conflict with most other
metaclasses, so this is not BC breaking.

The desired new behavior of a subclass tp_dealloc is to first test if
we should resurrect, and otherwise do the same old behavior.  In an
initial implementation of this patch, I implemented this by saving the
original tp_dealloc (which references subtype_dealloc, the "standard"
dealloc for all Python defined classes) and invoking it.  However, this
results in an infinite loop, as it attempts to call the dealloc function
of the base type, but incorrectly chooses subclass type (because it is
not a subtype_dealloc, as we have overridden it; see
b38601d496/Objects/typeobject.c (L1261) )
So, with great reluctance, I must duplicate the behavior of
subtype_dealloc in our implementation.  Note that this is not entirely
unheard of in Python binding code; for example, Cython
c25c3ccc4b/Cython/Compiler/ModuleNode.py (L1560)
also does similar things.  This logic makes up the bulk of
THPVariable_subclass_dealloc

To review this, you should pull up the CPython copy of subtype_dealloc
b38601d496/Objects/typeobject.c (L1230)
and verify that I have specialized the implementation for our case
appropriately.  Among the simplifications I made:

- I assume PyType_IS_GC, because I assume that Tensor subclasses are
  only ever done in Python and those classes are always subject to GC.
  (BTW, yes!  This means I have broken anyone who has extended PyTorch
  tensor from C API directly.  I'm going to guess no one has actually
  done this.)

- I don't bother walking up the type bases to find the parent dealloc;
  I know it is always THPVariable_dealloc.  Similarly, I can get rid
  of some parent type tests based on knowledge of how
  THPVariable_dealloc is defined

- The CPython version calls some private APIs which I can't call, so
  I use the public PyObject_GC_UnTrack APIs.

- I don't allow the finalizer of a Tensor to change its type (but
  more on this shortly)

One alternative I discussed with colesbury was instead of copy pasting
the subtype_dealloc, we could transmute the type of the object that was
dying to turn it into a different object whose tp_dealloc is
subtype_dealloc, so the stock subtype_dealloc would then be applicable.
We decided this would be kind of weird and didn't do it that way.

TODO:

- More code comments

- Figure out how not to increase the size of TensorImpl with the new
  bool field

- Add some torture tests for the THPVariable_subclass_dealloc, e.g.,
  involving subclasses of Tensors that do strange things with finalizers

- Benchmark the impact of taking the GIL to release C++ side tensors
  (e.g., from autograd)

- Benchmark the impact of adding a new metaclass to Tensor (probably
  will be done by separating out the metaclass change into its own
  change)

- Benchmark the impact of changing THPVariable to conditionally own
  Tensor (as opposed to unconditionally owning it, as before)

- Add tests that this indeed preserves the Python object

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27765125

Pulled By: ezyang

fbshipit-source-id: 857f14bdcca2900727412aff4c2e2d7f0af1415a
2021-06-03 10:50:36 -07:00
fa72d9a379 [quant] Fix use after free (#59267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59267

fixes: https://github.com/pytorch/pytorch/issues/58868

Test Plan: Imported from OSS

Reviewed By: jbschlosser, supriyar

Differential Revision: D28811529

fbshipit-source-id: f27018ae0a02d1dd229d1ff7638f130c38a00986
2021-06-03 10:35:48 -07:00
6baa66ece9 [nnc] Enable CPU fusion inside Facebook, take 4
Summary:
fixed the awkward configerator initialization issue that broke some
tests.  Trying again

Test Plan: predictor comparisons

Reviewed By: ZolotukhinM

Differential Revision: D28859795

fbshipit-source-id: 826801db24e86b1c3594a86e3ac32f0a84c496f7
2021-06-03 09:33:13 -07:00
57e452ff5d Revert D28856713: [PyTorch Edge] Add proper error message when loading incompatible model with lite interpreter
Test Plan: revert-hammer

Differential Revision:
D28856713

Original commit changeset: c3f9a3b64459

fbshipit-source-id: cc6ba8ec1047f29e62061107a2e5f245981b8039
2021-06-03 08:40:28 -07:00
6620d7d688 OpInfo: norm (#59259)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

EDIT:
~~Test takes a whopping 4 mins to run 😓~~ (Filtered tests also included linalg norm)

Newly added tests take around 2 mins.
```
==================================================== 193 passed, 224 skipped, 27224 deselected, 5 warnings in 138.87s (0:02:18) ====================================================
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59259

Reviewed By: jbschlosser

Differential Revision: D28833962

Pulled By: mruberry

fbshipit-source-id: 40b24d6a8cb8b7d231b2f6b34b87cee4f136c5f9
2021-06-03 08:25:58 -07:00
4b74c848aa Enable torch::deploy GPU tests in sandcastle
Summary:
Added GPU tests in previous diffs but had to disable them, as they only
pass locally on devgpu and not in sandcastle.

note: local testing requires mode/dev-nosan or else ASAN interferes with CUDA.

Test Plan: Verify tests passing in sandcastle.

Reviewed By: malfet

Differential Revision: D28538996

fbshipit-source-id: 1a6ccea07cfe2f150eee068594e636add620cd91
2021-06-03 08:10:19 -07:00
f1ce7f4b7f Update PyTorch version to 0.10.0a (#59345)
Summary:
Also fix `TestProducerVersion` by removing the assumption that major and minor versions are single digits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59345

Reviewed By: robieta

Differential Revision: D28853720

Pulled By: malfet

fbshipit-source-id: 4b6d03c6b0c9d652a5aef792aaa84eaa522d10e8
2021-06-03 07:55:44 -07:00
c829095590 Revert D28802058: [pytorch][PR] add dispatch for bitwise_and
Test Plan: revert-hammer

Differential Revision:
D28802058 (874f287c52)

Original commit changeset: cccbbff46df5

fbshipit-source-id: 1675fe42966278aa446496445342d6d8a92ecea0
2021-06-03 07:38:13 -07:00
d095ec75a1 Forward AD formulas batch 2 (#57863)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57863

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28387763

Pulled By: albanD

fbshipit-source-id: e1b60ab728bb05b9e3323ee0dc7e401aaf5b8817
2021-06-03 07:33:04 -07:00
add291cf66 [JIT] Add a phase to perform inplace<->functional conversion for activation operators (#57477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57477

Currently the conversion only deals with activation operators. The legality check is somewhat strict for now.
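
Conceptually, the pass rewrites between the following two forms when the legality check allows it (a plain illustration of the source-level effect, not the pass's internal API):

```python
import torch

def functional_form(x):
    return torch.relu(x)   # out-of-place activation

def inplace_form(x):
    return torch.relu_(x)  # in-place; only legal if x has no later uses
```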

Test Plan:
```
python test/test_jit.py -k test_functional_to_inplace_activation
python test/test_jit.py -k test_inplace_to_functional_activation
```

Reviewed By: mrshenli

Differential Revision: D28155153

Pulled By: desertfire

fbshipit-source-id: df092830c4dff3ce9578ff76285eb7a566b7d81b
2021-06-03 06:43:23 -07:00
91b7bcf4c0 [PyTorch Edge] Add proper error message when loading incompatible model with lite interpreter (#59354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59354

Check if the model has `bytecode.pkl` and provide a proper error message before loading the model. Tested by loading a model.pt and a model.ptl.
```
>>> from torch.jit.mobile import _load_for_lite_interpreter
>>> _load_for_lite_interpreter("/Users/chenlai/Documents/pytorch/data/mobilenet_v2.pt")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/chenlai/pytorch/torch/jit/mobile/__init__.py", line 48, in _load_for_lite_interpreter
    cpp_module = torch._C._load_for_lite_interpreter(f, map_location)  # type: ignore[attr-defined]
RuntimeError: The model is not generated from the api _save_for_lite_interpreter. Please regenerate the module by scripted_module._save_for_lite_interpreter('model.ptl'). Refer to https://pytorch.org/tutorials/prototype/lite_interpreter.html for more details.
```

iOS:
![image](https://user-images.githubusercontent.com/16430979/120593077-cbe23180-c3f3-11eb-9745-ee2b04b78c6c.png)

Android:
![image](https://user-images.githubusercontent.com/16430979/120594357-af46f900-c3f5-11eb-9fb0-500a038148e3.png)

Differential Revision:
D28856713

Test Plan: Imported from OSS

Reviewed By: dhruvbird

Pulled By: cccclai

fbshipit-source-id: c3f9a3b64459dda6811d296371c8a2eaf22f8b20
2021-06-03 03:18:14 -07:00
3979cb0656 irange for size_t (#55320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27572577

fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03
2021-06-03 01:04:13 -07:00
f914ab193e Use irange in a few places in torch/csrc (#55100)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55100

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27447708

fbshipit-source-id: 4f21133bd76f29d73a51befcae649ab55637b36e
2021-06-03 00:58:51 -07:00
18642e664a [quant][graphmode][fx][refactor] Split quantize.py to prepare.py and convert.py (#59353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59353

Next: remove Quantizer class

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D28856277

fbshipit-source-id: 25f5502be387dbe9706780f667501b46b82789a5
2021-06-02 23:52:39 -07:00
8b4784a9c6 Revert D28821216: [pytorch][PR] Migrate _th_std_var to ATen
Test Plan: revert-hammer

Differential Revision:
D28821216 (1fb5cf5a71)

Original commit changeset: f35992c21f08

fbshipit-source-id: d068a62b7fa941188591a74dcb5d1a24719af7b3
2021-06-02 21:18:26 -07:00
eb55b086b7 [DDP] Log some python-side errors (#59284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59284

Logs a few python-side errors to DDP logging.

TODO: Most python errors actually have to do with user input correctness, so they throw before reducer is constructed and thus there is no logger. For this case, should we allow `logger` to be created optionally without a reducer, just for the purpose of logging errors, so that we can gain insight into these errors in scuba?
ghstack-source-id: 130412973

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D28820290

fbshipit-source-id: 610e5dba885b173c52351f7ab25c923edce639e0
2021-06-02 19:49:26 -07:00
79aeca0b00 [DDP] Log when errors happen (#59281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59281

Adds the ability to log when the reducer/DDP encounters an error. We add fields "has_error" and "error" to indicate that an error has
occurred in this iteration; the other fields (performance stats) are not
guaranteed to be updated.

Errors encountered in python-side DDP will be added in the next diff.
ghstack-source-id: 130412974

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28652717

fbshipit-source-id: 9772abc2647a92dac6a325da6976ef5eb877c589
2021-06-02 19:48:26 -07:00
d2e03051e0 Fix fetcher continuing to call next after StopIteration (#59313)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59312

cc VitalyFedyunin dzhulgakov

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59313

Reviewed By: jbschlosser

Differential Revision: D28837762

Pulled By: dzhulgakov

fbshipit-source-id: 95cc29359aaba0f24ca169c5495ab5c6c53a0dce
2021-06-02 19:14:25 -07:00
1fb5cf5a71 Migrate _th_std_var to ATen (#59258)
Summary:
Ref https://github.com/pytorch/pytorch/issues/49421

This migrates `std`/`var`'s special-case all-reduction from TH to ATen. Using the benchmark from gh-43858 that was used to justify keeping the TH version, I find this PR has similar (slightly better) performance single-threaded. And unlike the TH version, this one is multi-threaded and so much faster for large tensors.

TH Results:
```
[----------------------------- Index ------------------------------]
               |  torch_var  |  torch_var0  |  stdfn   |  torch_sum0
1 threads: ---------------------------------------------------------
      8        |       3.6   |       3.8    |     8.2  |      1.2
      80       |       3.7   |       3.8    |     8.4  |      1.2
      800      |       4.2   |       4.3    |     8.7  |      1.2
      8000     |       9.0   |       9.1    |    11.2  |      1.5
      80000    |      58.3   |      59.0    |    30.6  |      4.2
      800000   |     546.9   |     546.9    |   183.4  |     31.3
      8000000  |    5729.7   |    5701.0    |  6165.4  |    484.1
```

ATen results:
```
[----------------------------- Index ------------------------------]
               |  torch_var  |  torch_var0  |  stdfn   |  torch_sum0
1 threads: ---------------------------------------------------------
      8        |       4.0   |       4.0    |     8.7  |      1.2
      80       |       3.6   |       3.8    |     9.0  |      1.2
      800      |       4.1   |       4.3    |     8.9  |      1.2
      8000     |       8.9   |       9.2    |    10.6  |      1.5
      80000    |      57.0   |      57.4    |    28.8  |      4.3
      800000   |     526.9   |     526.9    |   178.3  |     30.2
      8000000  |    5568.1   |    5560.6    |  6042.1  |    453.2

[----------------------------- Index ------------------------------]
               |  torch_var  |  torch_var0  |  stdfn   |  torch_sum0
8 threads: ---------------------------------------------------------
      8        |      3.9    |      3.8     |     9.1  |      1.2
      80       |      3.8    |      3.9     |     8.8  |      1.2
      800      |      4.2    |      4.3     |     8.9  |      1.3
      8000     |      9.0    |      9.2     |    10.4  |      1.5
      80000    |     26.0    |     26.8     |    26.4  |      4.4
      800000   |     92.9    |     87.3     |    72.1  |     22.4
      8000000  |    793.5    |    791.8     |  5334.8  |    115.1
```
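
For reference, this kind of measurement can be reproduced roughly as follows (a sketch, not the original gh-43858 benchmark script):

```python
import torch
from torch.utils import benchmark

t = torch.randn(8_000_000)
for num_threads in (1, 8):
    m = benchmark.Timer("t.var()", globals={"t": t},
                        num_threads=num_threads).blocked_autorange()
    print(f"{num_threads} threads: {m.median * 1e6:.1f} us")
```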

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59258

Reviewed By: mruberry

Differential Revision: D28821216

Pulled By: ngimel

fbshipit-source-id: f35992c21f08a0a8878053680dc0ca7a8facd155
2021-06-02 19:01:39 -07:00
c03cae49fc [DDP] Remove unused initialize_buckets (#59066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59066

Per title
ghstack-source-id: 130338812

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D28734666

fbshipit-source-id: 89ca7f8e625c4068ba0ed9800be2619e469ae515
2021-06-02 17:20:22 -07:00
2a78e896a0 [DDP] use work.result() in _check_global_requires_backward_grad_sync (#59065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59065

Cleaner to use work.result() instead of sending back the tensor from
this function.
ghstack-source-id: 130338813

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D28551203

fbshipit-source-id: d871fed78be91f0647687ea9d6fc86e576dc53a6
2021-06-02 17:19:07 -07:00
517ea26eee [deploy] Make load_library a no-op inside a package (#58933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58933

**Summary**
This commit makes load_library calls no-ops inside packages run with
deploy. Libraries containing custom C++ operators and classes are statically linked in C++
and don't need to be loaded. This commit takes advantage of the fact that sys.executable is
set to torch_deploy in deploy and uses that to exit early in load_library if
the program is running inside deploy.
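
The early exit amounts to something like this sketch (the real implementation lives in `torch.ops.load_library`; only the `sys.executable` check is taken from the description above):

```python
import sys

def load_library(path):
    if sys.executable == "torch_deploy":
        return  # custom ops/classes are statically linked inside deploy
    # ... otherwise fall through to the normal shared-library loading path
```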

**Test Plan**
This commit adds a test to `generate_examples`/`test_deploy` that
packages and runs a function that calls `load_library`. The library
doesn't exist, but that's okay because the function should be a no-op
anyway.

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D28687159

Pulled By: SplitInfinity

fbshipit-source-id: 4a61fc636698e44f204334e338c5ce35257e7ae2
2021-06-02 17:01:31 -07:00
dfe85d6fd7 Revert D28840199: [pytorch][PR] Update version to 1.10
Test Plan: revert-hammer

Differential Revision:
D28840199 (3453aa44c1)

Original commit changeset: acc5a93e12a3

fbshipit-source-id: a41eb7c882fe0bf8f9a35ef180e99a7e72f6857d
2021-06-02 16:25:51 -07:00
2ce23136d0 Use irange in torch/csrc utils (#55556)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55556

Test Plan: Sandcastle

Reviewed By: ezyang

Differential Revision: D27625936

fbshipit-source-id: 79065438f582a6f5fe6f1f796b6984767605197e
2021-06-02 15:47:00 -07:00
e6c8e9497c Small fix type hints in mobile optimizer (#59282)
Summary:
Adjusts type hints for optimize_for_mobile to be consistent with the defaults. Right now, using optimize_for_mobile and passing only a script_module gives a type error complaining that preserved_methods can't be None.
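
The adjusted stub amounts to something like this sketch (names and defaults are assumptions drawn from the description, not copied from the actual `.pyi`):

```python
from typing import List, Optional

def optimize_for_mobile(
    script_module,
    optimization_blocklist=None,
    preserved_methods: Optional[List[str]] = None,  # was effectively List[str]
    backend: str = "CPU",
):
    ...
```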

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59282

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Open source tests ran the lints. Internal CI should be enough here.

Reviewed By: jbschlosser

Differential Revision: D28838159

Pulled By: JacobSzwejbka

fbshipit-source-id: dd1e9aff00a759f71d32025d8c5b01e612c869a5
2021-06-02 15:32:16 -07:00
318c858eb5 [fx2trt] Organize converters and add unittests (#59261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59261

Split converters to different files instead of putting them in a single file.

Reviewed By: jackm321

Differential Revision: D28613989

fbshipit-source-id: f25ca3732c457af51a07ef466915a4a08bd45e6e
2021-06-02 15:22:15 -07:00
0eafef5031 Fix internal assert location in custom Function binding (#59301)
Summary:
For facebook employees, this fix some internal failures from https://www.internalfb.com/tasks/?t=92100671

This was not a problem before https://github.com/pytorch/pytorch/pull/58271 because these cycles used to just be leaked (so nothing was cleared/dealloced).
Now that we properly clean up these cycles, we have to fix the assert in the clear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59301

Reviewed By: jbschlosser

Differential Revision: D28841564

Pulled By: albanD

fbshipit-source-id: e2ec51f6abf44c4e3a83c293e90352295a43ba37
2021-06-02 15:09:51 -07:00
c3745dc580 Small change for torch.distributed launcher (#59152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59152

Small change for https://fb.workplace.com/groups/319878845696681

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D28773682

Pulled By: H-Huang

fbshipit-source-id: acf82273e8622b7ffd3088d8d766bdf49273754c
2021-06-02 15:05:41 -07:00
3453aa44c1 Update version to 1.10 (#59325)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59325

Reviewed By: jbschlosser, seemethere

Differential Revision: D28840199

Pulled By: malfet

fbshipit-source-id: acc5a93e12a3db47d6103ea064bec9e40320f708
2021-06-02 15:00:33 -07:00
7ee68363a8 Add new rpc.barrier API (#53423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53423

closes #40166

This change exposes a new API, rpc.barrier() which blocks the main processes of all workers running RPC until the whole group completes this function. Optionally rpc.barrier can take in a set of worker_names and only synchronize across those worker names.

Example:
```python
import os
import torch.multiprocessing as mp
import torch.distributed.rpc as rpc
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "5678"

world_size = 4
odd_num_workers = [f"worker{i}" for i in range(world_size) if i % 2]
even_num_workers = [f"worker{i}" for i in range(world_size) if not i % 2]

def worker(i):
    print(i)
    rpc.init_rpc(f"worker{i}", rank=i, world_size=world_size)
    if i % 2:
        print(f"start barrier {i}")
        rpc.barrier(set(odd_num_workers))
    else:
        print(f"start barrier {i}")
        rpc.barrier(set(even_num_workers))
    rpc.shutdown()
    print(f"shutdown{i}")

if __name__ == '__main__':
    with mp.Pool(processes=world_size) as pool:
        pool.map(worker, range(world_size))
```

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D27737145

Pulled By: H-Huang

fbshipit-source-id: 369196bc62446f506d1fb6a3fa5bebcb0b09da9f
2021-06-02 14:20:16 -07:00
1765f51618 [iOS GPU] [BE] use channel-last to transform the weights (#59113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59113

Manually permuting the weights is slower than calling `at::contiguous()`
ghstack-source-id: 130374487

Test Plan: CI

Reviewed By: SS-JIA

Differential Revision: D28762278

fbshipit-source-id: 1dde3ef82018bc2507d0ca5132b1ee97dc99787f
2021-06-02 14:02:11 -07:00
1968efa2dd [c10d] Remove verbose log (#59070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59070

This log is too verbose, especially in the case where we call monitored
barrier before every collective, as we do in ProcessGroupWrapper.
ghstack-source-id: 130052822

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D28738189

fbshipit-source-id: f2899537caa4c13508da31134d5dd0f4fd6a1f3a
2021-06-02 13:50:11 -07:00
7f2e620105 FIX Validates that weights are 2d in embedding (#59314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55185
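
A minimal illustration of the validated behavior (a sketch; the error text below is paraphrased, not quoted from the source):

```python
import torch
import torch.nn.functional as F

w = torch.randn(10)  # 1-D weight: now rejected up front
try:
    F.embedding(torch.tensor([0, 1]), w)
except RuntimeError as e:
    print(e)  # complains that the weight must be 2-D
```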

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59314

Reviewed By: H-Huang

Differential Revision: D28837753

Pulled By: jbschlosser

fbshipit-source-id: 683378244c61b0937c95563f91ef87ab09fd1653
2021-06-02 12:52:21 -07:00
fb709a8ca5 Build with USE_GLOO_WITH_OPENSSL=1 (#59274) (#59323)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59323

Reviewed By: jbschlosser

Differential Revision: D28839920

Pulled By: malfet

fbshipit-source-id: 63cffa6fe25cf354966354e5dd5490ba6e5b3d11
2021-06-02 12:51:00 -07:00
f7097b0c0b Make unary tests runnable if SCIPY is not installed (#59304)
Summary:
By adding `if TEST_SCIPY else _NOTHING` to special.i1 and special.i1e

Discovered while running tests on M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59304

Reviewed By: jbschlosser

Differential Revision: D28835693

Pulled By: malfet

fbshipit-source-id: e4fde6584da29fa43bc6da75eebe560512754ed0
2021-06-02 12:47:30 -07:00
eae84f0d5d Fix ONNX forward compatibility (#59327)
Summary:
Fixes `onnx.utils.polish_model` not found exception when executed using onnx-1.9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59327

Reviewed By: H-Huang

Differential Revision: D28840563

Pulled By: malfet

fbshipit-source-id: 403a29a88e7dee8b3414602b9fe2b31baf737dce
2021-06-02 12:39:56 -07:00
c22ac14969 [Error-reporting] Set upper boundary on border element (#59311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59311

The diff sets an upper boundary on the border element when presenting the error message. This is required in order to avoid unnecessary log contamination.

Test Plan: Example of log contamination: https://www.internalfb.com/fblearner/details/276849996/operator/2942475685?tab=try_27021599785797968

Reviewed By: d4l3k

Differential Revision: D28812745

fbshipit-source-id: 4f491b9acc8cc9831d763f185022879bbbfb4c8a
2021-06-02 12:28:54 -07:00
99f2000a99 Migrate nonzero from TH to ATen (CPU) (#59149)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58811, Closes gh-24745

The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.

This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts, which is needed to write the outputs in the right location; a sketch of this two-pass scheme follows the table below.

|    Shape   |  Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us |          2150 us |            551 us |
| 128,128,32 | 1250 us |          1020 us |            197 us |
|  64,128,32 |  581 us |           495 us |             99 us |
|  32,128,32 |  292 us |           255 us |             83 us |
|  16,128,32 |  147 us |           126 us |             75 us |
|  8,128,32  |   75 us |            65 us |             65 us |
|  4,128,32  |   39 us |            33 us |             33 us |
|  2,128,32  |   20 us |            18 us |             18 us |
|  1,128,32  |   11 us |             9 us |              9 us |
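
A minimal sketch of the two-pass scheme described above (plain Python/NumPy for illustration, not the ATen kernel): pass 1 counts nonzeros per chunk, an exclusive prefix sum assigns each thread its write offset, and pass 2 writes indices into the shared output.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_nonzero_1d(a, num_threads=4):
    chunks = np.array_split(np.arange(a.size), num_threads)
    counts = [np.count_nonzero(a[c]) for c in chunks]        # pass 1
    offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))  # exclusive scan
    out = np.empty(int(np.sum(counts)), dtype=np.int64)
    def write(i):                                            # pass 2
        idx = chunks[i][a[chunks[i]] != 0]
        out[offsets[i]:offsets[i] + counts[i]] = idx
    with ThreadPoolExecutor(num_threads) as ex:
        list(ex.map(write, range(num_threads)))
    return out

a = np.where(np.random.rand(10_000) > 0.5, np.random.randn(10_000), 0.0)
assert np.array_equal(parallel_nonzero_1d(a), np.flatnonzero(a))
```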

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59149

Reviewed By: mruberry

Differential Revision: D28817466

Pulled By: ngimel

fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
2021-06-02 12:26:29 -07:00
b4d30bb583 [PyTorch] Use expect_contiguous in CPU matmul (#58895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58895

There doesn't seem to be any reason we can't use expect_contiguous here.
ghstack-source-id: 130283300

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D28666399

fbshipit-source-id: b4a9bcb01ff1c30d991765140c8df34c3ac3a89b
2021-06-02 12:04:18 -07:00
0528325b5f [iOS GPU] Raise the minimum OS support version to 11.0 (#59310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59310

We recently updated the GK to deliver GPU models only to 11.0+ devices. Will do a cleanup in following diffs to remove the shader functions written for iOS 10.0.
ghstack-source-id: 130374598

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D28805864

fbshipit-source-id: 4cde34ff9fbbe811a69686a0f29b56d69aeefbee
2021-06-02 11:53:45 -07:00
f8f06e7099 [iOS GPU] Fix the OSS macos build (#59102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59102

ghstack-source-id: 130374334

Test Plan:
- On the OSS side
    - CI
    - `USE_PYTORCH_METAL=ON python setup.py install --cmake`

Reviewed By: IvanKobzarev

Differential Revision: D28757412

fbshipit-source-id: 2efea9dfe7361a73c02d1ca5fbf587835d39d325
2021-06-02 11:47:11 -07:00
874f287c52 add dispatch for bitwise_and (#59125)
Summary:
ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59125

Reviewed By: ngimel

Differential Revision: D28802058

Pulled By: ezyang

fbshipit-source-id: cccbbff46df552235072fa38fea1c19b068991ea
2021-06-02 11:42:49 -07:00
484d53f4a0 [torch][JIT] Warn only once when using unscripted dictionary (#59287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59287

D27211605 added a warning in `toIValue` that warns users to script their
dictionaries before passing them to TorchScript functions in order to get some
performance benefits and reference semantics. However, this warning is emitted
every time `toIValue` is called (e.g. when a dictionary is passed to a
TorchScript function), which can lead to noisy log output. This diff changes
this to use `TORCH_WARN_ONCE` instead.

Test Plan: Sandcastle, OSS CI.

Reviewed By: hyuen

Differential Revision: D28824468

fbshipit-source-id: e651eade4380abaf77c6c8a81ec4e565b0c2c714
2021-06-02 11:41:37 -07:00
82052b0a76 [vulkan] Remove constant duplication for Vulkan optimize_for_mobile (#59276)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59276

Test Plan: Imported from OSS

Reviewed By: cccclai, ngimel

Differential Revision: D28814072

Pulled By: IvanKobzarev

fbshipit-source-id: d5cfd1352a2e07cdd4708d19fe4320444521db78
2021-06-02 11:38:18 -07:00
3ec0904718 docs: Add note about nightly versions bump (#59324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59324

Also updates section on pinning pytorch/builder with an example

[skip ci]

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D28840049

Pulled By: seemethere

fbshipit-source-id: e5d6722713680e969893d9df97ec269fc9c00411
2021-06-02 11:29:41 -07:00
5386f6935a avg_pool3d: port to structured (#59083)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59083

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28802620

Pulled By: ezyang

fbshipit-source-id: 1e890af3c37912447198aa2f20914b99decda8b2
2021-06-02 11:29:39 -07:00
5dc426a6f6 avg_pool2d_backward: Port to structured (#59082)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59082

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28802621

Pulled By: ezyang

fbshipit-source-id: 15b8ba562eee132ef8390a7de520bdd8e15d0f86
2021-06-02 11:28:25 -07:00
eb1adc4c5e cmake: Add USE_GLOO_WITH_OPENSSL to Summary.cmake (#59321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59321

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D28839370

Pulled By: seemethere

fbshipit-source-id: 0d4b35c05c2b1a78b752088cd16cd6263958e7f6
2021-06-02 11:10:55 -07:00
afd5237a4f Revert D28800692: [nnc] Enable CPU fusion inside Facebook, take 3
Test Plan: revert-hammer

Differential Revision:
D28800692 (6e7dae9cec)

Original commit changeset: d791c3b2ccd7

fbshipit-source-id: 5042fecfbab59181572013bf39760bc716e86430
2021-06-02 10:07:46 -07:00
a7aeaaf99e Added missing namespaces for C++ API (#45736)
Summary:
Hello,

depending on the build environment you may encounter
```c++
error: reference to 'optional' is ambiguous
```
when using the Torch-C++-API.

This PR adds `c10::` to avoid possible ambiguities with **std::optional** and does not introduce any functional change.

Fixes https://discuss.pytorch.org/t/linker-failed-with-ambiguous-references/36255 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45736

Reviewed By: dzhulgakov

Differential Revision: D24125123

Pulled By: VitalyFedyunin

fbshipit-source-id: df21420f0a2d0270227c28976a7a4218315cc107
2021-06-02 09:46:20 -07:00
87a25e09f4 [quant][graphmode][fx][refactor] Remove _convert from Quantizer class (#59042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59042

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724867

fbshipit-source-id: 9f87d51020caa20d5408cb2820947e23d92d5fc3
2021-06-02 08:50:56 -07:00
580831bfbb Add support for MatMul to BatchMatMulFP16Acc{16,32}Fake Op Mapping
Test Plan: f276981395

Reviewed By: hx89

Differential Revision: D28815646

fbshipit-source-id: c16b081bf3da2b157b9d42ea67b03dae88e82c6d
2021-06-02 08:32:21 -07:00
599f5058cf [ONNX] Update ONNX to rel-1.9 (#55889) (#57080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57080

The ONNX optimizer was removed in ONNX 1.9.
This PR removes the ONNX optimizer from the C++ code path and uses a `try-except` block in Python to keep it compatible with both ONNX 1.8 and 1.9.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D28467330

Pulled By: malfet

fbshipit-source-id: 5e4669dd0537648898e593f9e253da18d6dc7568

Co-authored-by: neginraoof <neginmr@utexas.edu>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-06-02 08:27:17 -07:00
f87aa23125 .github: Remove windows dependency installs (#59283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59283

We were observing 403s when attempting to install dependencies from
chocolatey, leading us to believe that we were getting rate limited by
chocolatey.

We've instead opted to install our dependencies in our base AMIs,
considering we would install them on every workflow anyway. This also
comes with moving the Windows 10 SDK installation to the base image
as well, since we were observing failures there too due to failed
dependency installations.

Also moves the Windows 10 SDK installation to our Visual Studio installation script, which is activated by passing an environment variable

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D28822962

Pulled By: seemethere

fbshipit-source-id: b5e35ffe4537db55deb027376bd2d418683707a5
2021-06-02 08:16:21 -07:00
3a2149a4ce [reland] Make TP agent use streams from Future when sending response (#59212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59212

Reland of https://github.com/pytorch/pytorch/pull/58428

Until now, the TP agent expected the output of a remote function to be on the same streams as the inputs. In other words, it used the lazy stream context of the inputs to synchronize the output tensors. This was true in the most common case of a synchronous remote function. However it wasn't true for async functions, for fetching RRefs, ... The more generic way is to use the CUDA events held by the Future to perform this synchronization. (These events may be on the input streams, or they may not be!).
ghstack-source-id: 130202842

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28623885

fbshipit-source-id: 29333bcb75d077ab801eac92017d0e381e8f5569
2021-06-02 05:46:05 -07:00
258a991027 [reland] Set and propagate devices in RRef completion future (#59211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59211

Reland of https://github.com/pytorch/pytorch/pull/58674

I found this missing parameter while debugging failures in the next PR.
I'm very unhappy about this change. I think this future, which we know for sure won't contain tensors, shouldn't have to worry about CUDA devices. And yet, it does. This means that basically any future anywhere might have to worry about it, and this just doesn't scale, and thus it's bad.
ghstack-source-id: 130202843

Test Plan: Should fix the next diff.

Reviewed By: mrshenli

Differential Revision: D28623886

fbshipit-source-id: 6c82ed7c785ac3bf32fff7eec67cdd73b96aff28
2021-06-02 05:46:04 -07:00
a3392cafe0 [reland] Set streams when invoking UDFs (#59210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59210

Reland of https://github.com/pytorch/pytorch/pull/58427

Running the UDF (be it Python or JIT) is the first step of (most?) RPC calls, which is where the inputs are consumed. The lazy stream context contains the streams used by the inputs, thus it must be made current before any UDF call. I opt to do this as "close" as possible to the place the UDF is invoked, to make the relationship as explicit as possible.
ghstack-source-id: 130202847

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28623889

fbshipit-source-id: ed38242f813dac075d162685d52ae89f408932f9
2021-06-02 05:46:02 -07:00
f8a3fd4e34 [reland] Create CUDA-aware futures in RequestCallback (#59209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59209

Reland of https://github.com/pytorch/pytorch/pull/58426

The operations in RequestCallback can return CUDA tensors, thus the futures used to hold them must be CUDA-aware.
ghstack-source-id: 130202844

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28623887

fbshipit-source-id: 53561b8ae011458d8f848f0a03830925aff2f0c2
2021-06-02 05:46:00 -07:00
3af6ff98ff [reland] Provide pre-extracted DataPtrs when completing a Future with a Message (#59208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59208

Reland of https://github.com/pytorch/pytorch/pull/58425

Now that callbacks can provide pre-extracted DataPtrs, let's do so. This will become of crucial importance in the next PR, where some of these futures will become CUDA-aware, and thus they will try to extract DataPtrs on their own, but they would fail to do so here because Message isn't "inspectable".
ghstack-source-id: 130202845

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28623888

fbshipit-source-id: 1aa4bde8014870c071685ba8f72d5f3f01f0a512
2021-06-02 05:45:59 -07:00
1adc289e10 [reland] Allow Future::then to return pre-extracted DataPtrs (#59207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59207

Reland of https://github.com/pytorch/pytorch/pull/58424

In CUDA mode, Future must inspect its value and extract DataPtrs. However some types are not supported, for example the C++/JIT custom classes, which include Message, which is widely used in RPC. Hence for these scenarios we allow the user to perform the custom DataPtr extraction on their own, and pass the pre-extracted DataPtrs.

Note that `markCompleted` already allowed users to pass in pre-extracted DataPtrs, hence this PR simply extends this possibility to the `then` method too.
ghstack-source-id: 130202846

Test Plan: Used in next PR.

Reviewed By: mrshenli

Differential Revision: D28623890

fbshipit-source-id: 468c5308b40774ba0a778b195add0e0845c1929e
2021-06-02 05:45:57 -07:00
b07d68e24c [reland] Always use intrusive_ptr for Message (2 out of 2) (#59206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59206

Reland of https://github.com/pytorch/pytorch/pull/58423

This is part 2 of the previous PR. Here we address the remaining occurrences of "raw" Message, namely the ones within toMessageImpl. And since they're the last ones, we make the constructor of Message private, to prevent new usages from emerging.
ghstack-source-id: 130202848

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28623892

fbshipit-source-id: f815cf6b93e488c118e5d2298473e6e9d9f4c132
2021-06-02 05:45:55 -07:00
5ec169b4c3 [reland] Always use intrusive_ptr for Message (1 out of 2) (#59205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59205

Reland of https://github.com/pytorch/pytorch/pull/58422

Similar to Future (which I tackled recently), Message is an ivalue type (a "custom class" one), and the natural way to represent it is inside an intrusive_ptr. However in the RPC code we had a mix of usages, often passing Message by value. This has undesirable consequences, as it could easily trigger a copy by accident, which I believe is why in many places we accepted _rvalue references_ to Message, in order to force the caller to move. In my experience this is non-idiomatic in C++ (normally a function signature specifies how the function consumes its arguments, and it's up to the caller to then decide whether to copy or move).

By moving to intrusive_ptr everywhere I think we eliminate and simplify many of the problems above.

In this PR I do half of the migration, by updating everything except the `toMessageImpl` methods, which will come in the next PR.
ghstack-source-id: 130202849

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28623891

fbshipit-source-id: c9aeea3440679a11741ca78c06b03c57cb815a5e
2021-06-02 05:44:49 -07:00
44c20ce676 Alias for i0 to special namespace (#59141)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59141

Reviewed By: ngimel

Differential Revision: D28784097

Pulled By: mruberry

fbshipit-source-id: 9b61a21906ef337292686fd40e328502a79e6f09
2021-06-01 23:04:09 -07:00
059a717c9e Fix breakpad build and add to more images (#59236)
Summary:
This PR
* adds the breakpad build to most of the remaining docker images (except the mobile + slim ones)
* pins to a [fork of breakpad](https://github.com/google/breakpad/compare/master...driazati:master?expand=1) to enable daisy chaining on signal handlers
* renames the API to be nicer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59236

Reviewed By: malfet

Differential Revision: D28792511

Pulled By: driazati

fbshipit-source-id: 83723e74b7f0a00e1695210ac2620a0c91ab4bf2
2021-06-01 22:47:14 -07:00
dbe629c51d [RPC Framework] Support creating a RemoteModule by RRef (#59242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59242

#Original PR Issue: https://github.com/pytorch/pytorch/issues/58274

This can be a workaround: Instead of passing a script `RemoteModule` over RPC, pass its `module_rref` field over RPC, and then construct a new `RemoteModule` on the receiver end.
ghstack-source-id: 130268018

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_send_remote_module_over_the_wire_script_not_supported

buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_remote_module_py_pickle_not_supported_script

buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_create_remote_module_by_module_rref

Reviewed By: vipannalla

Differential Revision: D28794905

fbshipit-source-id: 1a677ff0d4b47c078ad47b50d7102a198a1fc39b
2021-06-01 22:35:03 -07:00
3218d890dd [quant][graphmode][fx][fix] Fix support for custom module (#59041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59041

Static quantization support for custom modules was removed in a previous refactor
(https://github.com/pytorch/pytorch/pull/57519) since it was not covered by the test case.
This PR re-enables the test case and fixes the support.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724866

fbshipit-source-id: 1974675b88b56a2173daf86965d6f3fb7ebd783b
2021-06-01 22:31:15 -07:00
06af7618e7 [quant][graphmode][fx][refactor] Remove Quantizer class from convert (QuantizeHandler) (#59040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59040

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724870

fbshipit-source-id: c0f748711b825cd46bdfcc05c054c77a41e8207a
2021-06-01 22:00:49 -07:00
0a26781966 fix numpy compatibility in test for torch.kthvalue (#59214)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59201. Should be merged after https://github.com/pytorch/pytorch/issues/59067 to ensure this is actually working correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59214

Reviewed By: albanD

Differential Revision: D28792363

Pulled By: mruberry

fbshipit-source-id: 0cf613463139352906fb567f1efcc582c2c25de8
2021-06-01 21:57:09 -07:00
e9e1bb1a4e Fix device of info tensor for torch.linalg.inv_ex with MAGMA backend (#59223)
Summary:
This PR fixes `torch.linalg.inv_ex` with MAGMA backend.
`info` tensor was returned on CPU device even for CUDA inputs.
Now it's on the same device as input.

Fixes https://github.com/pytorch/pytorch/issues/58769
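A quick illustration of the fixed behavior (a sketch; assumes a CUDA device is available):

```python
import torch

A = torch.randn(3, 3, device="cuda")
inverse, info = torch.linalg.inv_ex(A)
# Before the fix, `info` lived on the CPU even for CUDA inputs;
# after it, both outputs share the input's device.
assert inverse.device == A.device
assert info.device == A.device
```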

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59223

Reviewed By: ngimel

Differential Revision: D28814876

Pulled By: mruberry

fbshipit-source-id: f66c6f06fb8bc305cb2e22b08750a25c8888fb65
2021-06-01 21:49:57 -07:00
50e6ee3ca2 [quant][graphmode][fx][refactor] Remove Quantizer class from quantize_node (#59039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59039

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724874

fbshipit-source-id: bd984716b2da1d6879c3e92fa827574783a41567
2021-06-01 21:40:08 -07:00
2d8f0d966f CUDA support in the CSR layout: CUDA addmm/matvec (#59012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59012

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28719631

Pulled By: bhosmer

fbshipit-source-id: 43e2004a61e114aeb0a7c6ad8a25fedda238c6da
2021-06-01 21:16:42 -07:00
3efefc4016 [CUDA graphs] Makes sure all graphs tests call empty_cache() at some point before capture (#59233)
Summary:
Graphs tests are sometimes flaky in CI ([example](https://app.circleci.com/pipelines/github/pytorch/pytorch/328930/workflows/0311199b-a0be-4802-a286-cf1e73f96c70/jobs/13793451)). When the GPU runs near its max memory capacity (which is not unusual during a long test), the caching allocator may, to satisfy new allocations that don't match any existing unused blocks, call `synchronize_and_free_events` to wait on block end-of-life events and cudaFree unused blocks, then re-cudaMalloc a new block. For ungraphed ops this isn't a problem, but synchronizing or calling cudaFree while capturing is illegal, so `synchronize_and_free_events` raises an error if called during capture.

The graphs tests themselves don't use much memory, so calling torch.cuda.empty_cache() at some point before their captures should ensure memory is available and the captures never need `synchronize_and_free_events`.

I was already calling empty_cache() near the beginning of several graphs tests. This PR extends it to the ones I forgot.
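
A minimal sketch of the pattern, using today's public capture API (assumes a CUDA device; the exact capture calls in these 2021-era tests may differ):

```python
import torch

# Release cached-but-unused blocks so that allocations made during capture
# never force the allocator to synchronize or call cudaFree mid-capture.
torch.cuda.empty_cache()

x = torch.zeros(1024, device="cuda")
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    y = x * 2  # work captured into the graph

g.replay()
```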

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59233

Reviewed By: mruberry

Differential Revision: D28816691

Pulled By: ngimel

fbshipit-source-id: 5cd83e48e43b1107daed5cfa2efff0fdb4f99dff
2021-06-01 21:05:46 -07:00
1d37f41567 [quant][graphmode][fx][refactor] Remove _prepare from Quantizer class (#59038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59038

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724869

fbshipit-source-id: e8501c9720b5ddb654e78bc8fa08de0466c1d52b
2021-06-01 18:01:22 -07:00
970096b624 [Reland] Adds an aten::_ops namespace with unambiguous function names (#59018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59018

Fixes #58044.

This PR:
- adds `ATEN_FN(op)` and `ATEN_FN2(op, overload)` macros that resolve to
an non-overloaded function in aten::_ops that calls the desired operator
(without default arguments).

The motivation for this is two-fold:
1) Using aten operators with templates is hard if the operator is
overloaded (e.g. add.Tensor and add.Scalar).
2) Method-only operators require special handling; pointers-to-method
are different from function pointers. `ATEN_FN2(add_, Tensor)` returns
a function instead of a method.

There is some interesting behavior for out= operations.
`ATEN_FN2(sin, "out")` gives a function that is *faithful* to the schema;
that is, the order of arguments is exactly what it looks like in the
schema. This makes it so that you can directly register
`ATEN_FN2(sin,"out")` (or a function wrapping it using the same signature)
as an override for a DispatchKey.

Test Plan:
- New tests that ATEN_FN2 works on function and method-only operators
- New test that ATEN_FN works
- New test that ATEN_FN macro returns a "faithful" function.

Codegen output:
Operators.h and Operators.cpp are both here:
https://gist.github.com/zou3519/c2c6a900410b571f0d7d127019ca5175

Reviewed By: bdhirsh

Differential Revision: D28721206

Pulled By: zou3519

fbshipit-source-id: a070017f98e8f4038cb0c64be315eef45d264217
2021-06-01 17:19:06 -07:00
8805093ec5 use long index type for index_add_cuda deterministic path (#59254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59254

index_add can take an int or long index tensor, whereas index_put only takes long indices.

In the deterministic path of index_add_cuda we use index_put, hence we must convert the index tensor to long.
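
For illustration, a small repro of the path in question (a sketch, assuming a CUDA device and determinism mode enabled):

```python
import torch

torch.use_deterministic_algorithms(True)

x = torch.zeros(5, device="cuda")
src = torch.ones(3, device="cuda")
idx = torch.tensor([0, 2, 2], dtype=torch.int32, device="cuda")  # int indices are legal for index_add

# The deterministic implementation routes through index_put, which only
# accepts long indices, so idx is converted to long internally.
x.index_add_(0, idx, src)
```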

Test Plan:
buck test mode/opt //caffe2/test:torch_cuda -- test_index_add_deterministic

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (14.748)
    ✓ Pass: caffe2/test:torch_cuda - test_index_add_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (27.717)
    ✓ Pass: caffe2/test:torch_cuda - main (27.717)

Reviewed By: ngimel

Differential Revision: D28804038

fbshipit-source-id: de12932a7738f2805f3bceb3ec024497625bce6a
2021-06-01 16:28:18 -07:00
20348fb32e [quant][graphmode][fx][refactor] Remove find_matches from Quantizer class (#59037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59037

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724865

fbshipit-source-id: 6c6824d0af7dd47d4c111d6a08e373bc65f33e08
2021-06-01 16:07:07 -07:00
7d64fc675b [quant][graphmode][fx][refactor] Remove fold_weights from Quantizer class (#59036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59036

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724862

fbshipit-source-id: 5900420127fcc14846bc34c9ac29ff7e6a703f1e
2021-06-01 15:52:57 -07:00
8af6281201 DOC Adds register_module_full_backward_hook into docs (#58954)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54443

Adds `register_module_full_backward_hook` into the index so it is rendered in the html docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58954

Reviewed By: ngimel

Differential Revision: D28801816

Pulled By: jbschlosser

fbshipit-source-id: a2e737fe983e5d7e4e26d7639183bca34b571cb8
2021-06-01 15:47:10 -07:00
6e7dae9cec [nnc] Enable CPU fusion inside Facebook, take 3 (#59253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59253

Fixed a miscompilation exposed by multithreaded profiling collection; let's try again.
ghstack-source-id: 130286580

Test Plan: servicelab

Reviewed By: navahgar, huiguoo

Differential Revision: D28800692

fbshipit-source-id: d791c3b2ccd75fe5e6eca0859083d4cd67460147
2021-06-01 15:42:22 -07:00
cc4891804c [quant][graphmode][fx][refactor] Remove save_state and restore_state from Quantizer class (#59035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59035

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724872

fbshipit-source-id: d32752c635917c9820e5e7cc414ba9d48a258a19
2021-06-01 15:38:36 -07:00
336ac9496f Fix mismatch in README.md Docker Image section (#59199)
Summary:
docker.Makefile has CUDNN_VERSION=8 as the default, but README.md states cuDNN v7.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59199

Reviewed By: mruberry

Differential Revision: D28808611

Pulled By: ngimel

fbshipit-source-id: 96cea32bfe33184b2bff69b7bb7f3e50a2b9c6aa
2021-06-01 15:22:30 -07:00
95c26b2806 [ROCm] disable test test_Conv2d_groups_nobias for ROCm (#59158)
Summary:
Disabling the test since it's failing in ROCm 4.2.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59158

Reviewed By: mruberry

Differential Revision: D28808953

Pulled By: ngimel

fbshipit-source-id: 134f147ead6dc559d2cde49cf8343cd976e6c224
2021-06-01 15:10:06 -07:00
3d521e8b40 [quant][graphmode][fx][refactor] Remove prepare_custom_config from Quantizer class (#59034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59034

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724873

fbshipit-source-id: 870e0822843ad1d035f41eaa015bdde9ccf6ec23
2021-06-01 14:52:22 -07:00
a5dcd3c4b7 Revert D28240105: [pytorch][PR] Fix DistributedSampler mem usage on large datasets
Test Plan: revert-hammer

Differential Revision:
D28240105 (a0ce8da26e)

Original commit changeset: 4c6aa493d0f7

fbshipit-source-id: 8a0e17764c2f26c8316f88ad6c8772b08883ceee
2021-06-01 14:44:23 -07:00
a0ce8da26e Fix DistributedSampler mem usage on large datasets (#51841)
Summary:
The current implementation of DistributedSampler generates a python list to hold all of the indices, and then returns a slice of this list for the given rank (creating a partial copy of the list). When the underlying dataset is large, both of these choices waste a large amount of memory. It is much more efficient to create a tensor to hold the indices, and then index into that tensor instead of creating slices.

In the case of a sampler with `shuffle=False`, it would be possible to avoid creating the `indices` tensor entirely (since the index will always match the value), but I have opted instead here to keep the implementation as similar to the existing version as possible. One possible benefit of this approach is that memory usage will not significantly change based on changing this parameter. Still, it might be better to simply return the indices directly without the underlying array.

Additionally, the logic around calculating the number of samples is unnecessarily complex. When dropping the last batch, this can be a simple floor division.

In a simple test script which creates a sampler for a dataset with 100,000,000 items, memory usage is reduced 98% compared to the existing implementation.
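
The core of the approach, as a simplified sketch (not the full sampler; epoch seeding and the padding done for drop_last=False are omitted):

```python
import torch

def rank_indices(dataset_len: int, num_replicas: int, rank: int, seed: int = 0):
    g = torch.Generator()
    g.manual_seed(seed)
    # One tensor holds every index; no Python list of dataset_len ints.
    indices = torch.randperm(dataset_len, generator=g)
    num_samples = dataset_len // num_replicas  # drop_last: simple floor division
    total_size = num_samples * num_replicas
    # Index into the tensor instead of slicing off a partial list copy.
    return indices[rank:total_size:num_replicas]
```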

Fixes https://github.com/pytorch/pytorch/issues/45427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51841

Reviewed By: albanD

Differential Revision: D28240105

Pulled By: rohan-varma

fbshipit-source-id: 4c6aa493d0f75c07ec14c98791b3a531300fb1db
2021-06-01 14:15:14 -07:00
5a42a97c49 Add NCCL_ASYNC_ERROR_HANDLING as an environment variable (#59109)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57878.

This adds `NCCL_ASYNC_ERROR_HANDLING` as a DDP relevant environment variable and includes a check for that variable in the test `test_dump_DDP_relevant_env_vars()`. Notably, the modified test now checks for the new variable but does not check for any of the other previously-existing relevant environment variables that were not already tested for (e.g. `NCCL_BLOCKING_WAIT`).
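
For reference, the variable is read when the NCCL process group is constructed, so it must be set before initialization (a sketch; RANK and WORLD_SIZE are placeholders supplied by the launcher):

```python
import os
import torch.distributed as dist

os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"  # must be set before init_process_group

dist.init_process_group(
    backend="nccl",
    rank=int(os.environ["RANK"]),
    world_size=int(os.environ["WORLD_SIZE"]),
)
```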

The change was tested via the following on an AI AWS cluster:
`WORLD_SIZE=2 BACKEND=nccl gpurun pytest test/distributed/test_distributed_spawn.py -k test_dump_DDP_relevant_env_vars -vs`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59109

Reviewed By: H-Huang, SciPioneer

Differential Revision: D28761148

Pulled By: andwgu

fbshipit-source-id: 7be4820e61a670b001408d0dd273f65029b1d2fe
2021-06-01 14:02:41 -07:00
5f1117226f DOC Update register_buffer/parameter docstring explaining None (#59015)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40977
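
A short example of the behavior the docstring now spells out (a sketch):

```python
import torch

class Stats(torch.nn.Module):
    def __init__(self, track: bool = False):
        super().__init__()
        # Passing None reserves the name without allocating a tensor; a None
        # buffer is skipped by state_dict() until a real tensor is assigned.
        self.register_buffer("running_mean", torch.zeros(4) if track else None)

print("running_mean" in Stats(track=False).state_dict())  # False
print("running_mean" in Stats(track=True).state_dict())   # True
```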

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59015

Reviewed By: ngimel

Differential Revision: D28797948

Pulled By: jbschlosser

fbshipit-source-id: 3bf60af5c1cfc5f1786b4975b48f093391374503
2021-06-01 13:55:07 -07:00
e4b2684331 [quant][graphmode][fx][refactor] Remove patterns from Quantizer class (#59033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59033

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724861

fbshipit-source-id: 97b38e851b6bf581510a24636b1d8d6f1d977f5a
2021-06-01 13:44:08 -07:00
83892c1861 [quant][graphmode][fx][refactor] Remove node_name_to_scope from Quantizer (#59032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59032

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724868

fbshipit-source-id: 6df639f20076b480812b6dcf0fc7d2c87ca29d8b
2021-06-01 13:26:09 -07:00
3826f7e8e0 [quant][graphmode][fx][refactor] Remove quantized_graph from Quantizer (#59031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59031

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724871

fbshipit-source-id: dad0332ba271c4cfb6ec1e8f2036443149b5bea4
2021-06-01 13:01:54 -07:00
1b4586ee20 [quant][fx][graphmode][refactor] Remove modules from Quantizer (#59030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59030

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724875

fbshipit-source-id: d6610c1d5eb7755331252be9e348a230abf4175c
2021-06-01 12:42:28 -07:00
aa857850bb Add check_env, getenv api (#59052)
Summary:
Related Issue: https://github.com/pytorch/pytorch/issues/57691
This PR introduces an API for checking environment variables:

```c++
optional<bool> check_env(const char *name)
```
Reads the environment variable `name` and returns
- `optional<true>` if it is set to "1"
- `optional<false>` if it is set to "0"
- `nullopt` otherwise

Issues a warning if the environment variable was set to any value other than 0 or 1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59052

Test Plan:
Manually run the following test case:

- Apply this diff to the repo
```
diff --git a/torch/csrc/Exceptions.cpp b/torch/csrc/Exceptions.cpp
index d008643f70..990d254f0d 100644
--- a/torch/csrc/Exceptions.cpp
+++ b/torch/csrc/Exceptions.cpp
@@ -9,6 +9,9 @@

 #include <torch/csrc/THP.h>

+#include <c10/util/Optional.h>
+#include <c10/util/env.h>
+
 // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 PyObject *THPException_FatalError;

@@ -23,18 +26,7 @@ bool THPException_init(PyObject *module)
 namespace torch {

 static bool compute_cpp_stack_traces_enabled() {
-  auto envar = std::getenv("TORCH_SHOW_CPP_STACKTRACES");
-  if (envar) {
-    if (strcmp(envar, "0") == 0) {
-      return false;
-    }
-    if (strcmp(envar, "1") == 0) {
-      return true;
-    }
-    TORCH_WARN("ignoring invalid value for TORCH_SHOW_CPP_STACKTRACES: ", envar,
-               " valid values are 0 or 1.");
-  }
-  return false;
+ return c10::utils::check_env("TORCH_SHOW_CPP_STACKTRACES").value_or(false);
 }

 bool get_cpp_stacktraces_enabled() {
```
This patch replaces the prior `std::getenv` usage in `torch/csrc/Exceptions.cpp` with the new API.
- Run the following python3 script
```python
import torch

print(torch.__version__) # should print local version (not release)

a1 = torch.tensor([1,2,3])
a2 = torch.tensor([2])

a1 @ a2
```
using the following commands
```bash
python3 test.py # should not output CPP trace
TORCH_SHOW_CPP_STACKTRACES=1 python3 test.py # should output CPP trace
```

Reviewed By: ngimel

Differential Revision: D28799873

Pulled By: 1ntEgr8

fbshipit-source-id: 3e23353f48679ba8ce0364c049420ba4ff86ff09
2021-06-01 12:24:14 -07:00
fd2a36369a Fixed torch.nn.MultiMarginLoss equation format error (#59188)
Summary:
Removed the extra parenthesis from the right-hand side of the equation.
Fixes https://github.com/pytorch/pytorch/issues/58634

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59188

Reviewed By: ngimel

Differential Revision: D28797720

Pulled By: jbschlosser

fbshipit-source-id: 47e3084526389e7d1cc17c1a01b253e666c58784
2021-06-01 12:04:34 -07:00
06399d441d Create EngineHolder for serializing and running TRT Engines with PyTorch
Test Plan:
**python tests**
`buck test mode/opt -c python.package_style=inplace -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -j 20 deeplearning/trt/EngineHolder:engine_holder_test`

**python tests to generate test models** (this outputs the jit model files for use with cpp tests)
`buck run mode/opt -c python.package_style=inplace -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -j 20 deeplearning/trt/EngineHolder:engine_holder_generate_test_models`

**cpp tests**
`buck test mode/opt -c python.package_style=inplace -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -j 20 deeplearning/trt/EngineHolder:engine_holder_test_cpp`

**run service locally**

*build service*
`buck build mode/opt-split-dwarf -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -j 20 smart/inference_platform_sp/predictor_gpu:service`

*run service*
`buck-out/gen/smart/inference_platform_sp/predictor_gpu/service --model_dir="/home/jackmontgomery" --model_id=123_0 --pytorch_predictor_use_cuda`

*build requester*
`buck build mode/opt -c python.package_style=inplace -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -j 20 glow/fb/test:invoke_cv_pt_predictor`

*run requester*
`buck-out/gen/glow/fb/test/invoke_cv_pt_predictor.par --model_id=123_0 --port=33131 --host="2401:db00:eef0:1100:3560:0:1c02:2115" --num_parallel_requesters=1`

Reviewed By: 842974287

Differential Revision: D28581591

fbshipit-source-id: 7738b05543c2c840ee6b8f0d4818f21dc7f61b19
2021-06-01 11:41:33 -07:00
e9e5588588 Improve Tensor traverse to traverse its grad_fn when possible (#58271)
Summary:
There are two main changes here:
- THPVariable instances will actually visit their grad_fn if there is no other reference to the C++ Tensor and no other reference to the grad_fn. The critical observation compared to the existing comment (thanks Ed!) is that if we also check that the C++ Tensor object is not referenced anywhere else, we're sure that no one can change the grad_fn refcount between the traverse and the clear.
- THPVariable doesn't need a special clear for this new case, as we're the only owner of the C++ Tensor, so cdata.reset() will necessarily free the Tensor and all its resources.

The two tests are to ensure:
- That the cycles are indeed collectible by the gc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58271

Reviewed By: ngimel

Differential Revision: D28796461

Pulled By: albanD

fbshipit-source-id: 62c05930ddd0c48422c79b03118db41a73c1355d
2021-06-01 10:27:52 -07:00
65748f81c9 Un-verbose the build (#59235)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59235

Reviewed By: zou3519

Differential Revision: D28792468

Pulled By: driazati

fbshipit-source-id: 98f730ea0ee28b4b5c13198879bee8f586c0c14c
2021-06-01 10:14:26 -07:00
7523728368 [quant][graphmode][fx] Factor out run_weight_observer (#59029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59029

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724864

fbshipit-source-id: 67ac5e7eb351970fdf46532c3c2ac6ac831bc697
2021-06-01 10:01:42 -07:00
10fc42eacc [quant][graphmode][fx] Merge quant_env and env (#59028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59028

Previously we had both an env and a quant_env in convert, which was a bit confusing;
in this PR we merge them into a single Dict[str, Tuple[Node, torch.dtype]].

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724863

fbshipit-source-id: 722a682c70d300a6ccd2b988786a1ac2d45e880e
2021-06-01 09:21:38 -07:00
afdfd2288a Revert D28767060: [pytorch][PR] Migrate renorm to ATen (CPU and CUDA)
Test Plan: revert-hammer

Differential Revision:
D28767060 (74ec50893d)

Original commit changeset: 93dcbe5483f7

fbshipit-source-id: ae85d90212df4e6bb3a5da310e97ad1c06aa9a77
2021-06-01 05:15:21 -07:00
0b040e17e5 More user-friendly error messages (#59106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59106

Should make debugging a bit easier

Test Plan:
Example error in https://www.internalfb.com/intern/aibench/details/884106485190261 (open log for Portal or Portal+):
```
The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torch/backends/_nnapi/prepare.py", line 29, in forward
    _0 = uninitialized(__torch__.torch.classes._nnapi.Compilation)
    if torch.__is__(self.comp, None):
      _1 = (self).init(args, )
            ~~~~~~~~~~ <--- HERE
    else:
      pass
  File "code/__torch__/torch/backends/_nnapi/prepare.py", line 97, in init
    comp = __torch__.torch.classes._nnapi.Compilation.__new__(__torch__.torch.classes._nnapi.Compilation)
    _22 = (comp).__init__()
    _23 = (comp).init(self.ser_model, self.weights, )
           ~~~~~~~~~~ <--- HERE
    self.comp = comp
    return None

Traceback of TorchScript, original code (most recent call last):
  File "/data/users/dhaziza/fbsource/fbcode/buck-out/dev/gen/mobile-vision/d2go/projects/facegen/tools/export_to_app#link-tree/torch/backends/_nnapi/prepare.py", line 47, in forward
    def forward(self, args: List[torch.Tensor]) -> List[torch.Tensor]:
        if self.comp is None:
            self.init(args)
            ~~~~~~~~~ <--- HERE
        comp = self.comp
        assert comp is not None
  File "/data/users/dhaziza/fbsource/fbcode/buck-out/dev/gen/mobile-vision/d2go/projects/facegen/tools/export_to_app#link-tree/torch/backends/_nnapi/prepare.py", line 42, in init
        self.weights = [w.contiguous() for w in self.weights]
        comp = torch.classes._nnapi.Compilation()
        comp.init(self.ser_model, self.weights)
        ~~~~~~~~~ <--- HERE
        self.comp = comp
RuntimeError: [enforce fail at nnapi_model_loader.cpp:171] result == ANEURALNETWORKS_NO_ERROR. NNAPI returned error: 4
```

Reviewed By: axitkhurana

Differential Revision: D28287450

fbshipit-source-id: ccd10301e1492f8879f9d6dd57b60c4e683ebb9e
2021-06-01 02:05:24 -07:00
cab4849463 [caffe2][glow] Share info about current batch_size (#58902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58902

Pull Request resolved: https://github.com/pytorch/glow/pull/5681

Reviewed By: ChunliF

Differential Revision: D28665162

fbshipit-source-id: 39e173a24ee247bc6fee44009798c74dddb27648
2021-06-01 01:21:42 -07:00
7fb3385f4b Automated submodule update: FBGEMM (#59170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59170

This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: ffc2e1a91e

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58874

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: hx89

Differential Revision: D28648577

Pulled By: jspark1105

fbshipit-source-id: 0ad1a6fdf27cd3f05f9e342030461cb7caa9986b
2021-05-31 23:18:58 -07:00
74ec50893d Migrate renorm to ATen (CPU and CUDA) (#59108)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24754, closes https://github.com/pytorch/pytorch/issues/24616, closes https://github.com/pytorch/pytorch/issues/50874

This reuses `linalg_vector_norm` to calculate the norms. I just add a new kernel that turns  the norm into a normalization factor, then multiply the original tensor using a normal broadcasted `mul` operator. The result is less code, and better performance to boot.
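
The shape of the new implementation, sketched in Python (the real code fuses the normalization-factor step into a kernel and guards against division by zero; names here are illustrative):

```python
import torch

def renorm_sketch(x: torch.Tensor, p: float, dim: int, maxnorm: float) -> torch.Tensor:
    reduce_dims = [d for d in range(x.dim()) if d != dim]
    # Step 1: reuse a vector norm to measure each sub-tensor along `dim`.
    norms = torch.linalg.vector_norm(x, ord=p, dim=reduce_dims, keepdim=True)
    # Step 2: turn norms into normalization factors (1 where already within budget).
    factors = torch.where(norms > maxnorm, maxnorm / norms, torch.ones_like(norms))
    # Step 3: a plain broadcasted multiply does the rest.
    return x * factors
```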

#### Benchmarks (CPU):
|     Shape    | Dim |  Before | After (1 thread) | After (8 threads) |
|:------------:|:---:|--------:|-----------------:|------------------:|
| (10, 10, 10) | 0   | 11.6 us |           4.2 us |            4.2 us |
|              | 1   | 14.3 us |           5.2 us |            5.2 us |
|              | 2   | 12.7 us |           4.6 us |            4.6 us |
| (50, 50, 50) | 0   |  330 us |           120 us |           24.4 us |
|              | 1   |  350 us |           135 us |           28.2 us |
|              | 2   |  417 us |           130 us |           24.4 us |

#### Benchmarks (CUDA)
|     Shape    | Dim |  Before |   After |
|:------------:|:---:|--------:|--------:|
| (10, 10, 10) | 0   | 12.5 us | 12.1 us |
|              | 1   | 13.1 us | 12.2 us |
|              | 2   | 13.1 us | 11.8 us |
| (50, 50, 50) | 0   | 33.7 us | 11.6 us |
|              | 1   | 36.5 us | 15.8 us |
|              | 2   | 41.1 us |   15 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59108

Reviewed By: mrshenli

Differential Revision: D28767060

Pulled By: ngimel

fbshipit-source-id: 93dcbe5483f71cc6a6444fbd5b1aa1f29975d857
2021-05-31 22:38:16 -07:00
223725cfb0 OpInfo: div - port pending method_tests entry (#59173)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Depends on: https://github.com/pytorch/pytorch/issues/59154

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59173

Reviewed By: ngimel

Differential Revision: D28785178

Pulled By: mruberry

fbshipit-source-id: 902310f2d77e499a2355a23b2d5a8c0b21b8c5bb
2021-05-31 17:32:27 -07:00
6d45d7a6c3 Enables previously "slow" gradgrad checks on CUDA (#57802)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57508

Earlier, a few CUDA `gradgrad` checks (see the list of ops below) were disabled because of them being too slow. There have been improvements (see https://github.com/pytorch/pytorch/issues/57508 for reference) and this PR aimed on:

1. Time taken by `gradgrad` checks on CUDA for the ops listed below.
2. Enabling the tests again if the times sound reasonable

Ops considered: `addbmm, baddbmm, bmm, cholesky, symeig, inverse, linalg.cholesky, linalg.cholesky_ex, linalg.eigh, linalg.qr, lu, qr, solve, triangular_solve, linalg.pinv, svd, linalg.svd, pinverse, linalg.householder_product, linalg.solve`.

For numbers (on time taken) on a separate CI run: https://github.com/pytorch/pytorch/pull/57802#issuecomment-836169691.

cc: mruberry albanD pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57802

Reviewed By: ngimel

Differential Revision: D28784106

Pulled By: mruberry

fbshipit-source-id: 9b15238319f143c59f83d500e831d66d98542ff8
2021-05-30 22:16:46 -07:00
ef40757de3 OpInfo: zero_ (#58731)
Summary:
See https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58731

Reviewed By: ngimel

Differential Revision: D28784083

Pulled By: mruberry

fbshipit-source-id: f06de8045afd3728b1fedc014c091d8fd1955a9f
2021-05-30 21:49:29 -07:00
2aeb16c13a [fix] i1-i1e ROCm failure: mark array as const so that it is available for host and device (#59187)
Summary:
Fix failing ROCm build introduced by https://github.com/pytorch/pytorch/issues/56352

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59187

Reviewed By: ngimel

Differential Revision: D28784072

Pulled By: mruberry

fbshipit-source-id: 36a5bd11ad2fe80a81aae6eb8b21f0901c842ddc
2021-05-30 21:44:54 -07:00
fea7a79e0b [special] Add ndtr (#58126)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Plot:
![image](https://user-images.githubusercontent.com/19503980/117942099-54efd680-b328-11eb-8948-c3080779ce19.png)
https://colab.research.google.com/drive/1Of67A042rOImj8wrLF_fUTgoy_wVEOZS?usp=sharing

TODO:
* [x] Add docs (https://13385714-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.ndtr)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58126

Reviewed By: anjali411

Differential Revision: D28700957

Pulled By: mruberry

fbshipit-source-id: 5b9991e97ec1e8fd01518cc9d9849108d35fe406
2021-05-30 21:12:04 -07:00
2a78f6376c TensorIterator: Reduce serial_for_each static overhead (#58909)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58909

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D28776507

Pulled By: ngimel

fbshipit-source-id: 4f0283d03b26aa5785b687b78d77e6b0efcbaf65
2021-05-30 21:08:54 -07:00
445e838210 OpInfo: resize_, resize_as_ (#59176)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59176

Reviewed By: ngimel

Differential Revision: D28780083

Pulled By: mruberry

fbshipit-source-id: 472584e8faa4cb1031908df097849d2d4167fdf5
2021-05-30 18:53:17 -07:00
ea465f7378 OpInfo: true_divide and minor fix (#59154)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59154

Reviewed By: ngimel

Differential Revision: D28780115

Pulled By: mruberry

fbshipit-source-id: 91e254698597fa0c7d4df6053ec017a85e180304
2021-05-30 18:35:30 -07:00
aaccdc3996 SparseCsr: Fix some uses of deprecated Tensor methods (#58990)
Summary:
This fixes some deprecation warnings in the build that were introduced by https://github.com/pytorch/pytorch/issues/58768.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58990

Reviewed By: ngimel

Differential Revision: D28776804

Pulled By: mruberry

fbshipit-source-id: 8abf75ea8f7adca537f9c808e68356829407665e
2021-05-30 03:58:19 -07:00
6ee9466d3a OpInfo: tensor_split: port remaining method_test entries (#59133)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59133

Reviewed By: ngimel

Differential Revision: D28776470

Pulled By: mruberry

fbshipit-source-id: 975a7062788de514f214f8c4ef0146eaf6b407f7
2021-05-30 00:40:29 -07:00
96c549d1c6 Replace dim_apply with TensorIterator (#58656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58656

Ref gh-56794

`dim_apply` is problematic because it calls `Tensor.select` inside of a parallel
region. Instead, replace it with `TensorIterator` by squashing the
apply-dimension. This is similar to the `_dim_apply` function already used by
the sort kernels:

8c91acc161/aten/src/ATen/native/cpu/SortingKernel.cpp (L27)

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D28776441

Pulled By: ngimel

fbshipit-source-id: 14449d4b12ed4576f879bb65a35e881ce1a953b1
2021-05-30 00:09:14 -07:00
cab65ea3b9 OpInfo: renorm (#59079)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59079

Reviewed By: ngimel

Differential Revision: D28776789

Pulled By: mruberry

fbshipit-source-id: ca46f2debe918c3de1f3b5bbc9924b7ddfe9442a
2021-05-29 22:38:15 -07:00
5c18994674 [special] Add i1 and i1e (#56352)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

* [x] Check Docs https://12721710-65600975-gh.circle-artifacts.com/0/docs/special.html
* [x] Investigate fp32 failure on CI?! (Fails on clang. Reproduced locally with clang-11)
* [ ] Kernel vs Composite?
* [x] Autograd for `i0e` for zero?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56352

Reviewed By: anjali411

Differential Revision: D28700888

Pulled By: mruberry

fbshipit-source-id: 91a3cbb94f5b8a3b063589ec38179848c11def83
2021-05-29 20:55:23 -07:00
27009d6129 [TensorExpr] Add NNC lowerings for aten::view, aten::reshape and aten::expand_as. (#59157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59157

Currently view is represented as a copy, since we don't support inplace
operations in NNC (similar to `aten::reshape`). Lowering for
`aten::expand_as` is exactly the same as for `aten::expand`, since
we're building the TE expression based on the output shape anyway.

Differential Revision: D28774224

Test Plan: Imported from OSS

Reviewed By: Chillee

Pulled By: ZolotukhinM

fbshipit-source-id: 0a1593c4c6500dcc5a374213adb734180ae1f72e
2021-05-29 20:36:32 -07:00
355b24438c make vector_norm backward call norm_backward (#59135)
Summary:
Per title. Remove duplicated code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59135

Reviewed By: mruberry

Differential Revision: D28775716

Pulled By: ngimel

fbshipit-source-id: 50dc77590db15976453fc41c3657a77198749849
2021-05-29 12:14:46 -07:00
9fc0c5a54a OpInfo: tril, triu (#59145)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59145

Reviewed By: ngimel

Differential Revision: D28776433

Pulled By: mruberry

fbshipit-source-id: 2ff11a5202af1e73ffc2b242035c990646bd2259
2021-05-29 02:55:50 -07:00
1871d4e604 avoid explicitly casting low precision inputs to fp32 in norm (#59134)
Summary:
Per title. Now `norm` with fp16/bfloat16 inputs and fp32 outputs on CUDA won't do an explicit cast.
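
Concretely, the path affected is the mixed-dtype reduction (a sketch; assumes a CUDA device):

```python
import torch

x = torch.randn(1 << 20, device="cuda", dtype=torch.float16)
# Accumulate and return in fp32 without first materializing an fp32 copy of x.
n = torch.norm(x, p=2, dtype=torch.float32)
```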

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59134

Reviewed By: mruberry

Differential Revision: D28775729

Pulled By: ngimel

fbshipit-source-id: 896daa4f02e8a817cb7cb99ae8a93c02fa8dd5e9
2021-05-29 00:48:18 -07:00
d68df54269 OpInfo: fill_ (#59138)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59138

Reviewed By: ngimel

Differential Revision: D28776451

Pulled By: mruberry

fbshipit-source-id: 2e8e9f1805ec7d900223ea749a4a0b86a1bedb54
2021-05-29 00:35:02 -07:00
a427820350 [NNC] Added triangular_solve external call + fixed permute (#59131)
Summary:
The triangular_solve lowering only returns the first output, since the second output is just a copy of one of the inputs. Why does that exist?

Also, I fixed the permute lowering: I was previously applying the inverse of the permutation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59131

Reviewed By: ansley

Differential Revision: D28768169

Pulled By: Chillee

fbshipit-source-id: 8e78611c6145fb2257cb409ba98c14ac55cdbccf
2021-05-28 22:29:30 -07:00
c9af4c2636 OpInfo: where (#58349)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58349

Reviewed By: mrshenli

Differential Revision: D28744220

Pulled By: mruberry

fbshipit-source-id: 893a2fb88a48a60df75c7d6e2f58a42ca949daa7
2021-05-28 18:22:03 -07:00
b977a3b66d [c10d] Split custom class bindings out of python binding code (#58992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58992

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.
ghstack-source-id: 130168942

Test Plan: CI

Reviewed By: wconstab

Differential Revision: D28690832

fbshipit-source-id: 3c5e3fff28abb8bcdb4a952794c07de1ee2ae5a8
2021-05-28 15:35:23 -07:00
ab372ba510 [iOS GPU] Add debug information to track memory allocation exception (#59112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59112

ghstack-source-id: 130027273

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D28730604

fbshipit-source-id: 2ec7ca1b722a9fe496635cb6eea7e0d88b0c18b1
2021-05-28 12:16:29 -07:00
41054f2ab5 CUDA support in the CSR layout: sparse_to_dense/add_sparse_csr (#59011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59011

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28719550

Pulled By: bhosmer

fbshipit-source-id: 530c7cd1b20ae6d8865fd414afaf6fab27a643e6
2021-05-27 20:59:22 -07:00
9c83e4160d Use some c10::ThreadLocal to avoid crashes on old Android toolchains (#59017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59017

See the comment in ThreadLocal.h for context.
I used a slightly dirty preprocessor hack to minimize the number of changes.
The hope is that we'll be able to revert all of these soon.

Test Plan:
CI.
Built FB4A with gnustl and saw no references to cxa_thread_atexit
in the PyTorch libraries.

Reviewed By: ilia-cher

Differential Revision: D28720762

fbshipit-source-id: 0f13c7ac5a108b95f8fde6dbc63c6b8bdb8599de
2021-05-27 20:49:03 -07:00
4b3d17c0a2 Include Macros.h in ThreadLocal
Summary: This wasn't picking up C10_ANDROID.  Not sure how to prevent stuff like this.

Test Plan: Build for Android+gnustl, saw proper ThreadLocal being defined.

Reviewed By: swolchok

Differential Revision: D28720763

fbshipit-source-id: 58eb4ea80ad32a856fcea6d65e5c1c37ebf3bd55
2021-05-27 20:47:56 -07:00
0c1420aa3c OpInfo: fmod and remainder (#57941)
Summary:
See https://github.com/pytorch/pytorch/issues/54261

cc: mruberry Lezcano kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57941

Reviewed By: mrshenli

Differential Revision: D28744464

Pulled By: mruberry

fbshipit-source-id: 19847277d4f8d3a39a706c2b3c9eddf0dedcb20c
2021-05-27 20:32:56 -07:00
657b75d155 Revert D28700259: [pytorch][PR] Migrate nonzero from TH to ATen (CPU)
Test Plan: revert-hammer

Differential Revision:
D28700259 (95b1bc1009)

Original commit changeset: 9b279ca7c36d

fbshipit-source-id: 267afe63376be598d24c862e02e3b4b3ea75f77c
2021-05-27 20:07:30 -07:00
4e543d017f Move remaining \*Sort\* in THC to ATen (#58953)
Summary:
https://github.com/pytorch/pytorch/issues/24637

CC zasdfgbnm ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58953

Reviewed By: mrshenli

Differential Revision: D28749713

Pulled By: ngimel

fbshipit-source-id: 33ce87cf77e23d5d67d193d6368131cb8dab39ae
2021-05-27 18:35:42 -07:00
f3aa61b9ed Add peephole for len(x.size()) (#59051)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59051

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D28727247

Pulled By: eellison

fbshipit-source-id: 6474d39773b640992bdaf261575a3dbd48c6d56c
2021-05-27 17:57:53 -07:00
b9dc51863c Add more shape functions for mobilenet (#58932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58932

This adds all the operators necessary for mobilenet. I kind of wanted to get these landed to unblock ZolotukhinM, but I'm happy to split these up into multiple PRs if it makes reviewing easier. In terms of testing, I'm going to add an automated shape analysis OpInfo test.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D28727246

Pulled By: eellison

fbshipit-source-id: c17f9b7bdf7a43ddf99212b281ae2dd311259374
2021-05-27 17:57:52 -07:00
0ebc665305 Switch symbolic shape registry to operator map (#58890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58890

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28663681

Pulled By: eellison

fbshipit-source-id: 5b44fdf14a8ffe05606cc12897e366a64259650d
2021-05-27 17:57:50 -07:00
d8cbba3ee2 [JIT] Disable Complete Shape Inlining For Testing Purposes (#56966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56966

This PR adds a toggle to shape analysis which won't inline complete tensor shapes as constants into the shape compute graph, which is a good stress test on the partial evaluation pipeline.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28444664

Pulled By: eellison

fbshipit-source-id: a62e424515a8837a4b596546efa93af5e8e61f10
2021-05-27 17:57:48 -07:00
f66fbb1e2e Add unary/binary ops necessary for mobilenet (#56828)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56828

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28444660

Pulled By: eellison

fbshipit-source-id: 656673e6139550f2752c0d3ac2fb8731f4bf9bbb
2021-05-27 17:56:30 -07:00
40f851c53e Use dataclasses to simplify ShardingSpec (#58893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58893

Leverage dataclasses to simplify some of the ShardingSpec classes.
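
For flavor, a hypothetical sketch of what dataclasses buy here (field names are illustrative, not the exact spec):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ShardMetadata:
    # dataclasses generate __init__, __repr__ and __eq__, removing the
    # hand-written boilerplate from each spec class.
    shard_offsets: List[int]
    shard_lengths: List[int]
    placement: str
```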
ghstack-source-id: 130041687

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D28665137

fbshipit-source-id: da37517cf2bd8c65d4a5b7cae171fa460e6b0946
2021-05-27 17:33:28 -07:00
89d78851e6 [quant][refactor tests] Move qtensor serialization tests from test_deprecated_jit (#59089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59089

Move these tests into test_quantized_tensor

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28750065

fbshipit-source-id: 5c4350d49dd07710b86ba330de80369403c6013c
2021-05-27 17:04:15 -07:00
886a2ddc83 [quant][refactor tests] Clean up test_quantization.py (#59088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59088

Clean up comments and organize the tests better

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28750064

fbshipit-source-id: 4c36922e25e3adea3aaa8b4d9185dc28b17aa57c
2021-05-27 17:03:01 -07:00
f993ceffb5 TensorIteratorReduce: Avoid tensor operations in parallel_for (#58655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58655

Ref gh-56794

The two pass reduction calls `copy_` and `select` inside a parallel region. The
`copy_` can just be moved outside of the parallel region, but avoiding the
`select` call is more complicated because it's needed to construct the
`TensorIterator`. Instead, I factor out a `serial_for_each` free-function that
just takes pointers and strides. Then manually advance the pointer to the
thread-specific slice of data.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D28735330

Pulled By: ngimel

fbshipit-source-id: 8e096eb5801af9381ebd305e3ae7796a79b86298
2021-05-27 15:58:03 -07:00
ef32a29c97 Back out "[pytorch][PR] ENH Adds dtype to nn.functional.one_hot" (#59080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59080

Original commit changeset: 3686579517cc

Test Plan: None; reverting diff

Reviewed By: albanD

Differential Revision: D28746799

fbshipit-source-id: 75a7885ab0bf3abadde9a42b56d479f71f57c89c
2021-05-27 15:40:52 -07:00
3d2b55553b Retiring _module_copies field in DDP reducer. (#59094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59094

Commented out _module_copies fields and changed for loops accordingly

Test Plan: Test cases mentioned in T91292908 passed successfully

Reviewed By: SciPioneer

Differential Revision: D28736135

fbshipit-source-id: 1857102f0c57a734026f3025e9653d8fad57d0b6
2021-05-27 15:09:14 -07:00
c6c563fc26 Added minor fixes to Az DevOps Build Logic (#59016)
Summary:
This PR also makes a few minor logic changes to the custom PyTorch PR tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59016

Reviewed By: mrshenli

Differential Revision: D28732437

Pulled By: malfet

fbshipit-source-id: 14b7ed837209d77e0e175d92959aeb0f086e6737
2021-05-27 14:35:11 -07:00
61f946bba6 don't copy indices to the self device in dispatch_index (#59059)
Summary:
Let the index/index_put implementation in ATen take care of moving the indices to the correct device; don't make the Python wrapper do that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59059

Reviewed By: mruberry

Differential Revision: D28750562

Pulled By: ngimel

fbshipit-source-id: 2f2b5f875733898f1c0b30b544c89808f91e4a6f
2021-05-27 14:19:59 -07:00
16ae6cad3d Revert D28615349: [pytorch][PR] Add a ci/no-build label
Test Plan: revert-hammer

Differential Revision:
D28615349 (bae06a0293)

Original commit changeset: 1ed521761ca4

fbshipit-source-id: 987439c2570bbffc0f0f8517d82970a3a4add789
2021-05-27 14:17:06 -07:00
bae06a0293 Add a ci/no-build label (#58778)
Summary:
Depends on https://github.com/pytorch/pytorch-probot/pull/22. Adds a new label called `ci/no-build` that disables the CircleCI `build` workflow on PRs. The current behavior should be the same in the absence of `ci/no-build`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58778

Reviewed By: malfet

Differential Revision: D28615349

Pulled By: samestep

fbshipit-source-id: 1ed521761ca4ffa32db954a51918f693beddb3f3
2021-05-27 14:03:03 -07:00
3e2db56dcf [docs] document dim argument to tensor.size() (#58777)
Summary:
[docs] document dim argument to tensor.size()
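
The documented behavior in one snippet:

```python
import torch

t = torch.zeros(2, 3)
print(t.size())    # torch.Size([2, 3])
print(t.size(1))   # 3, equivalent to t.size()[1]
print(t.size(-1))  # negative dims count from the end: 3
```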

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58777

Reviewed By: gchanan

Differential Revision: D28641109

Pulled By: zou3519

fbshipit-source-id: 5cb46bb8abe45ed299843af38515e5db89ad02a1
2021-05-27 13:51:56 -07:00
18302bcdf3 Add script to cancel workflows (#59019)
Summary:
This removes our cancel_redundant_workflows job in favor of GitHub's built in [`concurrency`](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency) keyword which limits runs of a particularly named group. Since the group names have to be unique per job per PR, it should end up looking something like `filename-job_name-{pr number | sha (for non-PR workflows)}`. There's also a script to check workflows and ensure that it is being properly gated so people don't forget to add the key in the future.

`ruamel.YAML` also didn't like some of the spacing, so that is changed; it has the side benefit of making the files more consistent.

This also has a minor change of renaming the workflow templates from `.in` to `.j2` which is the standard Jinja2 extension that the VSCode extension automatically picks up for syntax highlighting / errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59019

Test Plan: pushed a commit `reset` and then immediately another commit `test`: the jobs from `reset` are cancelled: https://github.com/pytorch/pytorch/actions/runs/880099510

Reviewed By: samestep

Differential Revision: D28722419

Pulled By: driazati

fbshipit-source-id: c547a161877a0583be9d7edb29244b086b6bcad1
2021-05-27 12:32:15 -07:00
920619dc2b [PyTorch] Save a refcount bump in meta functions for addmm and mm (#59063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59063

`TensorMeta::maybe_get_output()` returns `const Tensor&`, so there is no need to copy the Tensor.
ghstack-source-id: 130044287

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D28735225

fbshipit-source-id: f2bdf39b28de245ec4664718490e7e0b36bc8819
2021-05-27 12:15:52 -07:00
2c17b6a0fe [ONNX] Enable support for roll() op. (#58389) (#58697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58697

1. Add a symbolic function for aten::roll() op in symbolic_opset9.py.
2. Add a test with multiple scenarios as well.

Test Plan: Imported from OSS

Reviewed By: driazati, bhosmer

Differential Revision: D28714807

Pulled By: SplitInfinity

fbshipit-source-id: eae85f2dcf02737c9256a180f6905a935ca3f57e

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-05-27 12:06:45 -07:00
1aabb8f98c [ONNX] handle aten::_set_item on Dict in convertInplaceOpsAndTrackAlias (#58317) (#58696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58696

It seems the JIT produces an output for aten::_set_item on lists but
not on dicts. Previously the code would crash because it assumed it
was operating on a list.

The different behavior can be seen with the following test:

```python
class DictModule(torch.nn.Module):
    def forward(self, x_in: torch.Tensor) -> typing.Dict[str, torch.Tensor]:
        x_out = {}
        x_out["test_key_out"] = x_in
        return x_out

x_in = torch.tensor(1)
dms = torch.jit.script(DictModule())
torch.onnx.export(dms, (x_in,), "/dev/null", example_outputs=(dms(x_in),))
```

Before this change:
`RuntimeError: outputs_.size() == 1INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/ir.h":452, please report a bug to PyTorch.`

After this change:
`RuntimeError: Exporting the operator prim_DictConstruct to ONNX opset version 9 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.`

This is a more useful error message.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D28714804

Pulled By: SplitInfinity

fbshipit-source-id: 1e5dc5fb44d1e3f971a22a79b5cf009d7590bf84

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2021-05-27 12:06:44 -07:00
0a6828a306 [ONNX] use consistent quoting for string literals (#57757) (#58695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58695

As PEP8 says: "Pick a rule and stick to it." [1]

[1] https://www.python.org/dev/peps/pep-0008/#string-quotes

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D28714811

Pulled By: SplitInfinity

fbshipit-source-id: c95103aceb1725c17c034dc6fc8216627f189548

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2021-05-27 12:06:42 -07:00
b27fc0ff85 [ONNX] Improve lower tuples and handle control flow (#57650) (#58694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58694

Improving the logic for finding tuple patterns within control flow.
Also fixes: https://github.com/pytorch/pytorch/issues/56914

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D28714806

Pulled By: SplitInfinity

fbshipit-source-id: 1552100cf9cc88e6f58df2e90758e8898ba0a9b3

Co-authored-by: neginraoof <neginmr@utexas.edu>
2021-05-27 12:06:40 -07:00
57c9355a0d [ONNX] Update special post process for SequenceInsert after SequenceEmpty (#56965) (#58693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58693

`ONNX::SequenceEmpty` requires a dtype to be provided and defaults to float. We update the dtype of a previously created `ONNX::SequenceEmpty` node when the dtype is later discovered, through a downstream `ONNX::SequenceInsert` node, to be something other than float. This PR improves the algorithm to cover the nested-loop case.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D28714808

Pulled By: SplitInfinity

fbshipit-source-id: e45ab3a12d0fec637733acbd3cd0438ff80d2cd4

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-05-27 12:06:39 -07:00
b8c96e6b08 Support symbolic for conv_tbc (#58359) (#58692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58692

This is a fix for exporting fairseq models, see:
```python
model = torch.hub.load(github, 'conv.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')
model = torch.hub.load(github, 'conv.wmt17.en-de', tokenizer='moses', bpe='subword_nmt')
```
With this fix, and with the one `GradMultiply` line in the model script commented out, these two models can be exported successfully with performance targets met.

The original PR https://github.com/pytorch/pytorch/pull/57708 has merging issue, use this one instead.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D28714809

Pulled By: SplitInfinity

fbshipit-source-id: 71c2de6cec7ee05af68560996acf47d97af46fb2

Co-authored-by: David <jiafa@microsoft.com>
2021-05-27 12:06:37 -07:00
d101816fdd [ONNX] RNN scripting (#57564) (#58691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58691

Note the first commit in this PR has its own pull request here since it seemed self-contained:
https://github.com/pytorch/pytorch/pull/57082

* [ONNX] simplify batch_first logic in RNN tests

* [ONNX] support GRU with packed input in scripting mode

This required two changes:
* Add as_tensor to symbolic_opset9.py
* Change torch::jit::pushPackingPastRnn to recognize and properly
  replace another use of the batch_sizes output of prim::PackPadded.
  Previously the code assumed that the first use was as input to the
  RNN operator. However in some cases, it is also used to compute
  max_batch_size. For example in this code:
  https://github.com/pytorch/pytorch/blob/febff45/torch/nn/modules/rnn.py#L815-L815

With these changes the GRU tests now pass in scripting mode for opset
version >= 11.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D28714805

Pulled By: SplitInfinity

fbshipit-source-id: f19647a04533d9ec76399a8793b3f712ea0337d2

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2021-05-27 12:06:35 -07:00
51d14b6859 [ONNX] Update instance_norm2 symbolic to handle track_running_stats=True (#55051) (#58690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58690

Fixes [#53887](https://github.com/pytorch/pytorch/issues/53887)
Use the tracked running_mean and running_var instead of input statistics when track_running_stats=True.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D28714812

Pulled By: SplitInfinity

fbshipit-source-id: 3f2f2ec9a7eaf8a1432a552d751cbd5974b20195

Co-authored-by: hwangdeyu <deyhuang@qq.com>
2021-05-27 12:05:26 -07:00
ba694520e5 [ROCm] fix JIT codegen (#57400)
Summary:
Fixes upcoming changes that are part of ROCm 4.2 and affect PyTorch JIT.

- ROCM_VERSION macro must be available to both device and host compilation passes.
- Unifies some of CUDA and HIP differences in the code generated.
  - NAN / POS_INFINITY / NEG_INFINITY
  - Do not hipify `extern __shared__` -> `HIP_DYNAMIC_SHARED()` macro [deprecated]
- Differentiates bf16 codegen for HIP.
- Optionally provides missing macros when using hiprtc precompiled header feature.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57400

Reviewed By: ejguan

Differential Revision: D28421065

Pulled By: malfet

fbshipit-source-id: 215f476773c61d8b0d9d148a4e5f5d016f863074
2021-05-27 11:45:07 -07:00
7e4e648c2a Enable NNC fusion for relu6 (#58773)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58773

Test Plan:
```
python test/test_ops.py -k relu6
python test/test_jit_fuser_te.py
```

Reviewed By: bertmaher

Differential Revision: D28721791

Pulled By: desertfire

fbshipit-source-id: a94f711977afd080faae052f66eb8dded3cdc79e
2021-05-27 10:54:02 -07:00
0106fe3934 avg_pool2d: port to structured (#58987)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58987

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D28717067

Pulled By: ezyang

fbshipit-source-id: 984a8ae8bc05811b787fdac565566f359b55a3d6
2021-05-27 10:51:11 -07:00
d935259171 Remove redundant code from LayerNorm Fake Op. (#59054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59054

Removes redundant code that handled elementwise_affine.

Test Plan: GLOW_NNPI=1 buck run -c fbcode.platform=platform009 //caffe2/caffe2/contrib/fakelowp/test:test_layernorm_nnpi_fp16nnpi

Reviewed By: hx89

Differential Revision: D28726804

fbshipit-source-id: b03485e98d490d4e9e1b178a8c50677b77e27596
2021-05-27 10:35:32 -07:00
b14c3205fd [JIT] Add torch._C.ScriptDict (#52659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52659

**Summary**
This commit adds `torch._C.ScriptDict`, a dictionary type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made to instances of `torch._C.ScriptDict` in TorchScript are visible in
Python even when it is not returned from the function. Instances can be
constructed by passing an instance of a Python dictionary to
`torch.jit.script`. In the case of an empty dictionary, its type is
assumed to be `Dict[str, Tensor]` to be consistent with the handling of
empty dictionaries in TorchScript source code.

`torch._C.ScriptDict` is implemented using a modified version of pybind's `stl_bind.h`-style bindings attached to `ScriptDict`, `ScriptDictIterator` and `ScriptDictKeyIterator`, wrapper classes around `c10::impl::GenericDict` and `c10::impl::GenericDict::iterator`. These bindings allow instances of `torch._C.ScriptDict` to be used as if they were a regular Python `dict`. Reference semantics are achieved by simply retrieving the `IValue` contained in `ScriptDict` in `toIValue` (invoked when converting Python arguments to `IValue`s before calling TorchScript code).
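
A minimal sketch of the reference semantics described above (constructing the instance by passing a Python dict to `torch.jit.script`, per this commit):

```python
import torch
from typing import Dict

@torch.jit.script
def add_key(d: Dict[str, torch.Tensor]):
    d["b"] = torch.ones(2)

d = torch.jit.script({"a": torch.zeros(2)})  # returns a torch._C.ScriptDict
add_key(d)
print("b" in d)  # True: the mutation made in TorchScript is visible in Python
```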

**Test Plan**
This commit adds `TestScriptDict` to `test_list_dict.py`, a set of tests
that check that all of the common dictionary operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.

Differential Revision:
D27211605
D27211605

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Pulled By: SplitInfinity

fbshipit-source-id: 446d4e5328375791aa73eb9e8b04dfe3465af960
2021-05-27 10:25:30 -07:00
95b1bc1009 Migrate nonzero from TH to ATen (CPU) (#58811)
Summary:
Closes gh-24745

The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue, but it breaks down, for example, with channels-last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.
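
A small check of the iteration-order guarantee discussed above (a sketch, not one of the PR's actual test cases): `nonzero` should return indices in row-major order even when the input's physical layout is channels-last.

```python
import torch

x = torch.zeros(1, 2, 2, 2).to(memory_format=torch.channels_last)
x[0, 0, 1, 1] = 1.0
x[0, 1, 0, 0] = 1.0
# Rows must follow logical (row-major) index order, not physical memory order:
print(x.nonzero())  # tensor([[0, 0, 1, 1], [0, 1, 0, 0]])
```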

This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts, which is necessary to write the outputs in the right locations.

|    Shape   |  Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us |          2220 us |            496 us |
| 128,128,32 | 1250 us |           976 us |            175 us |
|  64,128,32 |  581 us |           486 us |             88 us |
|  32,128,32 |  292 us |           245 us |             80 us |
|  16,128,32 |  147 us |           120 us |             71 us |
|  8,128,32  |   75 us |            61 us |             61 us |
|  4,128,32  |   39 us |            32 us |             32 us |
|  2,128,32  |   20 us |            17 us |             17 us |
|  1,128,32  |   11 us |             9 us |              9 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58811

Reviewed By: anjali411

Differential Revision: D28700259

Pulled By: ngimel

fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159
2021-05-27 10:06:54 -07:00
934f6dca65 Fix pthreadpool guard test (#58977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58977

* Test was flaky because part of it ran async
* Remove the async part to test only the functionality added

Test Plan:
regular test:

`buck test mode/dev //caffe2/aten:test_thread_pool_guard -- --exact 'caffe2/aten:test_thread_pool_guard - TestThreadPoolGuard.TestRunWithGuard' --run-disabled`

stress test:

`buck test mode/dev //caffe2/aten:test_thread_pool_guard -- --exact 'caffe2/aten:test_thread_pool_guard - TestThreadPoolGuard.TestRunWithGuard' --run-disabled --jobs 18 --stress-runs 10 --record-results`

Reviewed By: kimishpatel

Differential Revision: D28703064

fbshipit-source-id: be19da3f42f44288afc726bdb2f40342eee26e01
2021-05-27 09:49:52 -07:00
e89b150a39 [typing] Pyre fixes for remote_module (#59046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59046

Correcting type hint for _RemoteModule to pass Pyre checks.

Test Plan: N/A

Reviewed By: walterddr, SciPioneer

Differential Revision: D28725237

fbshipit-source-id: 1ca714bbf1a597a29850f70bac826a0c95a4019f
2021-05-27 09:44:50 -07:00
11aa5e4f66 Add underscores to some internal names (#59027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59027

Add underscores to some of the internal names

Test Plan:
python test/test_profiler.py -v

Imported from OSS

Reviewed By: mrshenli

Differential Revision: D28724294

fbshipit-source-id: 1f6252e4befdf1928ac103d0042cbbf40616f74a
2021-05-27 09:39:28 -07:00
617b74aa35 [nnc] LLVMCodeGen for any target (#58713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58713

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28585722

Pulled By: bertmaher

fbshipit-source-id: 82885b9780dc1a8610660a90969d8d2baad97920
2021-05-27 09:25:15 -07:00
a1806134a7 [QAT] Fix the runtime run cannot resize variables that require grad (#57068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57068

When training with histogram observer on, we got this runtime error:
```
torch/quantization/observer.py", line 942, in forward
                    self.bins)

            self.histogram.resize_(combined_histogram.shape)
            ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            self.histogram.copy_(combined_histogram)
            self.min_val.resize_(combined_min.shape)
RuntimeError: cannot resize variables that require grad
```

Since the histogram observer is only used to collect histogram statistics, it should not need gradients. So we turn off grad tracking before resizing by calling the `detach_()` method, as sketched below.
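
A self-contained demonstration of the pattern used by the fix (the tensor names are illustrative, not the observer's actual buffers):

```python
import torch

buf = torch.zeros(3, requires_grad=True)
new = torch.ones(5)
# buf.resize_(new.shape) here would raise:
# RuntimeError: cannot resize variables that require grad
buf.detach_()            # drop grad tracking in place, as the fix does
buf.resize_(new.shape)
buf.copy_(new)
print(buf.shape)         # torch.Size([5])
```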

Test Plan:
- arc lint
- Train with histogram observer turned on, training finished successfully

f264139727

Reviewed By: supriyar

Differential Revision: D27147212

fbshipit-source-id: abed5b9c4570ffc6bb60e58e64791cfce66856cd
2021-05-27 09:12:06 -07:00
25ac647f64 [QAT] Auto format torch/quantization/observer.py (#57067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57067

auto format the code

Test Plan: lint

Reviewed By: jerryzh168

Differential Revision: D27147213

fbshipit-source-id: 008871d276c8891b2411549e17617e5c27d16ee3
2021-05-27 09:10:34 -07:00
9baf75c86e add test_filename field in scribe upload (#59024)
Summary:
Add test filename dimension to scribe upload

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59024

Test Plan: CI - validate result on scuba table

Reviewed By: janeyx99

Differential Revision: D28726711

Pulled By: walterddr

fbshipit-source-id: 78a1708787f0507d1171800f633e1f7137f629cd
2021-05-27 08:21:05 -07:00
7461792c4a adding upload step on all build jobs (#58998)
Summary:
Relates to https://github.com/pytorch/pytorch/issues/58826.

Currently we don't have the exact build time collected for non-binary jobs. Collecting this reports the exact build time from when the PyTorch checkout finishes until the build stage succeeds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58998

Test Plan: CI - validate result on scuba table

Reviewed By: janeyx99

Differential Revision: D28747962

Pulled By: walterddr

fbshipit-source-id: 715d91d597bc004977fdceaf245263c9c8aacc84
2021-05-27 08:17:01 -07:00
3d70ab08ae bump out repeat_interleave BC allow date (#59057)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59057

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D28732990

Pulled By: bhosmer

fbshipit-source-id: 27a9fe9169b2da9405d2c3f235e7c015896fe7fc
2021-05-26 23:32:05 -07:00
74089a0d34 [quant][refactor tests] Move quantization tests into subfolders (#59007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59007

Create folders for each test category and move the tests.
Will follow-up with a cleanup of test_quantization.py

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D28718742

fbshipit-source-id: 4c2dbbf36db35d289df9708565b7e88e2381ff04
2021-05-26 23:02:12 -07:00
e146ed21fb [quant][refactor tests] Move TestModelNumerics to a separate file (#59000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59000

These tests span both QAT and PTQ APIs so factor them out

Test Plan:
python test/test_quantization.py TestModelNumericsEager

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D28713910

fbshipit-source-id: b2ad27cf59abb7cc0c4e4da705f8c9220410f8ad
2021-05-26 23:02:11 -07:00
b6c5c5d90e [quant][refactor tests] Rename test_numeric_suite and equalization tests (#58999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58999

Rename the test files to be more explicit that they are for eager mode

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D28713909

fbshipit-source-id: b4ccd06c841fe96edf8c065a0bceae15fed260f9
2021-05-26 23:02:09 -07:00
82d587f434 [quant][refactor tests] split test_workflow_module into test_workflow_ops and test_workflow_module (#58963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58963

some tests are used to check the op level numerics of the fake quantize operations

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D28696599

fbshipit-source-id: 98f9b0c993dd43050176125461ddd5288142989b
2021-05-26 23:01:08 -07:00
0e9a295b41 Refactor GlooDeviceFactory::makeDeviceFor... (#58996)
Summary:
`makeDeviceForHostname` and `makeDeviceForInterface` are almost
duplicates, differing only in their default argument values.

Create a generic `makeGlooDevice` anonymous function that takes both the
host name and the interface name, and call it from both
makeDeviceFor[Hostname|Interface].

Also solve two other minor issues:
 - do not call `getenv("GLOO_DEVICE_TRANSPORT")` during library load
   time
 - Raise an exception rather than crash if GLOO_DEVICE_TRANSPORT is set to an unknown value

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58996

Reviewed By: pbelevich

Differential Revision: D28713324

Pulled By: malfet

fbshipit-source-id: cb33b438078d163e3ec6f047f2e5247b07d94f8d
2021-05-26 20:33:11 -07:00
9e60c7dee3 Add docstring for is_inference_mode_enabled (#59047)
Summary:

Testing:
```
>>> import torch
>>> torch.is_inference_mode_enabled.__doc__
'\nis_inference_mode_enabled(input) -> (bool)\n\nReturns True if inference mode is currently enabled.\n\nArgs:\n    input (Tensor): the input tensor.\n'
```
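
For context, a minimal usage sketch of the function being documented:

```python
import torch

print(torch.is_inference_mode_enabled())      # False
with torch.inference_mode():
    print(torch.is_inference_mode_enabled())  # True
```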

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59047

Reviewed By: ailzhang

Differential Revision: D28726991

Pulled By: soulitzer

fbshipit-source-id: c117c7d73e551a1b5f0e215f2aed528bf558ef7c
2021-05-26 19:27:33 -07:00
1bd22e28b3 BFloat16 support for torch.sort (#58196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58196
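
A minimal usage sketch of the newly supported dtype (which devices this covers depends on where the support was added; the default device is shown for illustration):

```python
import torch

x = torch.randn(5, dtype=torch.bfloat16)
values, indices = torch.sort(x)
print(values.dtype)  # torch.bfloat16
```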

Reviewed By: anjali411

Differential Revision: D28721364

Pulled By: ngimel

fbshipit-source-id: 0785f7100fb76d69da7a73022c7d2eb43c91fa6e
2021-05-26 16:49:03 -07:00
f4a890d7c6 fix unique for discontiguous inputs (#59003)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58959

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59003

Reviewed By: mruberry

Differential Revision: D28714534

Pulled By: ngimel

fbshipit-source-id: d9bf82f54be5b5919e27281e49fad74e00d8b766
2021-05-26 16:43:19 -07:00
b435a27fb7 CUDA support in the CSR layout: constructors (#59010)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59010

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28719287

Pulled By: bhosmer

fbshipit-source-id: fbb5784ccb5ce19dcca1f2f95c4ee16f9b7680c4
2021-05-26 16:39:43 -07:00
7c17e1dd90 [fx2trt] Quantized uru on gpu (#58965)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58965

Test Plan:
```
// This script is just for playing around
buck run mode/opt -c python.package_style=inplace deeplearning/trt/fx2trt:fx2trt_quantized_test

// To check accuracy
buck run mode/opt -c python.package_style=inplace deeplearning/trt/fx2trt:uru_10x10_to_trt_eval.py
```

Reviewed By: mortzur

Differential Revision: D28445702

fbshipit-source-id: 5357a02a78cb7f9cf772e7a91a08166ef90cc4f8
2021-05-26 16:00:34 -07:00
58d1b3639b fix nn.MHA scriptability (#58727)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58727

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28593830

Pulled By: bhosmer

fbshipit-source-id: 37dee9efededaea9985a2bf040df1ba4b46f6580
2021-05-26 15:29:49 -07:00
ac67cda272 [PyTorch] Rename TI::add_borrowed_{in,out}put to TI::add_{in,out}put (#58608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58608

D28523254 (705dd9ffac) ensures that this was safe: we renamed away all the internal uses of add_input/add_output. (Also, practically everything I found internally could borrow, and the stuff that couldn't wouldn't compile because it was passed unnamed temporaries.)
ghstack-source-id: 129882758

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D28524585

fbshipit-source-id: 437235d5cc55c3737c928991a996b8f5e1c5beaa
2021-05-26 15:06:28 -07:00
7db36c0792 [PyTorch] Add temporary guardrail to borrowing_ op variants on TensorIterator (#58607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58607

Don't let code that tries to pass temporaries to these variants compile.
ghstack-source-id: 129882759

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D28524227

fbshipit-source-id: e5ce80f048480c67645198eaa0e43532567d4adb
2021-05-26 15:06:27 -07:00
bed0eb5428 [PyTorch] Add TI::add_owned_{in,out}put for clarity & use them (#58606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58606

Removes the pit of non-success around using the owning variants; gives us the option to make add_{in,out}put borrow in the future as a pit of success if we decide that's not bc-breaking.
ghstack-source-id: 129882760

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D28523976

fbshipit-source-id: ab5eb7bf5d672a0f8c4a50eb8a21c156d4189709
2021-05-26 15:05:08 -07:00
4f390eb6b6 Document factory_kwargs in nn.Quantize + remove Attributes section (#59025)
Summary:
The `factory_kwargs` kwarg was previously undocumented in `nn.Quantize`. Further, the `Attributes` section of the docs was improperly filled in, resulting in bad formatting. This section doesn't apply since `nn.Quantize` doesn't have parameters, so it has been removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59025

Reviewed By: anjali411

Differential Revision: D28723889

Pulled By: jbschlosser

fbshipit-source-id: ba86429f66d511ac35042ebd9c6cc3da7b6b5805
2021-05-26 14:40:48 -07:00
a749e8edf5 Add UninitializedBuffer to nn docs (#59021)
Summary:
The `UninitializedBuffer` class was previously left out of `nn.rst`, so it was not included in the generated documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59021

Reviewed By: anjali411

Differential Revision: D28723044

Pulled By: jbschlosser

fbshipit-source-id: 71e15b0c7fabaf57e8fbdf7fbd09ef2adbdb36ad
2021-05-26 14:36:05 -07:00
de22657e1c [PyTorch] Replace RecordFunction shouldRun callback with atomic bools (#56504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56504

Having callbacks registered but disabled via their
`shouldRun` callback defeats the `shouldRunRecordFunction`
optimization (no relation between the two things, despite the
shared prefix on the names) that aims to skip `RecordFunction`
construction.

This diff attempts to safely rectify this issue: we drop support for
`shouldRun` callbacks (this is bc-breaking; does anything use these
externally? do I need to add the support back and just stop using it
internally?), add support for enabling and disabling callbacks, and
(for global callbacks) make doing so thread-safe.

There is an interesting subtlety with `std::atomic` that came up: it
is neither copyable nor movable, which precludes putting it into
`std::vector`. I manually overrode this because the thread safety
reasons it is neither copyable nor movable don't apply here; we
already state that adding or removing callbacks (the operations that
might copy/move an atomic) are not thread-safe and should be done at
initialization time.
ghstack-source-id: 129614296

Test Plan:
Existing CI should cover correctness, right?  Inspected
perf report of a simple benchmark that runs nn.Linear in a loop on
CUDA, where we internally have Kineto initialized and thus had a
shouldRun observer previously; we are no longer going through the
dispatcher's slow RecordFunction path or spending measurable time
constructing RecordFunction instances.

Reviewed By: ilia-cher

Differential Revision: D27834944

fbshipit-source-id: 93db1bc0a28b5372f7307490c908457e7853fa92
2021-05-26 14:31:33 -07:00
ac07c6451e [nnc] Use BufHandle in loopnest.cache_accesses python API (#59006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59006

Related github issue: https://github.com/pytorch/pytorch/issues/59002

Test Plan:
Imported from OSS

Tested in the github issue: https://github.com/pytorch/pytorch/issues/59002

Reviewed By: bertmaher

Differential Revision: D28714829

Pulled By: huiguoo

fbshipit-source-id: 5fd7d5426c5cdb5af30731f662b083d2bd611bc4
2021-05-26 13:58:55 -07:00
b93e7a7602 concurrency fixes (#58961)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58446

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58961

Reviewed By: anjali411

Differential Revision: D28700307

Pulled By: Krovatkin

fbshipit-source-id: 1fed90c64e88aaf2824c48e006f66a9266d1e163
2021-05-26 13:53:44 -07:00
97c1179c9d Revert D28549240: [typing] Pyre fixes for batch_distributed_inference
Test Plan: revert-hammer

Differential Revision:
D28549240 (671c224b0a)

Original commit changeset: dadfedf93aae

fbshipit-source-id: 820fefccf2b4c6368defd762ce55245dd35505ca
2021-05-26 13:39:30 -07:00
0d5527de7a Back out "Back out "[ONNX] Process const folding progressively when converts to ONNX (#54569)"" (#58923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58923

Original commit changeset: c54597b2048e
ghstack-source-id: 129842041

Test Plan: Sandcastle and OSS CI.

Reviewed By: snisarg

Differential Revision: D28432555

fbshipit-source-id: 2a9ec22cc004c7c6979f1cc8f3124b833cdc6634
2021-05-26 13:29:07 -07:00
b420ded66f ShardedTensor framework for ChunkedShardingSpec (#58517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58517

Building upon the sharding specifications, this PR introduces the
initial skeleton of ShardedTensor and allows building a ShardedTensor by
specifying ChunkedShardingSpec.

In follow up PRs, I'll add further support for GenericShardingSpec.
ghstack-source-id: 129917841

Test Plan:
1) unit tests.
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D28526012

fbshipit-source-id: 8e62847b58957d284e40f57a644302c171289138
2021-05-26 13:24:23 -07:00
671c224b0a [typing] Pyre fixes for batch_distributed_inference
Summary:
Pyre does not support dynamic imports, so we can leave the pyre-ignores for those. (https://fb.workplace.com/groups/pyreqa/permalink/3119812734775204/)

Parameterized pyre-ignores are also necessary, as explained by [this Q&A](https://www.internalfb.com/intern/qa/109058/pyre-says-undefined-attribute-16-module-parameteri)

Test Plan:
- `pyre -l .`
- `pyre check`
- `buck test //caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test`

Reviewed By: vipannalla

Differential Revision: D28549240

fbshipit-source-id: dadfedf93aae860fe6d0a112002bdfe743139b1e
2021-05-26 13:08:19 -07:00
be47060af9 [remove xla from codegen] rename aten_xla_type.h -> DispatchKeyNativeFunctions.h (#58568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58568

I split out the file rename into a separate commit to make the diff easier. The template file name is `aten_xla_type.h` -> `{DispatchKey}NativeFunctions.h`

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D28711298

Pulled By: bdhirsh

fbshipit-source-id: 2fa7d2abede560a2c577300f0b5a1f7de263d897
2021-05-26 12:53:19 -07:00
86ce2950f6 remove xla-specific stuff from codegen (minus CPU fallback) (#58064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58064

**Summary**
This PR tries to remove all xla-specific logic from the codegen except for two places:
- renaming the `aten_xla_type.h/cpp` template files; Going to do that in a separate PR just to make the diff easier to understand
- CPU fallback logic (everything in `aten_xla_type_default.h/cpp` and `gen_external_aten_fallbacks.py`). I'm trying to kill all of that logic in a subsequent PR by making the CPU fallback a boxed kernel, so it felt unnecessary to go through it all and remove the xla references here.

**Notable changes**
The xla codegen includes some custom logging in each kernel wrapper, so I added a few new knobs to the external yaml, that we now test. I have a corresponding [xla-side PR](https://github.com/pytorch/xla/pull/2944) with the new yaml changes, which look like this:
```
per_op_log: XLA_FN_TRACK(3)
per_argument_log: TF_VLOG(3)
cpu_fallback_counter: XLA_COUNTER("aten::{name}", 1)
extra_headers: >
     #include <tensorflow/compiler/xla/xla_client/debug_macros.h>
     #include <tensorflow/compiler/xla/xla_client/metrics.h>
     #include <tensorflow/compiler/xla/xla_client/tf_logging.h>
     #include <torch_xla/csrc/function_call_tracker.h>
     #include <torch_xla/csrc/aten_xla_type.h>
     #include <torch_xla/csrc/aten_xla_type_default.h>
```

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D28711095

Pulled By: bdhirsh

fbshipit-source-id: 90a48440f2e865a948184e2fb167ea240ada47bb
2021-05-26 12:52:13 -07:00
511979df85 Define the SYCL device version __assert_fail when the NDEBUG defined. (#58906)
Summary:
## Motivation
The utils in namespace `c10` require `__assert_fail` when NDEBUG is defined in kernel code.

The `__assert_fail` declaration in PyTorch is not compatible with the SYCL specification.

This causes a compile error when using these utils in SYCL kernels.

## Solution
Add an `__assert_fail` declaration for SYCL kernels to PyTorch when compiling SYCL kernels with `c10` utils.

## Additional context
`__assert_fail` in SYCL kernel

`extern SYCL_EXTERNAL void __assert_fail(const char *expr, const char *file, unsigned int line, const char *func);`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58906

Reviewed By: anjali411

Differential Revision: D28700863

Pulled By: ezyang

fbshipit-source-id: 81896d022b35ace8cd16474128649eabedfaf138
2021-05-26 12:48:37 -07:00
2e2a75720b [structured] remainder (#58732)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58732

Reviewed By: gchanan

Differential Revision: D28666480

Pulled By: ezyang

fbshipit-source-id: f247365f2e6b3cdf29f7cc242f179041b968e75a
2021-05-26 12:44:32 -07:00
29487ac7ff Add 11.3 binaries without conda (#58877)
Summary:
Tested specifically for cuda 11.3 in https://github.com/pytorch/pytorch/pull/57522.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58877

Reviewed By: walterddr

Differential Revision: D28697703

Pulled By: janeyx99

fbshipit-source-id: 08ae7f7d023cb93e47a2e0a4f115cee9e8a6156a
2021-05-26 12:40:13 -07:00
24508337f4 Revert D28643215: Adds an aten::_ops namespace with unambiguous function names
Test Plan: revert-hammer

Differential Revision:
D28643215 (28740869a1)

Original commit changeset: 7b2b8459f1b2

fbshipit-source-id: ea869bf4cfde7038087e990b2cff5a86f9e2a531
2021-05-26 12:35:34 -07:00
12418a4f86 Back out "Revert D28664514: [pytorch][PR] various TensorIterator speed improvements"
Summary: Original commit changeset: fcad039b7dc8

Test Plan: Existing tests

Reviewed By: mruberry

Differential Revision: D28720186

fbshipit-source-id: 14ac99ee2d7cafb86b20c979f8917beeefd616e1
2021-05-26 12:22:48 -07:00
17fb651a3b Make torch.Tensor(torch.tensor(1.0)) work (#58885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58885

Fixes #58884

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28687510

Pulled By: ezyang

fbshipit-source-id: 81325f501cc3e83cbac02f7c44ded9d396356bb8
2021-05-26 11:33:05 -07:00
e24362746a [nnc] Concat input shapes must be known to fuse (#58974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58974

I don't know how we overlooked this for so long...
ghstack-source-id: 129932134

Test Plan:
Predictor test of model 184778294_0 using multiple request replay
threads.  It's not clear to me why multithreading matters, except that perhaps
it makes it easier to get an unknown shape in the profile.

Reviewed By: navahgar

Differential Revision: D28702660

fbshipit-source-id: 565550b1d2e571d62d0c8b21150193f2a7ace334
2021-05-26 11:29:26 -07:00
8398ebaa86 Revert D28664514: [pytorch][PR] various TensorIterator speed improvements
Test Plan: revert-hammer

Differential Revision:
D28664514 (8a28bbeeb9)

Original commit changeset: 2e03cf90b37a

fbshipit-source-id: fcad039b7dc823fec8afa694ab74a7ac5011f8ab
2021-05-26 10:49:58 -07:00
c06d2afa99 [caffe2] Add support for int32 lengths in BatchSparseToDense (#58062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58062

Make the function templated to make sure BatchSparseToDense supports int32 lengths/indices

Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:batch_sparse_to_dense_op_test
```

Reviewed By: khabinov

Differential Revision: D28271423

fbshipit-source-id: 41b88b7a3663616b533aaf4731ff35cdf6ec4c85
2021-05-26 10:33:32 -07:00
444e195b6d Use docker base for clang-lint in CI (#58964)
Summary:
This PR introduces a docker base image to speed up `clang-tidy`'s dependency-installation stage. Originally I was looking into using the native GitHub Actions cache, but the dependencies are spread across many apt and pip installation steps, so consolidating them into a docker image works better. It shortens the dependency installation time from 4 min down to 1 min by pulling from the docker base image.

Base image used: https://github.com/pytorch/test-infra/pull/15

```
FROM nvidia/cuda:10.2-devel-ubuntu18.04

RUN apt-get update && apt-get upgrade -y
RUN apt install -y software-properties-common wget
RUN wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
RUN apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-11 main"
RUN apt-add-repository ppa:git-core/ppa

RUN apt-get update && apt-get upgrade -y && apt-get install -y git python3-dev python3-pip build-essential cmake clang-tidy-11
RUN update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-11 1000
RUN pip3 install pyyaml typing_extensions dataclasses

```

Previous successful run of clang-tidy: https://github.com/pytorch/pytorch/runs/2671193875?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58964

Reviewed By: samestep

Differential Revision: D28712536

Pulled By: zhouzhuojie

fbshipit-source-id: 0c48a605efe8574c104da6a0cad1a8b7853ba35e
2021-05-26 10:15:24 -07:00
b8d56572a1 Open json config file in context manager (#58077)
Summary:
* Open the json config file safely using a context manager (a with block).
* This makes sure that the file is closed even if an exception is raised, as sketched below.
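
A minimal sketch of the pattern (the file name is hypothetical):

```python
import json

with open("config.json") as f:
    config = json.load(f)
# The file is closed here even if json.load raised an exception.
```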

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58077

Reviewed By: anjali411

Differential Revision: D28711177

Pulled By: H-Huang

fbshipit-source-id: 597ba578311b1f1d6706e487872db4e784c78c3c
2021-05-26 08:58:40 -07:00
8130f2f67a DOC Adds code comment for _ConvNd.reset_parameters (#58931)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55741 by adding a comment regarding the behavior of `kaiming_uniform_`

The docstring is correct in this case. For example:

```python
import math
import matplotlib.pyplot as plt

import torch
import torch.nn as nn

in_channels = 120
groups = 2
kernel = (3, 8)
m = nn.Conv2d(in_channels=in_channels, groups=groups,
              out_channels=100, kernel_size=kernel)

k = math.sqrt(groups / (in_channels * math.prod(kernel)))
print(f"k: {k:0.6f}")

print(f"min weight: {m.weight.min().item():0.6f}")
print(f"max weight: {m.weight.max().item():0.6f}")
```

outputs:
```
k: 0.026352
min weight: -0.026352
max weight: 0.026352
```

And when we plot the distribution, it is uniform with the correct bounds:

```python
_ = plt.hist(m.weight.detach().numpy().ravel())
```

![Unknown](https://user-images.githubusercontent.com/5402633/119552979-21ba3800-bd69-11eb-8e10-e067c943abe3.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58931

Reviewed By: anjali411

Differential Revision: D28689863

Pulled By: jbschlosser

fbshipit-source-id: 98eebf265dfdaceed91f1991fc4b1592c0b3cf37
2021-05-26 08:39:40 -07:00
950e67fa43 [quant][refactor tests] Move test_qat_module into test_quantize_eager_qat (#58928)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58928

Test Plan:
python test/test_quantization.py TestConvBNQATModule

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D28683925

fbshipit-source-id: 59d240d521c8067a344c9bdf4bec94e82f52e76f
2021-05-26 07:49:59 -07:00
cc07825a21 [quant][refactor tests] Split test_quantize into test_quantize_eager_ptq, test_quantize_eager_qat and test_fusion (#58927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58927

Part of larger re-factor of quantization tests to make it clearer as to which test belongs where.

proposed folder structure
```
test/quantization
         - bc/
            - test_backward_compatibility.py
         - core/
            - test_quantized_kernels.py
            - test_quantized_workflow_ops.py
            - test_quantized_tensor.py
            - test_workflow_module.py
         - eager/
            - test_quantize_eager_ptq.py
            - test_quantize_eager_qat.py
            - test_fusion.py
         - equalization/
            - test_equalize_eager.py
            - test_bias_correction_eager.py
         - fx/
           - test_quantize_fx.py
         - jit/
            - test_quantize_jit.py
            - test_fusion_passes.py
         - numeric_suite/
            - test_numeric_suite_fx.py
            - test_numeric_suite_eager.py
```

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D28683926

fbshipit-source-id: f84a4271c77c418ce9751196241933ea8cc14913
2021-05-26 07:48:28 -07:00
28740869a1 Adds an aten::_ops namespace with unambiguous function names (#58092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58092

Fixes #58044.

This PR:
- adds `ATEN_FN(op)` and `ATEN_FN2(op, overload)` macros that resolve to
an non-overloaded function in aten::_ops that calls the desired operator
(without default arguments).

The motivation for this is two-fold:
1) Using aten operators with templates is hard if the operator is
overloaded (e.g. add.Tensor and add.Scalar).
2) Method-only operators require special handling; pointers-to-method
are different from function pointers. `ATEN_FN2(add_, Tensor)` returns
a function instead of a method.

There is some interesting behavior for out= operations.
`ATEN_FN2(sin, "out")` gives a function that is *faithful* to the schema;
that is, the order of arguments is exactly what it looks like in the
schema. This makes it so that you can directly register
`ATEN_FN2(sin,"out")` (or a function wrapping it using the same signature)
as an override for a DispatchKey.

Test Plan:
- New tests that ATEN_FN2 works on function and method-only operators
- New test that ATEN_FN works
- New test that ATEN_FN macro returns a "faithful" function.

Codegen output:
Operators.h and Operators.cpp are both here:
https://gist.github.com/zou3519/c2c6a900410b571f0d7d127019ca5175

Reviewed By: mruberry

Differential Revision: D28643215

Pulled By: zou3519

fbshipit-source-id: 7b2b8459f1b2eb5ad01ee7b0d2bb77639f77940e
2021-05-26 07:29:15 -07:00
032d6b0643 Revert D28112689: CUDA support in the CSR layout: constructors
Test Plan: revert-hammer

Differential Revision:
D28112689 (1416e57465)

Original commit changeset: f825cd4bce40

fbshipit-source-id: 421fc590797ac5fab6a55ac6f213361fbba7cd5b
2021-05-26 06:15:05 -07:00
bbdc428db2 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D28704311

fbshipit-source-id: f089266771c1ceba127116638a4dd87aa21e2e27
2021-05-26 03:19:49 -07:00
9ba9a16700 [PyTorch Edge] Use stream as backport_vi_to_vi-1 interface (#58790)
Summary:
Two main changes:
1. Change the arguments of the backport_v{i}_to_v{i-1} collection from (reader, writer) to (input_model_stream, output_model_stream), so it's easier to backport a model using option 2.

>  2) [Both format and content change] Use torch.jit.load() to load the stream,
 and save it to output_model_stream.

2. Fix an issue in the test `backportAllVersionCheck`. Previously it declared `std::ostringstream oss` and used `oss.clear()` to reset the stringstream. However, the `clear()` function doesn't reset the stream content, which caused a problematic stream. As a mitigation, checks are added to prevent a corrupted stream in each iteration of the while loop.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58790

ghstack-source-id: 129929960

Test Plan:
CI
```
buck test mode/dev //caffe2/test/cpp/jit:jit
```

Reviewed By: raziel, iseeyuan

Differential Revision: D28620961

fbshipit-source-id: b0cbe0e88645ae278eb3999e2a84800702b5f985
2021-05-26 02:07:46 -07:00
1416e57465 CUDA support in the CSR layout: constructors (#57274)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57274

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D28112689

Pulled By: bhosmer

fbshipit-source-id: f825cd4bce402dd4c3f71db88854f77830b687b8
2021-05-26 01:36:20 -07:00
be4ba29d49 Detect overflow in numel of sparse COO tensor (#57492)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57416

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57492

Reviewed By: albanD

Differential Revision: D28273649

Pulled By: mruberry

fbshipit-source-id: 08ba50509556df1981d7ede025d84a836d2e8e5e
2021-05-25 22:16:21 -07:00
948df6c7a9 [numpy] torch.i0: promote integer inputs to float (#52735)
Summary:
Reference : https://github.com/pytorch/pytorch/issues/42515
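
A small sketch of the new promotion behavior:

```python
import torch

out = torch.i0(torch.arange(3))  # integer (int64) input
print(out.dtype)                 # promoted to the default float dtype
```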

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52735

Reviewed By: zou3519

Differential Revision: D28630505

Pulled By: mruberry

fbshipit-source-id: e81a35dfc1a322daf0c44718901470fac677bc94
2021-05-25 22:02:00 -07:00
49c2da0ee0 [testing] improve broadcasts_input error message (#58295)
Summary:
Context:
The error message when `broadcasts_input` is marked incorrectly is uninformative (see "Error Currently" below)
https://github.com/pytorch/pytorch/pull/57941#discussion_r631749435

Error Currently
```
Traceback (most recent call last):
  File "/home/kshiteej/Pytorch/pytorch_i0_promotion/test/test_ops.py", line 326, in test_variant_consistency_eager
    _test_consistency_helper(samples, variants)
  File "/home/kshiteej/Pytorch/pytorch_i0_promotion/test/test_ops.py", line 310, in _test_consistency_helper
    variant_forward = variant(cloned,
  File "/home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.8/unittest/case.py", line 227, in __exit__
    self._raiseFailure("{} not raised".format(exc_name))
  File "/home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.8/unittest/case.py", line 164, in _raiseFailure
    raise self.test_case.failureException(msg)
AssertionError: RuntimeError not raised
```

Error After PR
```
Traceback (most recent call last):
  File "/home/kshiteej/Pytorch/pytorch_i0_promotion/test/test_ops.py", line 329, in test_variant_consistency_eager
    _test_consistency_helper(samples, variants)
  File "/home/kshiteej/Pytorch/pytorch_i0_promotion/test/test_ops.py", line 313, in _test_consistency_helper
    variant_forward = variant(cloned,
  File "/home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.8/unittest/case.py", line 227, in __exit__
    self._raiseFailure("{} not raised".format(exc_name))
  File "/home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.8/unittest/case.py", line 164, in _raiseFailure
    raise self.test_case.failureException(msg)
AssertionError: RuntimeError not raised : inplace variant either allowed resizing or you have marked the sample SampleInput(input=Tensor, args=(tensor([[[ 2.1750, -8.5027, -3.1403, -6.9942,  3.2609],
         [-2.5057, -5.9123, -5.4633,  6.1203, -8.2124],
         [-3.5802, -8.4869, -6.0700,  2.3431, -8.1955],
         [-7.3316,  1.3248, -6.8661,  7.1483, -8.0719],
         [ 4.5977, -4.0448, -6.2044, -2.1314, -8.4956]],

        [[ 3.2769, -8.4360,  1.2826,  7.1749,  4.7653],
         [-0.2816, -2.5997, -4.7659, -3.7814,  3.9704],
         [-2.1778, -3.8117, -6.0276, -0.8423, -5.9646],
         [ 8.6544, -3.0922,  0.2558, -4.9318, -4.7596],
         [ 4.5583,  4.3830,  5.8793,  0.9713, -2.1481]],

        [[-1.0447,  0.9334,  7.6405, -4.8933, -7.4010],
         [ 7.7168, -8.4266, -5.5980, -6.9368,  7.1309],
         [-8.7720, -5.0890, -0.4975,  1.9518,  1.7074],
         [-8.5783,  8.5510, -8.5459, -3.5451,  8.4319],
         [ 8.5052, -8.9149, -6.6298, -1.2750, -5.7367]],

        [[-6.5625,  8.2795, -4.9311,  1.9501, -7.1777],
         [-8.4035,  1.1136, -7.6418, -7.0726, -2.8281],
         [ 4.2668, -0.2883, -6.2246,  2.3396,  1.2911],
         [ 4.6550, -1.9525,  4.4873, -3.8061, -0.8653],
         [-3.4256,  4.4423,  8.2937, -5.3456, -4.2624]],

        [[ 7.6128, -6.3932,  4.7131, -5.4938,  6.4792],
         [-6.5385,  2.4385,  4.5570,  3.7803, -8.3281],
         [-2.9785, -4.4745, -1.1778, -8.9324,  1.3663],
         [ 3.7437,  3.5171, -6.3135, -8.4519, -2.7033],
         [-5.0568, -8.4630, -4.2870, -3.7284, -1.5238]]], device='cuda:0',
       dtype=torch.float32, requires_grad=True),), broadcasts_input=True) incorrectly with `broadcasts_self=True
```

**NOTE**:
Printing the sample is very verbose, and it may be hard to figure out which sample is incorrectly configured if there are multiple samples with similar input shapes.

Two options to make this error less verbose:
* Don't print the sample and just print `inplace variant either allowed resizing or you have marked one of the samples incorrectly with broadcasts_self=True`
* Have some mechanism to name samples which will be printed in the `repr` (which will need extra machinery)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58295

Reviewed By: ngimel

Differential Revision: D28627308

Pulled By: mruberry

fbshipit-source-id: b3bdeacac3cf9c0d984f0b85410ecce474291d20
2021-05-25 21:14:17 -07:00
083d3bb93b [torch][repeat_interleave] Add to exception list in backward compat check (#58966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58966

Same as title.

Test Plan: CI since updated the check

Reviewed By: ngimel

Differential Revision: D28699577

fbshipit-source-id: 436fdc648a4c653081ff0e1b6b809c4af742055a
2021-05-25 20:04:50 -07:00
26c1f0f72e [skip ci] Skip debug info on PRs (#58897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58897

We don't need to be building debug info on PRs since it's just filling up S3/CircleCI storage with useless 800 MB zips; this flips it so it's only run on master + release branches. See #58898 for the CI signal

Also see pytorch/builder counterpart (unlike the last debuginfo PR there is no hard dependency between these two so there won't be any churn on un-rebased PRs): https://github.com/pytorch/builder/pull/778

Test Plan: Imported from OSS

Reviewed By: seemethere, samestep

Differential Revision: D28689413

Pulled By: driazati

fbshipit-source-id: 77a37e84afe492215008d5e023ceab0c24adb33c
2021-05-25 17:01:51 -07:00
32273e806a Ensure NativeFunctions.h codegen output is deterministic (#58889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58889

fixes https://github.com/pytorch/pytorch/issues/58796

Planning on re-testing locally tomorrow morning to confirm, but this change should fix the non-determinism in the codegen output that was causing `ccache` not to re-use its cached output.

I built from the commit referenced in https://github.com/pytorch/pytorch/issues/58796 a few times and ran `diff -Naur` on the codegen output in `build/aten/src/ATen`. After a few tries, `NativeFunctions.h` had a few diffs. The diffs were all related to the ordering of functional/inplace/out variants of a NativeFunctionGroup, which looked non-deterministic.

That looks like it's coming from my calling `set()` to filter out duplicate NativeFunction declarations. The earlier version of the codegen also called `set()` to filter out duplicates, but it did so individually for each `NativeFunction` object, before merging the groups (I'm not too sure why this didn't introduce non-determinism before, though). With the refactor from https://github.com/pytorch/pytorch/pull/57361, we're calling `set()` on the declarations from every operator for a given DispatchKey, which is probably what introduced the nondeterminism.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28675941

Pulled By: bdhirsh

fbshipit-source-id: bb66de00aafeeb9720d85e8156ac9f7539aed0d6
2021-05-25 16:33:03 -07:00
db5e5781ad replace all remaining occurrences of deadline=1000, to prevent test flakiness
Summary: Per title

Test Plan: Fixes existing tests

Reviewed By: robieta

Differential Revision: D28690296

fbshipit-source-id: d7b5b5065517373b75d501872814c89b24ec8cfc
2021-05-25 15:55:30 -07:00
60af6e928a [PyTorch Edge][Version] Fix torchscript model after backport (#58892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58892

The torchscript model after backport is missing the `constants` archive. Add it back, and extend the unit test to run the torchscript part.
ghstack-source-id: 129853819

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit
- LiteInterpreterTest.BackPortByteCodeModelAllVersions'
```

Reviewed By: raziel, iseeyuan

Differential Revision: D28664507

fbshipit-source-id: 5f98723231cc64ed203c062ee6f00d8adbdccf77
2021-05-25 15:36:56 -07:00
fb120493b1 Make Scalar.to<> for invalid types a compile-time error (#58726)
Summary:
Currently calling `scalar.to<std::complex<double>>()` for example compiles but throws an error at runtime. Instead, marking the non-specialized cases as `= delete` means the code fails to compile and you catch the error sooner.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58726

Reviewed By: zou3519, seemethere

Differential Revision: D28646057

Pulled By: ezyang

fbshipit-source-id: 9e4e3d1b4586eeecbb73db61bba56560b2657351
2021-05-25 15:34:01 -07:00
36a77580f5 [docs] Clarify batch_first behavior for nn.LSTM, nn.RNN, and nn.GRU (#58809)
Summary:
Fixes the high-pri doc component of https://github.com/pytorch/pytorch/issues/4145.

To make the input / output shapes more readable for both `batch_first` states, this PR also introduces short dim names. Opinions welcome on the readability of the restructured docs!

Screenshot for `nn.LSTM`:
<img width="791" alt="Screen Shot 2021-05-24 at 5 11 39 PM" src="https://user-images.githubusercontent.com/75754324/119408130-389e5300-bcb3-11eb-9a4f-1df96a0a4d70.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58809

Reviewed By: gchanan

Differential Revision: D28685415

Pulled By: jbschlosser

fbshipit-source-id: e8c92e3d7e052071a505b55dca976fd2ef5a8307
2021-05-25 15:27:17 -07:00
7179e7ea7b [CMake] Prefer third_party/pybind11 by default (#58951)
Summary:
To make the build behaviour aligned with other third_party/ libraries,
introduce the `USE_SYSTEM_PYBIND11` build option, which is set to OFF by
default; this means PyTorch will be built with the bundled pybind11 even if
another version is already installed locally.

Fixes https://github.com/pytorch/pytorch/issues/58750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58951

Reviewed By: driazati

Differential Revision: D28690411

Pulled By: malfet

fbshipit-source-id: e56b5a8f2a23ee1834b2a6d3807f287149decf8c
2021-05-25 15:10:17 -07:00
45aa54d83c relax test deadlines
Summary: Relax test deadlines for c2 tests. We run on loaded machines, and timings are unreliable.

Test Plan: Fixes existing tests

Reviewed By: mruberry

Differential Revision: D28690006

fbshipit-source-id: 457707e81a1ec92548c1f23ea7a0022fa0a3bfda
2021-05-25 15:02:52 -07:00
b4b95fc87a Expose cudaMemGetInfo (#58635)
Summary:
This PR resolves the second issue outlined in https://github.com/pytorch/pytorch/issues/58376, which has previously been discussed in https://github.com/pytorch/pytorch/issues/50722.

`cudaMemGetInfo` is bound/exposed to the Python API. An example function call is provided below:

```
import torch

device_free, device_total = torch.cuda.mem_get_info(torch.device('cuda:0'))
print(device_free, device_total)
```

In `CUDACachingAllocator.cpp`, in contrast to my initial PR, the newly defined function `std::pair<size_t, size_t> raw_cuda_mem_get_info(int device)` has been moved from the `CUDACaching` namespace to the `cuda` namespace. In addition, as suggested by ezyang, `det` has been removed from all function names.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58635

Reviewed By: zou3519

Differential Revision: D28649093

Pulled By: ezyang

fbshipit-source-id: d8b7c53e52cf73f35495d8651863c5bb408d7a6a
2021-05-25 14:58:35 -07:00
133133afa8 [PyTorch] Extract non-template parts of torch::class_ (#54548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54548

We don't need to inline most of this class; doing so bloats code size and build time.
ghstack-source-id: 129765666

Test Plan:
Existing CI

buildsizebot some mobile apps

Reviewed By: jamesr66a

Differential Revision: D27277317

fbshipit-source-id: 7643aa35e4d794fee0a48a3bbe0890c2e428ae78
2021-05-25 14:47:00 -07:00
ec89bf6535 .github: Ensure 7zip install for windows (#58924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58924

Was observing behavior where 7zip was nowhere to be found after a build
completed. Let's just have 7zip installed within the workflow as well,
to be completely sure it is there.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D28681241

Pulled By: seemethere

fbshipit-source-id: f649c1713edcdeb82c84fd67866700caa2726d71
2021-05-25 13:58:35 -07:00
ede3f5421f [Pytorch Delegated Backend] Save function name in debug info (#57481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57481

This diff introduces the function name to InlinedCallStack.
Since we are using InlinedCallStack for debug information in the lite
interpreter as well as delegate backends, where InlinedCallStack cannot
be constructed from model source code, we need to save the function name.
In the absence of a function name, Function* is used to get the name of
the function; this is the case when the JIT compiles code at runtime.
When that is not possible, this diff introduces a way to obtain the
function name.

Test Plan:
test_backend
test_cs_debug_info_serialization

test_backend
test_cs_debug_info_serialization

Imported from OSS

Differential Revision:
D28159097
D28159097

Reviewed By: raziel, ZolotukhinM

Pulled By: kimishpatel

fbshipit-source-id: deacaea3325e27273f92ae96cf0cd0789bbd6e72
2021-05-25 13:19:02 -07:00
813adf1076 [Pytorch Delegated Backend] Save operator name and function name in (#57441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57441

debug info

Previous diffs did not save the operator name in debug info. For delegated
backends that only identify an op for profiling via its debug handle, the
operator name should be stored as well.
Furthermore, to complete the debug information, also serialize the function name.

Test Plan:
Existing lite interpreter and backend tests

Existing lite interpreter and backend tests

Imported from OSS

Differential Revision:
D28144581
D28144581

Reviewed By: raziel

Pulled By: kimishpatel

fbshipit-source-id: 415210f147530a53b444b07f1d6ee699a3570d99
2021-05-25 13:17:54 -07:00
a7a5992d7d Add no-grad inference mode note (#58513)
Summary:
Adds a note explaining the difference between several often conflated mechanisms in the autograd note
Also adds a link to this note from the docs in `grad_mode` and `nn.module`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58513

Reviewed By: gchanan

Differential Revision: D28651129

Pulled By: soulitzer

fbshipit-source-id: af9eb1749b641fc1b632815634eea36bf7979156
2021-05-25 13:06:54 -07:00
5268b5a29a Add parsing logic for Tuple[()] annotation (#58340)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58340
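
A minimal sketch of the annotation this enables (assuming `torch.jit.script` as the entry point):

```python
import torch
from typing import Tuple

@torch.jit.script
def passthrough(x: Tuple[()]) -> Tuple[()]:
    # Tuple[()] denotes the empty-tuple type; parsing it previously failed.
    return x

print(passthrough(()))  # ()
```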

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D28459502

Pulled By: ansley

fbshipit-source-id: 4bb188448d66269b42b068858b895debac86e9ee
2021-05-25 12:12:43 -07:00
b9d1ad9c78 OpInfo: diag_embed, diagonal (#58642)
Summary:
See: https://github.com/pytorch/pytorch/issues/54261.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58642

Reviewed By: ngimel

Differential Revision: D28627226

Pulled By: mruberry

fbshipit-source-id: b96fa8410bd53937ddb72a46c02b949691ee9458
2021-05-25 11:52:53 -07:00
f976275858 Run pthreadpool with _NoPThreadPoolGuard on the same thread (#58759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58759

* Makes `pthreadpool()->run` respect `_NoPThreadPoolGuard`
   Runs tasks on the same thread instead of parallelizing when the guard is present

Test Plan:
buck build //xplat/caffe2:aten_test_test_thread_pool_guard
./buck-out/last/aten_test_test_thread_pool_guard

Reviewed By: kimishpatel

Differential Revision: D28597425

fbshipit-source-id: 0365ad9947c239f5b37ce682802d4d401b8b0a48
2021-05-25 11:39:05 -07:00
b703f1e02d [NNC] Add documentation for splitWith APIs (#58270)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58270

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28427226

Pulled By: navahgar

fbshipit-source-id: 39635e985095c7b581452464d7a515c6f86b24e8
2021-05-25 11:32:53 -07:00
dd7bbe1a63 [NNC] Make splitWithMask transform in-place (#58269)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58269

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28427227

Pulled By: navahgar

fbshipit-source-id: 4e38a436abcf4752fd7ef6ab3666876eec6ea5ba
2021-05-25 11:32:51 -07:00
e2467cc43e [NNC] Make splitWithTail transform in-place (#58268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58268

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28427228

Pulled By: navahgar

fbshipit-source-id: 270b62c4e83739ad21dd68f375120e56881b394f
2021-05-25 11:31:14 -07:00
6b6a27e430 [jit] Add Python API for ScriptProfile (#57398)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57398

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28133577

fbshipit-source-id: dcb8338159a24b00b5c495ecec66a3303d9b4aba
2021-05-25 11:09:18 -07:00
c88333484f [resubmit] masked_scatter thrust->cub (#58865)
Summary:
See ae7760cf50bb2cddff4663a07b9d68decf4b6c75 for the fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58865

Reviewed By: mruberry

Differential Revision: D28657940

Pulled By: ngimel

fbshipit-source-id: 9155c710b0e18ebb3bfa2dabfdd117355ac30840
2021-05-25 11:00:50 -07:00
fedf6f2db2 Check memory overlap in sort for large input sizes (#58327)
Summary:
The downstream cub sort doesn't support inplace sorting; this PR adds a check to bail out to allocating a new tensor instead of silently corrupting the returned indices.

CC ngimel zasdfgbnm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58327

Reviewed By: mruberry

Differential Revision: D28661244

Pulled By: ngimel

fbshipit-source-id: 40617a7d3bfcebbe187bb706b6b753371bb99097
2021-05-25 10:57:31 -07:00
7eade660c6 [PyTorch] Reduce errors of foreach functions (#56993)
Summary:
This is based on  https://github.com/pytorch/pytorch/issues/48224.

To make `foreach` more flexible, this PR pushes unsupported cases to the slow path (see the sketch after the list below).
Also, this adds some tests to verify that
- `foreach` functions work with tensors of different dtypes and/or memory layouts in 7bd4b2c89f
- `foreach` functions work with tensors on different devices in a list, but are on the same device if the indices are the same: def4b9b5a1
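
For example, a call like the following (via the private `_foreach_*` entry points) should now take the slow path rather than erroring (a sketch, assuming CPU tensors):

```python
import torch

tensors = [torch.randn(3), torch.randn(3, dtype=torch.float64)]  # mixed dtypes
out = torch._foreach_add(tensors, 1)  # falls back to the slow path
print([t.dtype for t in out])
```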

Future plans:
1. Improve the coverage of unittests using `ops` decorator & updating `foreach_unary_op_db` and creating `foreach_(binary|pointwise|minmax)_db`.
2. Support broadcasting in slow path. Ref:  https://github.com/pytorch/pytorch/pull/52448
3. Support type promotion in fast path. Ref https://github.com/pytorch/pytorch/pull/52449

CC: ngimel mcarilli  ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56993

Reviewed By: zou3519

Differential Revision: D28630580

Pulled By: ngimel

fbshipit-source-id: e26ee74a39a591025e18c1ead48948cb7ec53c19
2021-05-25 10:50:20 -07:00
8a28bbeeb9 various TensorIterator speed improvements (#58810)
Summary:
1) remove pushing back to the strides vector for 1D tensors; those strides are never used in the loop anyway
2) avoid calling get_data_ptrs unless necessary
3) don't call into assert_no_partial_overlap if the TensorImpls are the same (assert_no_partial_overlap has this comparison too, but only after a couple of nested function calls)
4) use is_non_overlapping_and_dense instead of is_contiguous in the memory-overlap check (which, for some reason, is faster than is_contiguous, though I expected that after is_contiguous was de-virtualized they would be the same).

Altogether, brings instruction count down from ~110K to 102735 for the following binary inplace benchmark:
```
In [2]:  timer = Timer("m1.add_(b);", setup="at::Tensor m1=torch::empty({1}); at::Tensor b = torch::empty({1});", language="c++", timer=timeit.default_timer)
   ...:  stats=timer.collect_callgrind(number=30, repeats=3)
   ...:  print(stats[1].as_standardized().stats(inclusive=False))
```
similar improvements for unary inplace.

Upd: returned stride packing for now; the count is now 104295, so packing is worth ~52 instructions. We should think about how to remove it safely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58810

Reviewed By: bhosmer

Differential Revision: D28664514

Pulled By: ngimel

fbshipit-source-id: 2e03cf90b37a411d9994a7607402645f1d8f3c93
2021-05-25 10:44:51 -07:00
09a8f22bf9 Add mish activation function (#58648)
Summary:
See issue: https://github.com/pytorch/pytorch/issues/58375
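A minimal usage sketch (not from the PR), assuming the new activation is exposed as `torch.nn.functional.mish`; Mish is defined as `x * tanh(softplus(x))`:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4)
# The dedicated op should match the composed definition up to rounding error.
assert torch.allclose(F.mish(x), x * torch.tanh(F.softplus(x)))
```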

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58648

Reviewed By: gchanan

Differential Revision: D28625390

Pulled By: jbschlosser

fbshipit-source-id: 23ea2eb7d5b3dc89c6809ff6581b90ee742149f4
2021-05-25 10:36:21 -07:00
bf269fdc98 Re-enable torchdeploy oss tests and move to per-PR cuda11 job (#58872)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58872

Test Plan: verify tests running on CI as expected

Reviewed By: suo

Differential Revision: D28646660

fbshipit-source-id: eb7d784844fb7bc447b4232e2f1e479d4d5aa72f
2021-05-25 10:05:32 -07:00
19bcbfc5cf [c10d] Use pg wrapper in detailed debug mode (#58281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58281

When TORCH_DISTRIBUTED_DEBUG=DETAIL is enabled, this PR causes process groups created by `new_group` and `init_process_group` that are nccl or gloo to be wrapped in `ProcessGroupWrapper`.

As a result, the user will get back a `ProcessGroupWrapper` that they can use in the exact same way as a regular nccl/gloo pg, but will be more helpful in terms of debugging desync/hangs.

Besides doing collective desync checks, which should be transparent if there are indeed no issues in the user application, there are no semantic differences in using the wrapper pg. Note that there is a performance implication here but that is a tradeoff we are making when DETAIL debug mode is enabled.
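A minimal single-process sketch (not from the PR) of how the wrapper gets picked up; the address/port are illustrative, and the env var must be set before process group creation:

```python
import os
import torch
import torch.distributed as dist

os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"  # enable the wrapper
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)
# The returned default pg behaves exactly like a plain gloo pg, but every
# collective first runs the desync/consistency checks described above.
dist.all_reduce(torch.ones(2))
```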

Open to suggestions on how to test better. Currently I verified locally that enabling TORCH_DISTRIBUTED_DEBUG=detail creates the wrapper and all tests still pass, but that doesn't run in CI. On the other hand testing everything with debug=detail and the regular tests might be too much, so we have only added it to a few tests for now. We also do have tests in the below diff.
ghstack-source-id: 129817857

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D28402301

fbshipit-source-id: c4d3438320f6f0986e128c738c9d4a87bbb6eede
2021-05-25 09:55:52 -07:00
aad2ad883a Disable test_nccl_errors_blocking_abort (#58921)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58921

Reviewed By: ezyang

Differential Revision: D28680061

Pulled By: malfet

fbshipit-source-id: bab4a28f054ed26bcd6431576b60268ba4db8e6b
2021-05-25 09:50:24 -07:00
470160ad78 [Pytorch] Update FuseLinear to map source range information (#58492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58492

Update graph rewrite to specify how values in replacement pattern should
map to values in original pattern for fuse_linear pass

(Note: this ignores all push blocking failures!)

Test Plan:
python test/test_quantization.py TestQuantizeJitPasses.test_fuse_linear

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28512464

fbshipit-source-id: 250a69cebc11eb4328a34c8f685b36e337439aae
2021-05-25 09:19:57 -07:00
e067675167 [Pytorch] Provide API to preserve source range and callstack information during graph rewrite (#58300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58300

Current state: During graph rewriting that can fuse nodes or add nodes
result in new nodes without debug information that was available in
original node. Thus we lose this information during graph rewrite.

This PR changes graph rewriting API to let user specify how the values
in the replacement pattern map to values in the pattern to be matched.
Then the graph rewriting will copy source range and inlined callstack
from the matched nodes onto the nodes being inserted.

(Note: this ignores all push blocking failures!)

Test Plan:
python test/test_jit.py
TestJit.test_pattern_based_rewrite_with_source_range_preserved

Imported from OSS

Reviewed By: malfet

Differential Revision: D28512465

fbshipit-source-id: 863173c29de726be85b3acbd3ddf3257eea36d13
2021-05-25 09:18:59 -07:00
2ef9a1df22 Increase minimum number of warmup runs to 2 (#58801)
Summary:
The JIT will typically need two warmup runs to do profiling and optimization.
This is not a perfect solution, but it will substantially reduce the number of surprised people when the docs say torch.utils.benchmark.Timer takes care of warmup.
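For example (a minimal sketch, not from the PR), timing a TorchScript function now gets at least two warmup invocations before measurement:

```python
import torch
from torch.utils.benchmark import Timer

@torch.jit.script
def f(x):
    return x * 2 + 1

t = Timer("f(x)", setup="x = torch.ones(8)", globals={"f": f, "torch": torch})
# timeit() now performs >= 2 warmup runs first, so the JIT has a chance to
# profile and optimize before the measured iterations.
print(t.timeit(100))
```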

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58801

Reviewed By: desertfire

Differential Revision: D28644244

Pulled By: robieta

fbshipit-source-id: cc54ed019e882a379d6e4a0c6a01fd5873dd41c3
2021-05-25 08:38:52 -07:00
09a1b1cf87 Forward AD formulas batch 1 (#57768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57768

Note that this PR implements formulas only for ops that are supported by OpInfo.

Test Plan: Imported from OSS

Reviewed By: zou3519, malfet

Differential Revision: D28387766

Pulled By: albanD

fbshipit-source-id: b4ba1cf1ac1dfd46cdd889385c9c2d5df3cf7a71
2021-05-25 07:29:25 -07:00
b4f3a989da [torch][repeat_interleave] Fix ambiguous function call (#58881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58881

A new parameter was recently added to the function in PR: https://github.com/pytorch/pytorch/pull/58417

However, this introduced ambiguity when making the call below:
  some_tensor.repeat_interleave(some_integer_value)

Making the new parameter optional avoids the issue.
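A minimal sketch (not from the PR) of the call that became ambiguous, assuming the new parameter is the output size added in #58417:

```python
import torch

t = torch.tensor([1, 2, 3])
# With the new parameter optional, the single-integer overload resolves
# cleanly to "repeats" instead of clashing with the added argument.
print(t.repeat_interleave(2))  # tensor([1, 1, 2, 2, 3, 3])
```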

Reviewed By: ezyang, ngimel

Differential Revision: D28653820

fbshipit-source-id: 5bc0b1f326f069ff505554b51e3b24d60e69c843
2021-05-25 00:31:32 -07:00
3dbfaddfa1 Port elu_backward to structured (#58660)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58660

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D28572528

Pulled By: ezyang

fbshipit-source-id: 12265c287f178f9435d5d96f3bba49082d9e7f2c
2021-05-25 00:14:13 -07:00
5850553bc0 Port hardsigmoid_backward to structured (#58484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58484

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D28572529

Pulled By: ezyang

fbshipit-source-id: aee125aa59a1f2b1ddb0c29a287097d866121379
2021-05-25 00:14:12 -07:00
3f0b7e0feb Port leaky_relu_backward to structured (#58483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58483

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D28572526

Pulled By: ezyang

fbshipit-source-id: a73bdf06967687dbb1d4fbb0f2ca80115db57a07
2021-05-25 00:14:10 -07:00
ad27513430 Port softplus to structured (#58482)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58482

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D28571059

Pulled By: ezyang

fbshipit-source-id: a1065294f3c459e7c99aaed9edb09f88705f58e9
2021-05-25 00:12:57 -07:00
0b8931fe4b [torch][JIT] Predicate uses of RPC APIs on torch.distributed.rpc.is_available() (#58887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58887

There are some callsites of `torch.distributed.rpc.XXX` APIs that are compiled
or not based on `USE_RPC`. However, `torch::deploy`, at least for now,
is compiled with `USE_RPC=1`, but the `torch.distributed.rpc.XXX` APIs used by
the aforementioned pieces of code are not available (i.e.
`torch.distributed.rpc.is_available()` returns `False`). This can cause
Torchscript compilation to fail, even if the code being compiled doesn't use
RPC.

This commit fixes this problem (at least temporarily) by predicating the use
of all these `torch.distributed.rpc` APIs on the value of
`torch.distributed.rpc.is_available()`.
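A minimal sketch (not from the PR) of the guard pattern:

```python
import torch.distributed.rpc as rpc

# Only reference RPC APIs when the current build actually provides them;
# under torch::deploy, is_available() returns False even with USE_RPC=1.
if rpc.is_available():
    from torch.distributed.rpc import RRef  # safe to touch RPC symbols here
```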

Test Plan: Ran packaged XLM-R model with C++ benchmark.

Reviewed By: suo

Differential Revision: D28660925

fbshipit-source-id: fbff7c7ef9596549105e79f702987a53b04ba6f9
2021-05-24 21:53:53 -07:00
c502f49535 Fix failing torch deploy tests and reenable. (#58871)
Summary:
Fix is simple; alias inputs before feeding them to distinct
torchdeploy interpreters.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Fixes https://github.com/pytorch/pytorch/issues/58832

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58871

Reviewed By: wconstab, zou3519

Differential Revision: D28646784

Pulled By: ezyang

fbshipit-source-id: 6d2850f3226b5b99468d1465723b421ce4d7ab89
2021-05-24 20:27:41 -07:00
cf395c0718 [c10d] Introduce ProcessGroupWrapper (#58224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58224

Adds C++ implementation of ProcessGroupWrapper. It wraps
an underlying ProcessGroup and does debug checks before dispatching the
collective to the underlying pg. The design mostly follows https://github.com/pytorch/pytorch/issues/22071.

Concretely, on each collective, we:
1. Verify op type consistency. This can help catch mismatched ops in the user application (i.e. allreduce on one rank and allgather on another)
2. Verify tensor shapes. This can help catch bugs where the tensor inputs are malformed, whereas normally in NCCL this would just lead to a hang. The shapes verification for allgather/allreduce_coalesced is omitted because they actually accept different shape tensors and don't error out.

This is done through an abstraction called `CollectiveFingerPrint` which uses a helper process group to do the above verification. Concretely, we gather the data we need for each of the above checks into tensors, and allgather them, and verify their equivalence.

Once all of this passes we simply dispatch the collective to the underlying pg.

Added `ProcessGroupWrapperTest` in python to comprehensively test these changes.
ghstack-source-id: 129735687

Test Plan: ci

Reviewed By: zhaojuanmao

Differential Revision: D28023981

fbshipit-source-id: 1defc203c5efa72ca0476ade0d1d8d05aacd4e64
2021-05-24 20:09:51 -07:00
c00eefb6c7 [Static Runtime] Clean up and fix bugs in Static Runtime (#58829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58829

- Delete copying and moving of MemoryPlanner.
- Remove `inline` in some of the member functions because member functions implemented in classes are inline by default.
- Clean up and update comments.
- Reorganize some code.

Reviewed By: edvgha

Differential Revision: D28555476

fbshipit-source-id: 7ea8efc0e2ed93a6788a742470b9e753a85df677
2021-05-24 19:46:58 -07:00
de845020a0 fix docstring for fusing functions (#58638)
Summary:
This PR fixes docstrings of fusing functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58638

Reviewed By: H-Huang

Differential Revision: D28584501

Pulled By: jerryzh168

fbshipit-source-id: 77a53a709d968df8ba8f5b613ad7cf225ba2826a
2021-05-24 18:27:22 -07:00
2b0ec9c3cf Reapply "[jit] Implement ScriptProfile to collect instruction profiles." (#58783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58783

This reverts commit fc804b5def5e7d7ecad24c4d1ca4ac575e588ae8.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28617037

Pulled By: zhxchen17

fbshipit-source-id: 645de2ede20500a5c218d6ec3c7faae94de37a14
2021-05-24 18:23:21 -07:00
705dd9ffac [PyTorch] Migrate remaining stray uses of TI:add_output to borrowing (#58605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58605

Found a few more by grepping.
ghstack-source-id: 129730281

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D28523254

fbshipit-source-id: 317baea88885586c5106c8335ebde0d8802a3532
2021-05-24 17:34:54 -07:00
12bb1e86ed Make c10::ThreadPool::available_ atomic. (#58457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58457

This variable had concurrent read/write access without any
synchronization. The issue was caught and reported by TSAN.
ghstack-source-id: 129311384

Test Plan:
1) Verify test locally.
2) waitforbuildbot.

Reviewed By: ezyang

Differential Revision: D28498116

fbshipit-source-id: 89af068467fed64c131d743504c0cecf3017d638
2021-05-24 17:29:44 -07:00
a5250425e0 [quant] Eager mode equalization support for ConvReLU and LinearReLU (#58792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58792

Enabling support for fused modules like ConvReLU or LinearReLU on eager mode cross-layer equalization.

Test Plan:
`python test/test_quantization.py TestEqualizeEager`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28647242

fbshipit-source-id: 286e057ce70aa7de45d575afd6c13e55120ff18a
2021-05-24 17:25:13 -07:00
b593dd2027 [Gradient Compression] Re-enable test_ddp_hook_parity_powerSGD on Gloo backend (#58882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58882

Re-enable this test since https://github.com/facebookincubator/gloo/pull/309 is already picked up by Gloo submodule.
ghstack-source-id: 129760436

Test Plan: waitforbuildbot

Reviewed By: agolynski

Differential Revision: D28654433

fbshipit-source-id: dfc002936e88c074be529d6024f889214130b1b9
2021-05-24 16:52:54 -07:00
a566005679 [skip ci] Update readme to use hud.pytorch.org (#58835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58835

Pulled By: driazati

Reviewed By: seemethere

Differential Revision: D28632504

fbshipit-source-id: 867f061be039bc63c1478b1b1eed8c0380e94faa
2021-05-24 15:02:18 -07:00
f29e75c4dc [reland][quant][fx][graphmode][refactor] Remove qconfig_map from Quantizer (#58455) (#58756)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58756

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: supriyar

Differential Revision: D28607564

fbshipit-source-id: 979cf165941bb3a9044d03077a170b5ea64dc36a
2021-05-24 14:57:45 -07:00
76f03bc42f Fix torch.finfo.bits typo in stub (#58819)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58818.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58819

Reviewed By: walterddr, malfet

Differential Revision: D28641587

Pulled By: ezyang

fbshipit-source-id: b19b519db43f2075c64f4f9ba922310f2561ca70
2021-05-24 14:52:49 -07:00
bc2ee078d1 Update Gloo submodule (#58853)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58853

Reviewed By: pbelevich, SciPioneer

Differential Revision: D28642642

Pulled By: agolynski

fbshipit-source-id: 8c31f9ab86c5f3063733199474022e7e2c6e9a2f
2021-05-24 14:23:41 -07:00
51b7224f8f [vulkan] Add max_pool2d op (#58806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58806

Adds the max_pool2d op to Vulkan.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28625049

fbshipit-source-id: 75c82a84f0eca51627e33a6182ef51cb7e82e068
2021-05-24 14:16:19 -07:00
a679bb5ecf Refactor local lint (#58798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58798

In #58623 there was a bug in `make quicklint` where ShellCheck would run on the entire repo when there were no files. This PR fixes that by refactoring out the common logic (such as skipping quicklint when there are no files and letting checks do their own file filtering) and pushing it into a runner class.

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D28649889

Pulled By: driazati

fbshipit-source-id: b19f32cdb63396c806cb689b2f6daf97e1724d44
2021-05-24 13:52:53 -07:00
a7f4f80903 ENH Adds dtype to nn.functional.one_hot (#58090)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33046
Related to https://github.com/pytorch/pytorch/issues/53785

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58090

Reviewed By: zou3519

Differential Revision: D28640893

Pulled By: jbschlosser

fbshipit-source-id: 3686579517ccc75beaa74f0f6d167f5e40a83fd2
2021-05-24 13:48:25 -07:00
e4be80c1b8 simplify cpu_kernel to not have contiguous special case (#58830)
Summary:
Per title
`unroll_contiguous_scalar_checks` tries to verify that all arguments (including outputs) are contiguous except at most one scalar (with stride 0). It then calls the passed lambda with the index of the scalar argument if this verification succeeded, or with 0 if the arguments were not contiguous or there was no scalar. Depending on the value of this index (with 0 = not found), a different function can be called: in vectorized kernels it's the vectorized loop if the arguments are contiguous plus a scalar, and the basic loop otherwise. This makes sense for the vectorized kernel (the vectorized loop can still be used in some broadcast cases), but all the others (cpu_kernel, serial_cpu_kernel, cpu_kernel_multiple_outputs) don't even use the idx argument in the lambda, so regardless of what `unroll_contiguous_scalar_checks` does, they will do the same thing. There is no point in calling `unroll_contiguous_scalar_checks` then.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58830

Reviewed By: zou3519, mruberry

Differential Revision: D28632668

Pulled By: ngimel

fbshipit-source-id: c6db3675933184e17cc249351c4f170b45d28865
2021-05-24 12:07:29 -07:00
1c5f63d86d [Pytorch Edge] Model Ops compatibility api (#57501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57501

Add an api _get_model_ops_and_info to get root operators and versioning info of a model in both cxx and python, and the input can be from a file path or buffer.
ghstack-source-id: 129620112

Test Plan: unit test.

Reviewed By: xcheng16, raziel

Differential Revision: D28162765

fbshipit-source-id: 4413c1e906b8a872e4a717d849da37347adbbea4
2021-05-24 12:00:06 -07:00
2a456e4f49 [skip ci] Move debug wheels out of package dir before test (#58685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58685

This moves debug packages out of the artifacts dir before running tests (as a counterpart to https://github.com/pytorch/builder/pull/770). Doing it this way allows us to keep the CI configs simple since there's one directory to use for artifacts / upload to S3.

See #58684 for actual CI signals (the ones on this PR are all cancelled since it depends on the builder branch set in the next PR up the stack)

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D28646995

Pulled By: driazati

fbshipit-source-id: 965265861968906770a6e6eeecfe7c9458631b5a
2021-05-24 11:46:37 -07:00
2733555ed1 replace all_gather with more efficient collective api _all_gather_base (#57769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57769

_all_gather_base saved copies in all_gather, so it is more efficient

Test Plan: unit test

Reviewed By: SciPioneer

Differential Revision: D28227193

fbshipit-source-id: ddd8590095a5b45676497a71ed792a457f9825c6
2021-05-24 11:34:45 -07:00
c58709b7bb Helper function for skipping module parameter / buffer initialization (#57555)
Summary:
This PR introduces a helper function named `torch.nn.utils.skip_init()` that accepts a module class object + `args` / `kwargs` and instantiates the module while skipping initialization of parameter / buffer values. See discussion at https://github.com/pytorch/pytorch/issues/29523 for more context. Example usage:

```python
import torch

m = torch.nn.utils.skip_init(torch.nn.Linear, 5, 1)
print(m.weight)

m2 = torch.nn.utils.skip_init(torch.nn.Linear, 5, 1, device='cuda')
print(m2.weight)

m3 = torch.nn.utils.skip_init(torch.nn.Linear, in_features=5, out_features=1)
print(m3.weight)
```
```
Parameter containing:
tensor([[-3.3011e+28,  4.5915e-41, -3.3009e+28,  4.5915e-41,  0.0000e+00]],
       requires_grad=True)
Parameter containing:
tensor([[-2.5339e+27,  4.5915e-41, -2.5367e+27,  4.5915e-41,  0.0000e+00]],
       device='cuda:0', requires_grad=True)
Parameter containing:
tensor([[1.4013e-45, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]],
       requires_grad=True)
```

Bikeshedding on the name / namespace is welcome, as well as comments on the design itself - just wanted to get something out there for discussion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57555

Reviewed By: zou3519

Differential Revision: D28640613

Pulled By: jbschlosser

fbshipit-source-id: 5654f2e5af5530425ab7a9e357b6ba0d807e967f
2021-05-24 11:28:32 -07:00
277f587496 rename benchmark_cpp_extension (#58708)
Summary:
Currently, the cpp_extension build in benchmarks is misleading, as it has the same name as torch.utils.cpp_extension.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58708

Test Plan:
Run from `./benchmarks/operator_benchmark/pt_extension` folder:
```
python setup.py install
python cpp_extension_test.py
```

Note: CI doesn't matter as currently benchmarks/ folder is not compiled/test against CI

Reviewed By: robieta

Differential Revision: D28585582

Pulled By: walterddr

fbshipit-source-id: fc071040cf3cb52ee6c9252b2c5a0c3043393f57
2021-05-24 11:04:02 -07:00
a083933d2a .github: Add windows.8xlarge.nvidia.gpu (#58781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58781

Adds windows GPU workers to our GHA self hosted infra

[skip ci]

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D28645532

Pulled By: seemethere

fbshipit-source-id: b00d0caef727c597ee15d19c76bda384231f42c9
2021-05-24 10:40:46 -07:00
8ae4d07dac .circleci: Disable windows CPU builds for CircleCI (#58855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58855

We have successfully migrated windows CPU builds to Github Actions so
let's go ahead and disable them in CircleCI

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D28642875

Pulled By: seemethere

fbshipit-source-id: 8ffe9338e58952531a70002891a19ea33363d958
2021-05-24 10:28:41 -07:00
1fca1545d4 fixing csr addmm bug (#58768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58768

Fixes gh-58757

This PR fixes the CPU version of the addmm op. For context, before this PR only CSR @ vector was supported. I found a minor bug in addmm_out_sparse_csr_dense_cpu in the non-MKL code, which is solved in this PR.

Moreover, I discovered a limitation in the current MKL implementation: it only works well (acceptable tolerance for output error) with square matrices. I looked into this issue in depth and found that it could be a limitation of the MKL API.

I used this [gist code](https://gist.github.com/aocsa/0606e833cd16a8bfb7d37a5fbb3a5b14) based on [this](https://github.com/baidu-research/DeepBench/blob/master/code/intel/spmm/spmm_bench.cpp) to test this behavior.

As you can see, the output error (last column) is acceptable when the matrices are square and unacceptable when they are not. I reported the issue here: https://github.com/pytorch/pytorch/issues/58770

Looking forward to your comments.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28629563

Pulled By: malfet

fbshipit-source-id: 5ee00ae667336e0d9301e5117057213f472cbc86
2021-05-24 09:54:07 -07:00
2dda8d7571 Move cublas dependency after CuDNN (#58287)
Summary:
Library linking order matters during static linking.
Not sure whether it's a bug or a feature, but if cublas is referenced
before CuDNN, it will be partially statically linked into the library,
even if it is not used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58287

Reviewed By: janeyx99

Differential Revision: D28433165

Pulled By: malfet

fbshipit-source-id: 8dffa0533075126dc383428f838f7d048074205c
2021-05-24 09:39:09 -07:00
bb4770462f .github: Enable Windows workflow for pull_request (#58418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58418

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D28483418

Pulled By: seemethere

fbshipit-source-id: c9f5a4df5a308e0ac6fc6fdc1a26d04723ffded7
2021-05-24 09:34:47 -07:00
007fe949aa Adding a new include directory in BLIS search path (#58166)
Summary:
While trying to build PyTorch with BLIS as the backend library,
we found a build issue due to some missing include files.
This was caused by a missing directory in the search path.
This patch adds that path in FindBLIS.cmake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58166

Reviewed By: zou3519

Differential Revision: D28640460

Pulled By: malfet

fbshipit-source-id: d0cd3a680718a0a45788c46a502871b88fbadd52
2021-05-24 08:57:02 -07:00
0e16087064 [DataLoader] Fix bugs for typing (#58450)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58450

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28507877

Pulled By: ejguan

fbshipit-source-id: f4051ff51ce77ef45214f11cba10c8a7e1da4dad
2021-05-24 07:14:40 -07:00
5c7dace309 Automated submodule update: FBGEMM (#58161)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 4b8aaad426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58161

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D28385619

fbshipit-source-id: ace938b1e43760b4bedd596ebbd355168a8706b7
2021-05-23 23:33:19 -07:00
74c12da451 add deterministic path for scatter_add_cuda for 1D tensors (#58761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58761

previously we implemented deterministic path for gather_backward in https://github.com/pytorch/pytorch/pull/55573, which replaced non-deterministic scatter_add_cuda.

It's better to move it inside scatter_add so scatter_add can benefit from the deterministic path.
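A minimal sketch (not from the PR, assuming a CUDA device) of hitting the new deterministic 1-D path directly through scatter_add_:

```python
import torch

torch.use_deterministic_algorithms(True)
src = torch.randn(4, device="cuda")
index = torch.tensor([0, 1, 0, 1], device="cuda")
out = torch.zeros(2, device="cuda")
# 1-D scatter_add_ now takes the deterministic path instead of raising a
# nondeterministic alert when deterministic algorithms are requested.
out.scatter_add_(0, index, src)
```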

Test Plan:
buck test mode/opt //caffe2/test:torch_cuda -- test_scatter_add_one_dim_deterministic

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (5.063)
    ✓ Pass: caffe2/test:torch_cuda - test_scatter_add_one_dim_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (30.909)
    ✓ Pass: caffe2/test:torch_cuda - main (30.909)
Summary
  Pass: 2
  ListingSuccess: 1

buck test mode/opt //caffe2/test:torch_cuda -- test_gather_backward

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (4.613)
    ✓ Pass: caffe2/test:torch_cuda - test_gather_backward_deterministic_path_cuda (test_torch.TestTorchDeviceTypeCUDA) (25.369)

buck test mode/opt //caffe2/test:torch_cuda -- test_nondeterministic_alert

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (5.356)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_CTCLoss_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_put_accumulate_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad1d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_scatter_add_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_FractionalMaxPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveAvgPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AvgPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_grid_sample_2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_NLLLoss_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_put_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_median_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_gather_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_bincount_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_histc_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReflectionPad1d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_bilinear_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_bicubic_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_grid_sample_3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_MaxPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveAvgPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_EmbeddingBag_max_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_trilinear_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveMaxPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReflectionPad2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_FractionalMaxPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_kthvalue_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_linear_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - main (28.146)
Summary
  Pass: 30
  ListingSuccess: 1

Reviewed By: ngimel

Differential Revision: D28585659

fbshipit-source-id: 1ad003d4130501ceff5f6a7a870ca3dbc9a3f1f2
2021-05-23 21:36:02 -07:00
50ded095e4 [deploy] temporarily disable deploy tests (#58832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58832

While we investigate breakage.

Differential Revision: D28631469

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Pulled By: suo

fbshipit-source-id: 43d51c1c9d81e951074824ccf624e42f6bec4242
2021-05-23 19:26:06 -07:00
a7fdd487e5 Port kthvalue tests to OpInfo (#58654)
Summary:
Tracking issue https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58654

Reviewed By: ngimel

Differential Revision: D28627207

Pulled By: mruberry

fbshipit-source-id: f662f178ab87a9d461f1e0c91b02942c64125e73
2021-05-23 16:44:16 -07:00
4709fdb117 Add GenericShardingSpec for generic tensor sharding. (#57409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57409

Full design: https://github.com/pytorch/pytorch/issues/55207

In https://github.com/pytorch/pytorch/issues/55207, we proposed
`MeshShardingSpec` as a generic sharding mechanism. However, that proposal does
not provide the flexibility to specify shards which have uneven
sizes/partitions and assumes even partitioning. Uneven partitioning is one of
the requirements of an internal use case.

As a result, instead of that we introduce a `GenericShardingSpec` which allows
specifying any arbitrary partitioning of a multi dimensional tensor. Basically
it specifies the start offsets of each shard and the length of each dim of the
shard allowing for greater flexibility
ghstack-source-id: 129604155

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D28137616

fbshipit-source-id: 61255762485fb8fa3ec3a43c27bbb222ca25abff
2021-05-23 16:06:05 -07:00
0d6fa1adc5 Introduce ChunkShardingSpec as a model sharding specification. (#55728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55728

Full design: https://github.com/pytorch/pytorch/issues/55207

This PR introduces ChunkShardingSpec (SingleShardingSpec in the design). Used
the name ChunkShardingSpec since it is very similar to `torch.chunk` in terms
of how a Tensor is split up and feels more clear compared to SingleShardingSpec.
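To illustrate the torch.chunk-style partitioning the spec mirrors (plain torch.chunk shown here; the spec itself additionally carries placement info):

```python
import torch

t = torch.arange(10)
# Like ChunkShardingSpec, torch.chunk splits along one dim into roughly
# equal pieces, with the trailing chunk absorbing the remainder.
print(torch.chunk(t, 4))
# (tensor([0, 1, 2]), tensor([3, 4, 5]), tensor([6, 7, 8]), tensor([9]))
```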
ghstack-source-id: 129603318

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D27694108

fbshipit-source-id: c8764abe6a4d5fc56d023fda29b74b5af2a73b49
2021-05-23 16:04:57 -07:00
c5a1f04367 Enabled BFloat16 support for cumsum, logcumsumexp, cumprod, cummin & cummax on CUDA (#57904)
Summary:
Enabled BFloat16 support for `cumsum`, `logcumsumexp`, `cumprod`, `cummin` & `cummax` on CUDA
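A minimal sketch (not from the PR, assuming a CUDA device):

```python
import torch

x = torch.randn(5, device="cuda", dtype=torch.bfloat16)
# These scan ops now accept bfloat16 inputs on CUDA.
print(x.cumsum(0).dtype, x.cumprod(0).dtype, x.cummax(0).values.dtype)
```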

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57904

Reviewed By: ailzhang

Differential Revision: D28558722

Pulled By: ngimel

fbshipit-source-id: 2a8e49c271e968f841d24534b6cc7be162d3a5aa
2021-05-23 15:51:23 -07:00
ee3ea31f12 OpInfo: split, split_with_sizes (#58184)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58184

Reviewed By: ngimel

Differential Revision: D28627271

Pulled By: mruberry

fbshipit-source-id: e6c0d2b005904ddebc9dab76685403530a6f6519
2021-05-23 15:47:35 -07:00
52a8031e8c [ROCm] disable test test_Conv2d_groups_nobias_v2 for ROCm (#58701)
Summary:
Disable test_Conv2d_groups_nobias_v2 test because it is failing on ROCm 4.2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58701

Reviewed By: ngimel

Differential Revision: D28626651

Pulled By: mruberry

fbshipit-source-id: a74bdf45335ae2afee0aa5e3bece6e208e75a63f
2021-05-23 15:43:36 -07:00
fa0b89bbf7 Change list striding kernel implementation to handle optional integers (#58536)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58536

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28531720

Pulled By: tugsbayasgalan

fbshipit-source-id: c06a8933aa9b4aa562ea65ac2558353b05d0f624
2021-05-23 12:34:22 -07:00
28840b9a44 [Gradient Compression] Disable test_ddp_hook_parity_powerSGD on Gloo backend (#58802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58802

This test can only be re-enabled once https://github.com/facebookincubator/gloo/pull/309 is picked up by Gloo submodule.
ghstack-source-id: 129661729

Test Plan: unit test.

Reviewed By: rohan-varma

Differential Revision: D28623214

fbshipit-source-id: 0249ae816469c3e8cabd08db415821091a064d58
2021-05-22 23:41:27 -07:00
4ca4640bae [torch][repeat_interleave] remove stream synchronization if output size is given (#58417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58417

Same as title.

Test Plan:
Rely on CI signal.

Update unit test to exercise new code path as well.

Reviewed By: ngimel

Differential Revision: D28482927

fbshipit-source-id: 3ec8682810ed5c8547b1e8d3869924480ce63dcd
2021-05-22 20:53:28 -07:00
c1c9be16c4 port mm to structured kernel (#57755)
Summary:
Relates to https://github.com/pytorch/pytorch/issues/57417.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57755

Reviewed By: ezyang

Differential Revision: D28426111

Pulled By: walterddr

fbshipit-source-id: 943d3e36433ca846990b940177fb040553961156
2021-05-22 19:24:14 -07:00
f9e8dc005a OpInfo: clone, contiguous (#58390)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58390

Reviewed By: soulitzer

Differential Revision: D28567821

Pulled By: mruberry

fbshipit-source-id: bcf42cb4a9a57d8a15a76819b8a9e2df97cf00be
2021-05-22 18:25:31 -07:00
a70020465b adding test_sparse_csr to run_test (#58666)
Summary:
fixes https://github.com/pytorch/pytorch/issues/58632.

Added several skips related to test asserts and MKL; they will be addressed in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58666

Reviewed By: seemethere, janeyx99

Differential Revision: D28607966

Pulled By: walterddr

fbshipit-source-id: 066d4afce2672e4026334528233e69f68da04965
2021-05-22 13:17:46 -07:00
22776f0857 [PyTorch] Remove device check from a few indexing methods (#58800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58800

These methods leverages TensorIterator which will handle
(or skip) device check.
ghstack-source-id: 129654358

Test Plan: CI && sandcastle

Reviewed By: ngimel

Differential Revision: D28622626

fbshipit-source-id: 6153299780d4f7bf286423520ba4cb60b554335e
2021-05-22 13:13:39 -07:00
796c97a88f [Pytorch Delegated Backend] Add python binding for (#57156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57156

generate_debug_handles

To be able to generate debug handles for preprocess functions written in Python.

Test Plan:
CI

Imported from OSS

Differential Revision: D28062328

Reviewed By: raziel

Pulled By: kimishpatel

fbshipit-source-id: 8795d089edc00a292a2221cfe80bbc671468055c
2021-05-22 08:34:19 -07:00
d6d726f781 [Pytorch Backend delegation] Add api for backend lowering to query debug (#55462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55462

handles and symbolicate exception callstack thrown from backend.

The objective of this diff is to improve error reporting when
exceptions are raised from a lowered backend. We would effectively like to
get the same model-level stack trace that you would get without having
lowered some module to a backend.

For example:
```
class AA(nn.Module):
  def forward(self, x, y):
    return x + y

class A(nn.Module):
  def __init__(...):
    self.AA0 = AA()
  def forward(self, x, y):
    return self.AA0.forward(x, y) + 3

class B(nn.Module):
  def forward(self, x):
    return x + 2

class C(nn.Module):
  def __init__(...):
    self.A0 = A()
    self.B0 = B()
  def forward(self, x, y):
    return self.A0.forward(x, y) + self.B0.forward(x)
```
If we then do C().forward(torch.rand((2,3)), torch.rand((14,2))), we
will likely see an error stack like:
```
C++ exception with description "The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in forward

    def forward(self, x, y):
      return self.A0.forward(x, y) + self.B0.forward(x)
             ~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in forward

    def forward(self, x, y):
      return self.AA0.forward(x, y) + 3
             ~~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in forward

    def forward(self, x, y):
      return x + y
             ~~~~~ <--- HERE
```

We would like to see the same error stack if we lowered C.A0 to some
backend.

With this diff we get something like:
```
  Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA)
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return self.A0.forward(x, y) + self.B0.forward(x)
             ~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 5, in FunctionName_UNKNOWN
                typed_inputs: List[Any] = [x, y, ]
                if self.__backend.is_available() :
                  _0, = self.__backend.execute(self.__handles["forward"], typed_inputs)
                        ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
                  assert isinstance(_0, Tensor)
                  return _0
  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return self.AA0.forward(x, y) + 3
             ~~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return x + y
             ~~~~~ <--- HERE
```
This is achieved in 3 parts:
Part 1:
A. BackendDebugInfoRecorder:
   Instantiated during backend lowering, in `to_backend`, before calling the
   preprocess function corresponding to the backend. This facilitates recording
   of debug info (such as source range + inlined callstack) for the lowered module.
B. Instantiate WithBackendDebugInfoRecorder with BackendDebugInfoRecorder.
   This initializes thread local pointer to BackendDebugInfoRecorder.
C. generate_debug_handles:
   In preprocess function, the backend will call generate_debug_handles
   for each method being lowered separately. generate_debug_handles
   takes `Graph` of the method being lowered and returns a map
   of Node*-to-debug_handles. Backend is responsible for storing debug
   handles appropriately so as to raise exception (and later profiling)
   using debug handles when the exception being raised corresponds to
   particular Node that was lowered.
   Inside generate_debug_handles, we will query the current
   BackendDebugHandleInfoRecorder, that is issuing debug handles. This debug
   handle manager will issue debug handles as well as record
   debug_handles-to-<source range, inlined callstack> map.
D. Back in `to_backend`, once the preprocess function has finished
   lowering the module, we will call `stopRecord` on
   BackendDebugInfoRecorder. This will return the debug info map. This
   debug info is then stored inside the lowered module.

Part 2:
Serialization:
During serialization for bytecode (lite interpreter), we will do two
things:
1. Extract all the source ranges that are contained inside the
debug_handles-to-<source range, inlined callstack> map for the lowered
module. These are the source ranges corresponding to the debug handles,
including what is in the inlined callstack. Since we replaced the original
module with the lowered module, we won't be serializing code for the original
module and thus have no source range for it. That is why the source ranges have
to be stored separately. We will lump all the source ranges for all the
lowered modules into one single debug_pkl file.
2. Then we will serialize debug_handles-to-<source range, inlined
callstack> map.

Now during deserialization we will be able to reconstruct
debug_handles-to-<source range, inlined callstack> map. Given all
debug_handles are unique we would not need any module information.

Test Plan:
Tests are added in test_backend.cpp

Imported from OSS

Differential Revision: D27621330

Reviewed By: raziel

Pulled By: kimishpatel

fbshipit-source-id: 0650ec68cda0df0a945864658cab226a97ba1890
2021-05-22 08:33:07 -07:00
e7c35a3363 Revert D28617214: [Gradient Compression] Do not skip the comm hook tests on Gloo backend
Test Plan: revert-hammer

Differential Revision:
D28617214 (3e88acbf05)

Original commit changeset: 3bafb0c837a1

fbshipit-source-id: 0b6254e9766436633faea63cd64c454b739f74b4
2021-05-22 07:47:18 -07:00
6093161158 Separated out working tests from not working tests for NNC OpInfo (#58788)
Summary:
This gets rid of a lot of the try/else rigamarole.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58788

Reviewed By: ZolotukhinM

Differential Revision: D28621054

Pulled By: Chillee

fbshipit-source-id: d0d8a1b6466eb318d939a1ed172b78f492ee0d5b
2021-05-22 02:24:23 -07:00
dc8bc6ba4b [PyTorch Edge] Check if open paren ( occurs in an operator name string (#58687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58687

We want to validate if the usages are all okay.
ghstack-source-id: 129639560

Test Plan: Tested on master: the build fails. Then tested with D28549578 (db67699ae6) applied: the build succeeds.

Reviewed By: JacobSzwejbka

Differential Revision: D28579734

fbshipit-source-id: 1ac65474762855562109adc0bac2897b59f637ce
2021-05-21 20:23:42 -07:00
4c961beacb Revert D28474878: Always use intrusive_ptr for Message (1 out of 2)
Test Plan: revert-hammer

Differential Revision:
D28474878 (4d704e607d)

Original commit changeset: 5b76d45e05f6

fbshipit-source-id: 677c5bc7f02dca23213f778eb0e626a2f6600f3b
2021-05-21 19:24:22 -07:00
a6b9268f31 Revert D28474879: Always use intrusive_ptr for Message (2 out of 2)
Test Plan: revert-hammer

Differential Revision:
D28474879 (ebf55a7d13)

Original commit changeset: 498652a8b80a

fbshipit-source-id: 4d81e9769699356bf2a2ffc14b26f480bfeef9a1
2021-05-21 19:24:20 -07:00
c1a9befba2 Revert D28474880: Allow Future::then to return pre-extracted DataPtrs
Test Plan: revert-hammer

Differential Revision:
D28474880 (a0ee299d92)

Original commit changeset: 91a0dde5e29d

fbshipit-source-id: fabf7b0bcbd41342553660a4d1e4bfc3d1dd2d41
2021-05-21 19:24:19 -07:00
a1719be07f Revert D28474877: Provide pre-extracted DataPtrs when completing a Future with a Message
Test Plan: revert-hammer

Differential Revision:
D28474877 (bdf6a4bffd)

Original commit changeset: e68d7d45f1c1

fbshipit-source-id: b89858b4e82f4f766031cfaad9fc736cf8097816
2021-05-21 19:24:17 -07:00
341f83d6a2 Revert D28474981: Create CUDA-aware futures in RequestCallback
Test Plan: revert-hammer

Differential Revision:
D28474981 (027c68ef00)

Original commit changeset: 492b8e71a43d

fbshipit-source-id: 0697c0922cd6bcbea2505efeecbbcbb3ffcfff2b
2021-05-21 19:24:15 -07:00
7a8336a5a7 Revert D28474983: Set streams when invoking UDFs
Test Plan: revert-hammer

Differential Revision:
D28474983 (ab1e958d20)

Original commit changeset: 358292764d0a

fbshipit-source-id: b4d4c25fe551d83848a9d023c139a9f1acc4c23d
2021-05-21 19:24:14 -07:00
89c81c5bba Revert D28574083: Set and propagate devices in RRef completion future
Test Plan: revert-hammer

Differential Revision:
D28574083 (23df70359a)

Original commit changeset: 5c89902cdc5c

fbshipit-source-id: e48043b6c4fb8a6f383f78e1aa88f7614f9fa13a
2021-05-21 19:24:12 -07:00
b8a04e25ec Revert D28474982: Make TP agent use streams from Future when sending response
Test Plan: revert-hammer

Differential Revision:
D28474982 (19a7472702)

Original commit changeset: c0034eb3f2a2

fbshipit-source-id: fb260c71e6c9dd5a2c44121fe4729a4f4418532b
2021-05-21 19:23:01 -07:00
dceaf98e79 [torch][package] Fix importlib.resources.path for python <3.8.8 (#58718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58718

`PackageImporter` does not populate `module.__spec__.origin`, which causes an
unhandled `Exception` to be raised when using `importlib.resources.path` to get
a path to a binary file resource in the package in python <3.8.6.

This commit fixes this issue by setting `module.__spec__.origin` to
"<package_importer>". The actual value is not important as far as I can tell;
the simple fact that it is not `None` allows `importlib` to avoid raising an
`Exception` in `importlib.resources.path`.
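A minimal sketch (not from the commit) of the call that used to raise; the package and resource names are hypothetical:

```python
import importlib.resources

# With __spec__.origin populated, this no longer raises on python <3.8.6
# for modules loaded through PackageImporter.
with importlib.resources.path("my_packaged_module", "weights.bin") as p:
    data = p.read_bytes()
```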

Test Plan:
This commit adds a unit test to `test_resources.py` that tests that
`importlib.resources.path` can be used within a package.

Reviewed By: suo

Differential Revision: D28589117

fbshipit-source-id: 870d606a30fce6884ae48b03ff71c0864e4b325f
2021-05-21 19:16:54 -07:00
071d49a970 Document monitored barrier (#58322)
Summary:
Will not land before the release, but would be good to have this function documented in master for its use in distributed debugability.
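A minimal single-process sketch (not from the PR; gloo only, address/port illustrative):

```python
import datetime
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)
# Unlike barrier(), this raises on failure and reports which ranks did not
# reach the barrier within the timeout.
dist.monitored_barrier(timeout=datetime.timedelta(seconds=10))
```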

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58322

Reviewed By: SciPioneer

Differential Revision: D28595405

Pulled By: rohan-varma

fbshipit-source-id: fb00fa22fbe97a38c396eae98a904d1c4fb636fa
2021-05-21 19:04:57 -07:00
84b6c629d3 [lint] Move shellcheck to its own step (#58623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58623

This splits out everything shellcheck related into its own job that generates and checks GHA workflows, then shellchecks those + jenkins scripts. This PR also integrates shellcheck into the changed-only stuff in `actions_local_runner.py` so that shellcheck won't do anything unless someone edits a shell script in their local checkout. This is the final piece to clean up the output of `make quicklint` and speeds it up by a good bit (before it was shellchecking everything which took a few seconds):

```
$ make quicklint -j $(nproc)
✓ quick-checks: Ensure no unqualified noqa
✓ quick-checks: Ensure canonical include
✓ quick-checks: Ensure no unqualified type ignore
✓ quick-checks: Ensure no direct cub include
✓ quick-checks: Ensure no tabs
✓ quick-checks: Ensure no non-breaking spaces
✓ shellcheck: Regenerate workflows
✓ quick-checks: Ensure no versionless Python shebangs
✓ quick-checks: Ensure correct trailing newlines
✓ shellcheck: Assert that regenerating the workflows didn't change them
✓ mypy (skipped typestub generation)
✓ cmakelint: Run cmakelint
✓ quick-checks: Ensure no trailing spaces
✓ flake8
✓ shellcheck: Extract scripts from GitHub Actions workflows
✓ shellcheck: Run Shellcheck
real 0.92
user 6.12
sys 2.45
```

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D28617293

Pulled By: driazati

fbshipit-source-id: af960ed441db797d07697bfb8292aff5010ca45b
2021-05-21 18:23:40 -07:00
b842351b4f Skip SVE acceleration on M1 (#58785)
Summary:
As it's not supported by the chip and also crashes the compiler, see https://bugs.llvm.org/show_bug.cgi?id=50407

Fixes https://github.com/pytorch/pytorch/issues/58653

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58785

Reviewed By: zhouzhuojie, driazati

Differential Revision: D28619231

Pulled By: malfet

fbshipit-source-id: 34367c074f9624b21d239eec757891cbb51f5bed
2021-05-21 18:08:30 -07:00
3e88acbf05 [Gradient Compression] Do not skip the comm hook tests on Gloo backend (#58784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58784

DDP communication hooks are already supported on Gloo backend. No longer need to skip these tests on Gloo.

Original PR issue: https://github.com/pytorch/pytorch/issues/58467
ghstack-source-id: 129635828

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_ddp_comm_hook_logging
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_ddp_hook_parity_allreduce
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_ddp_hook_parity_allreduce_process_group
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_ddp_hook_parity_powerSGD

Reviewed By: rohan-varma

Differential Revision: D28617214

fbshipit-source-id: 3bafb0c837a15ad203a8570f90750bc5177d5207
2021-05-21 17:47:52 -07:00
041bff77b6 Make tools/actions_local_runner.py PY-3.X compatible (#58787)
Summary:
Do not use `shlex.join`, which is a simple join over quoted args, i.e.
a9e43615c2/Lib/shlex.py (L318-L320)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58787

Reviewed By: driazati

Differential Revision: D28619996

Pulled By: malfet

fbshipit-source-id: dd4e939a88e2923b41084da2b5fbdbee859c0104
2021-05-21 17:40:48 -07:00
829a096cd7 Fix arange functions for VSX specializations of Vec256 (#58553)
Summary:
Need a templated 2nd parameter to support e.g. double steps even for int vectors.

This extends https://github.com/pytorch/pytorch/pull/34555 x86 specific fix to VSX instruction set.

Fixes https://github.com/pytorch/pytorch/issues/58551

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58553

Reviewed By: ailzhang

Differential Revision: D28551266

Pulled By: malfet

fbshipit-source-id: de7d23685da06b1b3089933d74398667cfb43c9f
2021-05-21 17:30:12 -07:00
e094980060 Makefile should use python3 instead of python alias (#58786)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58786

Reviewed By: driazati

Differential Revision: D28619802

Pulled By: malfet

fbshipit-source-id: 8f81298d39ba89c4e007f537ec2dd64bb23338af
2021-05-21 17:23:27 -07:00
1d885fbd0e Update GraphTask::owner_ in a single thread for DistEngine. (#58625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58625

Several TSAN tests were failing for distributed since `owner_` was not
atomic and was being accessed by several threads. As an example:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/autograd/engine/dist_engine.cpp#L333.

To fix this, I've set the owner_ only once when the graphTask is created.

Test Plan:
1) Validated change fixes failing TSAN test.
2) waitforbuildbot

Reviewed By: albanD

Differential Revision: D28496878

fbshipit-source-id: 473f4f6d859595749a02563a204ba7aa35ea19e3
2021-05-21 17:12:27 -07:00
d9aa0b53eb [PyTorch] Migrate TI usage in ATen/native/quantized to borrowing (#58307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58307

Borrowing is more efficient, and we can see in all these cases that the TensorIterator doesn't outlive the input & output Tensors.
ghstack-source-id: 129598791

Test Plan: Existing CI

Reviewed By: ezyang

Differential Revision: D28445922

fbshipit-source-id: ce12743980296bab72a0cb83a8baff0bb6d80091
2021-05-21 16:31:01 -07:00
3ddb4b3e68 [PyTorch] Migrate TI usage in ATen/native to borrowing (#58305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58305

Borrowing is more efficient, and we can see in all these cases that the TensorIterator doesn't outlive the input & output Tensors.
ghstack-source-id: 129598793

Test Plan: Existing CI

Reviewed By: ezyang

Differential Revision: D28445712

fbshipit-source-id: 0822f1408a0a71c8f8934e6d90659ae3baa085ac
2021-05-21 16:29:50 -07:00
e574c2c025 [quant][fx] Validate qconfig_dict keys (#58566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58566

Validates the keys of the qconfig_dict, prepare_custom_config_dict, convert_custom_config_dict, and
fuse_custom_config_dict. If the user passes in an invalid key or makes a typo, we will throw an error and let the user know what keys are supported.
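A minimal sketch (not from the PR) of the new behavior; the model and the misspelled key are illustrative:

```python
import torch
from torch.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
bad_qconfig_dict = {"module_nmae": []}  # typo: should be "module_name"
try:
    prepare_fx(model, bad_qconfig_dict)
except Exception as e:
    print(e)  # the error message lists the supported qconfig_dict keys
```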

Test Plan:
Imported from OSS

python test/test_quantization.py

Reviewed By: jerryzh168

Differential Revision: D28540923

fbshipit-source-id: 5958c32017b7d16abd219aefc8e92c42543897c2
2021-05-21 15:20:05 -07:00
ed4cda0183 [pkg] opt into autoformat
Summary: woooo

Test Plan: arc lint --apply-patches --take BLACK --paths-cmd 'hg files -I "caffe2/**/*.py"'

Reviewed By: SplitInfinity

Differential Revision: D28608934

fbshipit-source-id: 7768fed50a87883a95319376c0a6d73a9492bdcc
2021-05-21 15:03:52 -07:00
e5ba9307b7 catch exception when running print regression (#58751)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58751

Test Plan: https://github.com/pytorch/pytorch/issues/58752

Reviewed By: samestep

Differential Revision: D28605667

Pulled By: walterddr

fbshipit-source-id: 3796c924df8e50849dd08ecbeab612ba4f0c569b
2021-05-21 14:59:42 -07:00
378b2af93d T90561249: Enforce kernel launch checks for OSS CI (#58465)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58465

Test Plan: how to test?

Reviewed By: r-barnes

Differential Revision: D28500258

fbshipit-source-id: 19e56d5e18d77b951acb510e1e7ac834ce1ffc9b
2021-05-21 14:03:48 -07:00
19a7472702 Make TP agent use streams from Future when sending response (#58428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58428

Until now, the TP agent expected the output of a remote function to be on the same streams as the inputs. In other words, it used the lazy stream context of the inputs to synchronize the output tensors. This was true in the most common case of a synchronous remote function. However it wasn't true for async functions, for fetching RRefs, ... The more generic way is to use the CUDA events held by the Future to perform this synchronization. (These events may be on the input streams, or they may not be!).
ghstack-source-id: 129567045

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474982

fbshipit-source-id: c0034eb3f2a2ea525efb63a31b839bc086060e7e
2021-05-21 13:15:35 -07:00
23df70359a Set and propagate devices in RRef completion future (#58674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58674

I found this missing parameter while debugging failures in the next PR.

I'm very unhappy about this change. I think this future, which we know for sure won't contain tensors, shouldn't have to worry about CUDA devices. And yet, it does. This means that basically any future anywhere might have to worry about it, and this just doesn't scale, and thus it's bad.
ghstack-source-id: 129567042

Test Plan: Should fix the next diff.

Reviewed By: mrshenli

Differential Revision: D28574083

fbshipit-source-id: 5c89902cdc5cc12f1ebeea860b90cd9c3d7c7da1
2021-05-21 13:15:34 -07:00
ab1e958d20 Set streams when invoking UDFs (#58427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58427

Running the UDF (be it Python or JIT) is the first step of (most?) RPC calls, which is where the inputs are consumed. The lazy stream context contains the streams used by the inputs, thus it must be made current before any UDF call. I opt to do this as "close" as possible to the place the UDF is invoked, to make the relationship as explicit as possible.
ghstack-source-id: 129567052

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474983

fbshipit-source-id: 358292764d0a6832081c34bf6736f0961475ff3d
2021-05-21 13:15:32 -07:00
027c68ef00 Create CUDA-aware futures in RequestCallback (#58426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58426

The operations in RequestCallback can return CUDA tensors, thus the futures used to hold them must be CUDA-aware.
ghstack-source-id: 129567051

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474981

fbshipit-source-id: 492b8e71a43da5f63b4b7a31f820427cde9736e4
2021-05-21 13:15:30 -07:00
bdf6a4bffd Provide pre-extracted DataPtrs when completing a Future with a Message (#58425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58425

Now that callbacks can provide pre-extracted DataPtrs, let's do so. This will become of crucial importance in the next PR, where some of these futures will become CUDA-aware, and thus they will try to extract DataPtrs on their own, but they would fail to do so here because Message isn't "inspectable".
ghstack-source-id: 129567057

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474877

fbshipit-source-id: e68d7d45f1c1dc6daa5e05cf984cfc93d2dce0d0
2021-05-21 13:15:29 -07:00
a0ee299d92 Allow Future::then to return pre-extracted DataPtrs (#58424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58424

In CUDA mode, Future must inspect its value and extract DataPtrs. However some types are not supported, for example the C++/JIT custom classes, which include Message, which is widely used in RPC. Hence for these scenarios we allow the user to perform the custom DataPtr extraction on their own, and pass the pre-extracted DataPtrs.

Note that `markCompleted` already allowed users to pass in pre-extracted DataPtrs, hence this PR simply extends this possibility to the `then` method too.
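
A hedged sketch of the completion path (the `markCompleted` overload is assumed, and `extractDataPtrsFromMessage` is a hypothetical helper):

```
// Message is an opaque custom-class IValue the Future cannot inspect, so
// the caller pre-extracts the DataPtrs and passes them alongside the value.
std::vector<std::reference_wrapper<const at::DataPtr>>
extractDataPtrsFromMessage(const torch::distributed::rpc::Message& m); // hypothetical

void completeWithMessage(
    const c10::intrusive_ptr<c10::ivalue::Future>& future,
    c10::intrusive_ptr<torch::distributed::rpc::Message> message) {
  auto dataPtrs = extractDataPtrsFromMessage(*message);
  future->markCompleted(c10::IValue(std::move(message)), std::move(dataPtrs));
}
```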
ghstack-source-id: 129567044

Test Plan: Used in next PR.

Reviewed By: mrshenli

Differential Revision: D28474880

fbshipit-source-id: 91a0dde5e29d1afac55650c5dfb306873188d785
2021-05-21 13:15:27 -07:00
ebf55a7d13 Always use intrusive_ptr for Message (2 out of 2) (#58423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58423

This is part 2 of the previous PR. Here we address the remaining occurrences of "raw" Message, namely the ones within toMessageImpl. And since they're the last ones, we make the constructor of Message private, to prevent new usages from emerging.
ghstack-source-id: 129567049

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474879

fbshipit-source-id: 498652a8b80a953396cd5d4b275c0b2e869c9ecf
2021-05-21 13:15:25 -07:00
4d704e607d Always use intrusive_ptr for Message (1 out of 2) (#58422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58422

Similar to Future (which I tackled recently), Message is an ivalue type (a "custom class" one), and the natural way to represent it is inside an intrusive_ptr. However in the RPC code we had a mix of usages, often passing Message by value. This has undesirable consequences, as it could easily trigger a copy by accident, which I believe is why in many places we accepted _rvalue references_ to Message, in order to force the caller to move. In my experience this is non-idiomatic in C++ (normally a function signature specifies how the function consumes its arguments, and it's up to the caller to then decide whether to copy or move).

By moving to intrusive_ptr everywhere I think we eliminate many of the problems above and simplify the code.

In this PR I do half of the migration, by updating everything except the `toMessageImpl` methods, which will come in the next PR.
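
The shape of the migration, in an illustrative sketch (not PyTorch's real declarations):

```
#include <c10/util/intrusive_ptr.h>

struct Message : c10::intrusive_ptr_target { /* payload elided */ };

// Before: rvalue-reference parameters forced callers to std::move, and
// pass-by-value elsewhere made accidental deep copies easy.
void sendOld(Message&& message);

// After: one cheap-to-copy refcounted handle; how the callee consumes the
// argument no longer dictates the signature.
void sendNew(c10::intrusive_ptr<Message> message);
```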
ghstack-source-id: 129567053

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474878

fbshipit-source-id: 5b76d45e05f6fa58c831e369c5c964d126187a6c
2021-05-21 13:15:24 -07:00
35ea8779da Prevent using anything other than intrusive_ptr for Future (#58421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58421

Here I make it impossible to create Futures that do not use intrusive_ptr, by making the constructor private. This makes it safer (by "forcing" people to do the right thing) and prevents a proliferation of new shared_ptrs or of accidental copies/moves.
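
The general idiom, sketched with std::shared_ptr for brevity (the actual change applies it to c10::intrusive_ptr):

```
#include <memory>

// A private constructor plus a static factory means every instance is
// born inside the smart pointer; nobody can stack-allocate or copy one.
class Widget {
 public:
  static std::shared_ptr<Widget> create() {
    // std::make_shared cannot reach the private constructor.
    return std::shared_ptr<Widget>(new Widget());
  }

 private:
  Widget() = default;
};
```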
ghstack-source-id: 129567047

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474484

fbshipit-source-id: 82c487e1bb7c27a2e78cb5d594e00e54c752bf09
2021-05-21 13:15:22 -07:00
44daf1930b Migrate remaining shared_ptr<Future> to intrusive_ptr (#58420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58420

In https://github.com/pytorch/pytorch/pull/57636 I migrated most uses of Future to an intrusive_ptr. I thought I had all of them but I missed a couple. These are the remaining ones. (The next PR will make it impossible to add new usages of shared_ptr).
ghstack-source-id: 129567071

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28477285

fbshipit-source-id: 75008276baa59e26b450e942c009ec7e78f89b13
2021-05-21 13:15:20 -07:00
59454ce36e Make remaining autograd methods return futures (#57861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57861

The very last methods left that still didn't return Futures were the autograd ones, but they're very easy to port.

We've now finished the conversion of RequestCallback to be fully Future-based!
ghstack-source-id: 129567055

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28286173

fbshipit-source-id: 1de58cee1b4513fb25b7e089eb9c45e2dda69fcb
2021-05-21 13:15:19 -07:00
d6d2fb3323 Make remaining RRef methods return futures (#57860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57860

The other methods for RRefs just did bookkeeping and are trivially easy to migrate to Futures (which is done mainly for consistency at this point).
ghstack-source-id: 129567068

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28286175

fbshipit-source-id: 1d97142803f73fe522435ca75200403c78babc68
2021-05-21 13:15:17 -07:00
797dff55b5 Unify fetching RRefs (#57859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57859

Just like with assigning OwnerRRefs, we can also deduplicate the code paths for fetching their values. In fact this was duplicated three times, with different ways of post-processing the value (once for JIT, once for Python, once for autograd). Thanks to future, we can have that logic once, and then connect it to different follow-up steps.
ghstack-source-id: 129567050

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28286172

fbshipit-source-id: e0742a99cf555755e848057ab6fee5285ff0df2a
2021-05-21 13:15:15 -07:00
b9b41f6d1b Deduplicate Python object serialization (#57858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57858

Just a small deduplication, which moves complexity out of the way and ensures consistent error checking.
ghstack-source-id: 129567056

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28286174

fbshipit-source-id: 6eab8d3f30405d49c51f8b9220453df8773ff410
2021-05-21 13:15:14 -07:00
cd9dbbd93a Simplify process(Script|Python)(Remote)?Call (#57857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57857

There used to be a whole lot of methods: `processPythonCall`, `processScriptCall`, `processScriptRemoteCall`, `processPythonRemoteCall`, `processScriptCallOp`, `processBaseScriptRemoteCall` and `processScriptRemoteCallOp`. Thanks to the previous simplification, we can now drop all but the first four, which map nicely 1:1 to the four message types we need to handle. Also their signatures become much simpler: they take an RPC command and return a future.
ghstack-source-id: 129567070

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28253848

fbshipit-source-id: e0e45345c414a96900f9d70ee555359d28908833
2021-05-21 13:15:12 -07:00
c96a05d148 Unify assignment of OwnerRRef result (#57856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57856

Thanks to Futures providing a "common language" between various steps, we can now deduplicate the creation of OwnerRRef, by having two different ways of creating the result (JIT and Python) but then connecting them to a single method that wraps and stores that result in an OwnerRRef.
ghstack-source-id: 129567072

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28253845

fbshipit-source-id: a156e56cac60eb22f557c072b61ebac421cfad43
2021-05-21 13:15:10 -07:00
e220a1bbcd Make processPythonExecution return a future (#57855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57855

We already had a helper to run Python functions, which was nice (it de-duplicated some code). This helper was however taking a callback which, as I said, isn't as nice as it returning a Future. Hence here I change this.
ghstack-source-id: 129567054

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28253846

fbshipit-source-id: d854d4aa163798fb015cd6d46932f9ff1d18262e
2021-05-21 13:15:09 -07:00
20d02cb7dd Remove getScriptRemoteCallType (#57854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57854

Because OwnerRRefs used to be created before their value was computed, we had to figure out their type ahead of time. After the previous diff, we inverted the order of operations, and we can now first compute the result and then create the OwnerRRef. Which means we can just inspect the value to get its type. Much simpler, and much less likely to get it wrong.
ghstack-source-id: 129567060

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28253843

fbshipit-source-id: f13c9b294f477ae66fcbdbc85c642fdc69b2740f
2021-05-21 13:15:07 -07:00
60fc37393e Simplify OwnerRRef completion (#57853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57853

A bunch of methods received an OwnerRRef to "fill in". I think it will be more flexible to do it the other way around, and have these methods return a value (wrapped in a Future), which can then be "connected" to an OwnerRRef, but which can also potentially be consumed in different ways.
ghstack-source-id: 129567059

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28253844

fbshipit-source-id: 7e3772312dbacfc75a6ac0f62189fc9828001fc7
2021-05-21 13:15:05 -07:00
ea2f5bbb4c Unify async execution for JIT functions (#57852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57852

Another great example of the benefits of Futures. Thanks to the "right abstraction" (i.e., the `thenAsync` method), adding support for async execution becomes trivial, and the code much simpler than what it used to be.
ghstack-source-id: 129567063

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28253842

fbshipit-source-id: b660151ca300f3d6078db0f3e380c80a4d8f5190
2021-05-21 13:15:04 -07:00
bfdc279134 Unify invoking JIT functions (#57851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57851

The same as the previous PR, but for JIT functions.
ghstack-source-id: 129567069

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28253841

fbshipit-source-id: 2b8affde16c106f5c76efa8be49af070213708bf
2021-05-21 13:15:02 -07:00
77428159f5 Unify invoking JIT operands (#57850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57850

What I envision is modular, decomposed code, with separate steps which each consume/produce Futures, and which can be chained together to obtain the desired results. One common "starting point" for these chains is the execution of a remote function (Python or JIT or otherwise). I'm thus creating a helper function for one of these, the JIT operators (by deduplicating the places where we used to run them). More will follow.

This deduplication will also help to add CUDA support to JIT RPC, since the execution of the JIT function/operators is where we need to set our custom streams.
ghstack-source-id: 129567058

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28253847

fbshipit-source-id: 24ab67ad89c8796861e9bbcb78878b26704c0c48
2021-05-21 13:15:00 -07:00
f94f1db938 Make some methods of RequestCallback return void instead of bool (#57849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57849

Some methods are currently returning bool, but I'll soon want them to return a Future. I could have them return a tuple of bool and Future, but that's a bit heavy. Instead it turns out we can very easily make them return void, which will simplify things.
ghstack-source-id: 129567061

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28224476

fbshipit-source-id: 26dc796b7e38f03aa269cf0731b0059d58e57e94
2021-05-21 13:14:59 -07:00
4ac18f6710 Centralize setting messageId in RequestCallback (#57848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57848

This PR looks large, but all it does is add a dozen lines and remove a lot of other ones.

One first advantage of using Futures is that we can easily chain some "post-processing" to them. Until now we needed to pass the message ID around everywhere because it was set separately by each method. Instead, we could simply add a follow-up step to the final future which sets this ID, and remove all the former logic.
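
A hedged sketch of that follow-up step (names and the exact `then` signature are assumed):

```
using torch::distributed::rpc::Message;

// One post-processing step on the final future replaces threading
// `messageId` through every handler.
c10::intrusive_ptr<c10::ivalue::Future> attachId(
    c10::intrusive_ptr<c10::ivalue::Future> retFuture,
    int64_t messageId) {
  return retFuture->then(
      [messageId](c10::ivalue::Future& fut) {
        auto msg = fut.value().toCustomClass<Message>();
        msg->setId(messageId); // assumed setter
        return c10::IValue(std::move(msg));
      },
      retFuture->elementType());
}
```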
ghstack-source-id: 129567065

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28224477

fbshipit-source-id: 7b6e21646262abe5bbbf268897e2d792e5accc27
2021-05-21 13:14:57 -07:00
f6844eafce Make RequestCallback collect Futures from methods, rather than providing them (#57847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57847

This is the first PR of a stack that aims to simplify RequestCallback, and I want to start by explaining my intentions.

With the introduction of CUDA support in the TensorPipe agent, we found out that other layers higher up in the stack (RRefs, dist autograd, ...) were not "ready" to support CUDA. One cause of this was that in PyTorch most CUDA state is thread-local, and the RequestCallback class (and others) might execute different steps of an operation on multiple threads. The solution to this problem is to preserve or recreate the CUDA state when switching between threads (propagating streams, or recording events and then wait on them). If we were to manually do this everywhere it would be tedious, error-prone, and hard to maintain.

In fact, we already have a primitive that can do this for us: CUDAFuture (now known as just Future). If whenever we switch threads we were to pack the values in a CUDAFuture and then unpack them on the other threads, all CUDA stuff would be taken care of for us.

If our code leveraged CUDAFuture at its core, this would become the "idiomatic" thing to do, the natural behavior. Future changes would thus also be inclined to follow this pattern, hence automatically doing the right thing.

I also think that, even without these concerns about CUDA, there are benefits to use Futures more extensively. Currently RequestCallback uses a mix of Futures and callbacks. These are two tools for the same job, and thus mixing them creates confusion. Futures are more powerful than simple callbacks (they can be passed around, inspected, chained, waited on, ...) and thus should be preferred. They also lead to more readable code, as each step can be defined and chained in logical order, whereas callbacks must either be nested, or defined inline, or defined before and used later (thus making the code out-of-order).

In short: I intend to rework RequestCallback to use Futures much more. I believe it will greatly simplify the code, help readability, and prove invaluable to support CUDA.

 ---

Until now, we had the final result future being created at the very beginning, and then passed around everywhere, so that the various method could "fill in" its value. I think it's much lighter to instead allow each method to create or obtain its futures however it wants, and have it return them. I.e., have these futures "bubble up" from the lower layers, rather than them being "pushed down" from the upper ones.

In this initial PR, I move the place where we create this "final result future", but I still keep it around. I will then, in later PRs, slowly migrate each method so that it returns a future, and in the end I will avoid creating the final result future.
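
Illustrative signatures only, to show the direction of the refactor:

```
// Before: the result future is created up front and pushed down for each
// handler to fill in.
void processRpc(RpcCommandBase& rpc, c10::intrusive_ptr<JitFuture> responseFuture);

// After: each handler creates or obtains its futures itself and returns
// them, letting results bubble up.
c10::intrusive_ptr<JitFuture> processRpc(RpcCommandBase& rpc);
```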
ghstack-source-id: 129567062

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28224478

fbshipit-source-id: dbdc66b6458645a4a164c02f00d8618fa64da028
2021-05-21 13:14:55 -07:00
7e1f2b33ce Add helpers to manipulate futures (#57846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57846

In later PRs I'll need to create already-completed futures (it'll make sense then, I hope). Here are a few helpers for that, which I'm adding separately to reduce the noise later.
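
A minimal sketch of such a helper, assuming the Future constructor is reachable here via make_intrusive:

```
c10::intrusive_ptr<c10::ivalue::Future> asCompletedFuture(c10::IValue value) {
  auto future = c10::make_intrusive<c10::ivalue::Future>(value.type());
  future->markCompleted(std::move(value));
  return future;
}
```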
ghstack-source-id: 129567064

Test Plan: See later.

Reviewed By: mrshenli

Differential Revision: D28253664

fbshipit-source-id: f091e1d3ea353bb5bfbd2f582f1b8f84e4b0114f
2021-05-21 13:14:54 -07:00
1d7cf4b248 Reduce overhead when Future invokes callbacks inline (#57638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57638

In RPC there are a few instances of "fastpaths" which do `if (fut.isCompleted()) { do_sth(); } else { fut.addCallback(do_sth); }`. I intend to get rid of them, for reasons I'll clarify later but which in a nutshell have to do with CUDA correctness and readability. Note that dropping the fastpath introduces no change in behavior (because `addCallback` invokes the callback inline anyways), thus the only perf concern comes from the fact that the fastpath avoids constructing and passing around a `std::function`. I don't think this is a significant performance hit. Regardless, this PR preemptively addresses this concern, by tweaking `addCallback` (and similar methods) so they can handle raw lambdas, and so that they do _not_ wrap them into `std::function`s if they are invoked inline.
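
A minimal sketch (not PyTorch's actual class) of that optimization: a templated `addCallback` only type-erases the callable when it must be stored, and an already-completed future runs it inline as a raw lambda:

```
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class ToyFuture {
 public:
  template <typename F>
  void addCallback(F&& cb) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (completed_) {
      lock.unlock();
      std::forward<F>(cb)(); // invoked inline; no std::function built
      return;
    }
    callbacks_.emplace_back(std::forward<F>(cb)); // type-erased only here
  }

  void markCompleted() {
    std::unique_lock<std::mutex> lock(mutex_);
    completed_ = true;
    auto cbs = std::move(callbacks_);
    lock.unlock();
    for (auto& cb : cbs) cb();
  }

 private:
  std::mutex mutex_;
  bool completed_ = false;
  std::vector<std::function<void()>> callbacks_;
};
```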
ghstack-source-id: 129567067

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28222808

fbshipit-source-id: eb1c7114cf7aca3403cb708f14287cab0907ecfa
2021-05-21 13:14:52 -07:00
ce2f1c29f9 Introduce thenAsync method on Future (#57637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57637

I had proposed a similar method in https://github.com/pytorch/pytorch/pull/48790, although that PR also exposed it to Python and thus required a bit more work. This PR only introduces this method as a C++ API. Python can be added later.

This new method is useful when one wants to use `then` but the callback does perform some async operation itself, and one wants to "reconcile" the future produced inside the callback with the one produced outside.
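
A hedged usage sketch (the signature and helper are assumed):

```
c10::intrusive_ptr<c10::ivalue::Future> startAsyncWork(c10::IValue input); // hypothetical

c10::intrusive_ptr<c10::ivalue::Future> chainAsync(
    const c10::intrusive_ptr<c10::ivalue::Future>& outer,
    c10::TypePtr resultType) {
  // The callback kicks off async work and returns a Future itself;
  // thenAsync yields a Future that completes only once that inner
  // Future does, with no manual reconciling of the two.
  return outer->thenAsync(
      [](c10::ivalue::Future& completed) {
        return startAsyncWork(completed.value());
      },
      std::move(resultType));
}
```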
ghstack-source-id: 129567066

Test Plan: Used (and thus tested) later in the stack.

Reviewed By: mrshenli

Differential Revision: D28222809

fbshipit-source-id: 869f11ab390b15e80c0855750e616f41248686c5
2021-05-21 13:13:02 -07:00
d7d0fa2069 Fix typo. (#58728)
Summary:
Fix typo in docs and comments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58728

Reviewed By: mruberry

Differential Revision: D28603612

Pulled By: H-Huang

fbshipit-source-id: b3cd8f6f19354201d597254d0b3cb4e2062471ab
2021-05-21 11:45:10 -07:00
13c975684a c10/util/thread_name.cpp: pthread_setname_np requires Glibc 2.12 (#55063)
Summary:
`pthread_setname_np` requires Glibc 2.12. The patch reproduces what numactl does: 93867c59b0/syscall.c (L132-L136)

Related to issue https://github.com/pytorch/pytorch/issues/23482 and the `pthread_setname_np.patch` patch that adamjstewart shared.
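
A sketch of the fallback (conditions per the description above):

```
#include <pthread.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ(2, 12)
#define HAVE_PTHREAD_SETNAME_NP 1
#endif
#endif

// pthread_setname_np needs glibc >= 2.12; older glibc takes the prctl
// path that numactl uses (which can only name the calling thread).
static void setThreadName(const char* name) {
#ifdef HAVE_PTHREAD_SETNAME_NP
  pthread_setname_np(pthread_self(), name);
#elif defined(__linux__)
  prctl(PR_SET_NAME, name, 0, 0, 0);
#else
  (void)name; // unsupported platform
#endif
}
```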

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55063

Reviewed By: soulitzer

Differential Revision: D28577146

Pulled By: malfet

fbshipit-source-id: 85867b6f04795b1ae7bd46dbbc501cfd0ec9f163
2021-05-21 10:26:51 -07:00
76ce925257 [c10d] Fix monitored_barrier with wait_all_ranks (#58702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58702

Off-by-one error when determining whether some ranks failed with
`wait_all_ranks=True`. This wasn't caught by tests because the tests only
exercised failure scenarios, not success scenarios with `wait_all_ranks=True`.
ghstack-source-id: 129559840

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28583235

fbshipit-source-id: a8f376efb13a3f36c788667acab86543c80aff59
2021-05-21 09:40:50 -07:00
9e261de630 Revert D28547564: [pytorch][PR] masked_scatter thrust->cub
Test Plan: revert-hammer

Differential Revision:
D28547564 (5152cf8647)

Original commit changeset: 83aeddfaf702

fbshipit-source-id: d5259afb584e0f6c0a11de4d4cb3d56a2a562eb7
2021-05-21 09:18:34 -07:00
5313bafd31 [JIT] integer value refinement (#56438)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56438

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D27924239

Pulled By: eellison

fbshipit-source-id: ace54fcb594853f30c242369ea203b0eb5527ac1
2021-05-21 08:51:01 -07:00
483ea176b3 Factor out isDominatedBy (#56437)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56437

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D27924240

Pulled By: eellison

fbshipit-source-id: d600f895bfb06304957fe65155fceab0e5f873ea
2021-05-21 08:50:59 -07:00
0d9f1c1ec6 Add Value * == Value * peephole (#55978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55978

This is needed for broadcasting two of the same symbolic shape

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D27755328

Pulled By: eellison

fbshipit-source-id: d38d9458a9e28d31558f0bc55206516b78131032
2021-05-21 08:50:57 -07:00
391603d883 Factor out non tensor peephole (#55977)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55977

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D27755329

Pulled By: eellison

fbshipit-source-id: 0e8948c0607fa59133310e4db8e05ac6847c9f8b
2021-05-21 08:50:55 -07:00
5cebf29b4e Add list len refinement (#55926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55926

This is necessary for code like conv2d, where we wish to share generic convolution shape function logic but, for conv2d, always infer that the output has dimension 4. I'm also hoping the refinement algorithm here could be refactored out and used to support refining tensor types from user annotations. I have a lengthy comment explaining how this works, and the logic outside of the data structures is pretty small and contained. Additionally, you might check out https://fb.quip.com/X7EVAdQ99Zzm for a very similar description of how to refine values based on comparison operators.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27750997

Pulled By: eellison

fbshipit-source-id: d962415af519ac37ebc9de88f2e1ea60a1374f7c
2021-05-21 08:50:54 -07:00
9fd2306036 Add handling of symbolic shapes (#55925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55925

This sets up the initial handling of symbolic shapes. As in the test, it doesn't work perfectly yet because it needs a couple other optimization passes. The basic description is pretty simple: we resolve tensor dimension indices to the same Value *, and before extracting out the output Tensor shape we substitute in symbolic shapes. We don't substitute during optimization because they are represented as negative numbers so we don't want them inadvertently used in Constant prop or something else.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27750996

Pulled By: eellison

fbshipit-source-id: 6984e7276b578f96b00fc2025cef0e13f594b6e6
2021-05-21 08:50:52 -07:00
f39471a171 Initial Symbolic Shape Analysis (#54809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54809

I'm going to post on dev-discuss soon with a more thorough explanation of the design and advantages of this shape analysis, so I'm leaving that out for now.

There is still a ton left to do; I'm posting this initial version so we can get something on master that multiple people can work on. List of many remaining steps:

- [ ] Add symbolic shapes support
- [ ] Bind shape functions for operators in C++
- [ ] Make classes of operators share the same shape function (e.g. pointwise, broadcast two inputs)
- [ ] Refactor APIs
- [ ] Only iteratively optimize shape function while a change has been made
- [ ] Expand coverage to common ops
- [ ] Add shape analysis pass on Graph that handles Ifs and Loops
- [ ] Allow concurrent reads to the operator map
- [ ] Successive applications of the same inputs to the same shape function (e.g. a series of pointwise ops)

For this review, I am mostly looking for comments related to the implementation of symbolic_shape_analysis.cpp, with the caveats listed above. I am not really looking for comments related to api/registration/graph level analysis as those are all planned to be changed. I am fine landing this as is or waiting until necessary components of the TODOs above are finished.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D27750998

Pulled By: eellison

fbshipit-source-id: 4338b99e8651df076291c6b781c0e36a1bcbec03
2021-05-21 08:49:46 -07:00
72ae924fad Added sublist support for torch.einsum (#56625)
Summary:
This PR adds an alternative way of calling `torch.einsum`. Instead of specifying the subscripts as letters in the `equation` parameter, one can now specify the subscripts as a list of integers as in `torch.einsum(operand1, subscripts1, operand2, subscripts2, ..., [subscripts_out])`. This would be equivalent to `torch.einsum('<subscripts1>,<subscripts2>,...,->[<subscript_out>]', operand1, operand2, ...)`

TODO
- [x] Update documentation
- [x] Add more error checking
- [x] Update tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56625

Reviewed By: zou3519

Differential Revision: D28062616

Pulled By: heitorschueroff

fbshipit-source-id: ec50ad34f127210696e7c545e4c0675166f127dc
2021-05-21 08:36:45 -07:00
fc804b5def Revert D28133579: [jit] Implement ScriptProfile to collect instruction profiles.
Test Plan: revert-hammer

Differential Revision:
D28133579 (034a238bab)

Original commit changeset: e7e30e961513

fbshipit-source-id: 5a7756468b4f2eeed24d2abb7b52ab46d081a95e
2021-05-21 08:18:40 -07:00
e56d3b0238 Added OpInfo tests for NNC (#58719)
Summary:
Finds a couple of bugs:

1. permute needs to wrap dimensions
2. slice needs to wrap dimensions
3. frac doesn't work correctly for negative values
4. Permute has some other failures.

This PR also fixes 1 + 2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58719

Reviewed By: SplitInfinity

Differential Revision: D28590457

Pulled By: Chillee

fbshipit-source-id: a67fce67799602f9396bfeef615e652364918fbd
2021-05-21 01:41:28 -07:00
d88d321ee3 More robust slicing logic for nn.ModuleList (#58361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58361

Fixes: https://github.com/pytorch/pytorch/issues/16123

Test Plan: Imported from OSS

Reviewed By: ppwwyyxx

Differential Revision: D28464855

Pulled By: tugsbayasgalan

fbshipit-source-id: db8c41b15dbe6550035e8230dea68ce60e5a6f9a
2021-05-20 23:00:17 -07:00
b301558410 [Reducer] Remove replica size == 1 checks (#58603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58603

No longer need these checks
ghstack-source-id: 129498227

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28549893

fbshipit-source-id: a89bf8c3fc3aba311a70fd37e5a6aa5dc14b41b9
2021-05-20 22:34:23 -07:00
1d67c6d639 [DDP] Remove train call to module copies (#58595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58595

No longer needed since this list is always of size 1.
ghstack-source-id: 129498229

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28548426

fbshipit-source-id: 7d6dba92fff685ec7f52ba7a3d350e36405e2578
2021-05-20 22:34:20 -07:00
88c76b43fb [Reducer] move comment to the right place (#58594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58594

This comment was misplaced after some changes, move it to the right
place.
ghstack-source-id: 129498228

Test Plan: ci

Reviewed By: zhaojuanmao

Differential Revision: D28548100

fbshipit-source-id: a9163fc3b25a9d9b8b6d4bfa2a77af290108fc09
2021-05-20 22:34:17 -07:00
d83c5a5c7f Format reducer.cpp, hpp (#58593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58593

Per title
ghstack-source-id: 129498230

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28528465

fbshipit-source-id: 89e4bfcb4a0275dc17090a934d4c0a41a3c54046
2021-05-20 22:32:30 -07:00
6d97a80dd2 [fx][graph_drawer] Improve graph drawer coloring and tensor_meta handling (#58699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58699

Make `call_function`/`call_method` nodes random colors based on their target name. This coloring is stable according to the name of the target. Also handle tensor_meta more elegantly for quantized types, including printing q_scale/q_zero_point if they're used.

Test Plan: Tested locally

Reviewed By: chenccfb, 842974287

Differential Revision: D28580333

fbshipit-source-id: ad9961e1106a1bfa5a018d009b0ddb8802d2163c
2021-05-20 21:26:04 -07:00
5455df2b99 [codemod][dirsync] Apply clang-format
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D28477071

fbshipit-source-id: e844e0fad2f5599fd27e0fd113a328031cb63aa7
2021-05-20 21:23:24 -07:00
21a9334034 Revert D28497967: [quant][fx][graphmode][refactor] Remove qconfig_map from Quantizer
Test Plan: revert-hammer

Differential Revision:
D28497967 (1cf8f7a439)

Original commit changeset: 421ce3d86fad

fbshipit-source-id: b1b290be47d847ab0e0128e3ae89f528578550ee
2021-05-20 20:56:12 -07:00
1cf8f7a439 [quant][fx][graphmode][refactor] Remove qconfig_map from Quantizer (#58455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58455

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28497967

fbshipit-source-id: 421ce3d86fadd3d92f4120b850b0167270509189
2021-05-20 20:34:47 -07:00
62adf9e1c9 [Reducer] Completely remove VariableIndex (#58592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58592

Completely removes VariableIndex from reducer code, as it is not
needed. replica_index is always 0 so simplify the code to only use the
parameter index. Next, we should also remove all of the nested data structures
that were needed when num_replicas > 1 was possible.
ghstack-source-id: 129498226

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28528440

fbshipit-source-id: e0568399264ab4f86de3b7a379a4f0831f8f42e9
2021-05-20 19:47:50 -07:00
8e4fc0063a [Try] [PyTorch Edge] Trim unused code related to CUDA and HIP Interfaces (#58689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58689

This doesn't seem to be mobile related, but ends up getting called from multiple places, so is hard to get rid of entirely.
ghstack-source-id: 129413850

Test Plan: Build

Reviewed By: iseeyuan

Differential Revision: D28543374

fbshipit-source-id: 867b3e2fafdcbf6030d7029a82a2b711bcecefc5
2021-05-20 18:23:36 -07:00
773cfae93b Tag PyObject on TensorImpl per torchdeploy interpreter (#57985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57985

Fixes https://github.com/pytorch/pytorch/issues/57756

This PR introduces a new `pyobj_interpreter_` field on TensorImpl which tracks what Python interpreter (if any) owns the TensorImpl. This makes it illegal to bind a TensorImpl from multiple Python interpreters, and means that we can now directly store PyObject pointer on TensorImpl even in the presence of multiple Python interpreters, as is the case in torchdeploy. This is a necessary step for PyObject preservation, which cannot be easily implemented when there are multiple Python interpreters.

Although the PR is not that long, there is a very subtle portion of the implementation devoted to ensuring that the tagging process is thread safe, since multiple threads can concurrently try to tag a PyObject. Check Note [Python interpreter tag] and Note [Memory ordering on Python interpreter tag] for detailed discussion of how this is handled. You will have to check this code carefully in code review; I did not torture test the multithreaded paths in any meaningful way.
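
A toy model of the tagging race (names hypothetical, not the PR's code):

```
#include <atomic>

struct PyInterpreter; // opaque

std::atomic<PyInterpreter*> pyobj_interpreter{nullptr};

// Many threads may race to claim a TensorImpl for their interpreter;
// exactly one compare-exchange wins, and losers observe the winner.
bool tryTag(PyInterpreter* self) {
  PyInterpreter* expected = nullptr;
  // acq_rel: the winner publishes its writes; losers acquire them.
  return pyobj_interpreter.compare_exchange_strong(
      expected, self, std::memory_order_acq_rel);
}
```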

In a follow-up PR, I will pack the interpreter and PyObject fields into a single atomic word on 64-bit.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D28390242

Pulled By: ezyang

fbshipit-source-id: a6d9b244ee6b9c7209e1ed185e336297848e3017
2021-05-20 18:18:39 -07:00
fe8e5eb260 Change native functions to take c10::string_view args instead of std::string (#57680)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57680

Reviewed By: malfet

Differential Revision: D28511799

Pulled By: ezyang

fbshipit-source-id: 43142f994d048b28b3279ccdb7a28cbaa3190973
2021-05-20 18:15:45 -07:00
d1d24304ee [Caffe2] [Easy] Fix comment on caffe2_serialize_using_bytes_as_holder to reflect correct types
Summary:
the logic is:

```
template <typename T>
typename std::enable_if<
    std::is_same<T, bool>::value || std::is_same<T, uint8_t>::value ||
        std::is_same<T, int8_t>::value || std::is_same<T, uint16_t>::value ||
        std::is_same<T, int16_t>::value,
    void>::type
```

Test Plan: N/A

Reviewed By: simpkins

Differential Revision: D28587311

fbshipit-source-id: 970c673a9c1256600ec8bdd5f9ca53333a60d588
2021-05-20 18:03:34 -07:00
db67699ae6 [Pytorch Edge] NAME -> SCHEMA (#58604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58604

Minor bug fix. Schemas should be defined with the schema macro, not the name one.

Test Plan: ci and buck test fbsource//xplat/pytorch_models/build/cair_messaging_2021_05_17/v2:cair_messaging_2021_05_17_test

Reviewed By: dhruvbird, iseeyuan

Differential Revision: D28549578

fbshipit-source-id: 0c64eb8c60f1aee8213a1fc1fb7231226b905795
2021-05-20 17:51:38 -07:00
0ede83db7a enable torch.cpu.amp.autocast (#57386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57386

Here is the PR for what's discussed in the RFC https://github.com/pytorch/pytorch/issues/55374 to enable autocast for the CPU device. Currently, this PR only enables BF16 as the lower precision datatype.

Changes:
1.  Enable the new API `torch.cpu.amp.autocast` for autocast on the CPU device: includes the Python API, C++ API, a new DispatchKey, etc.
2.  Consolidate the implementation for each cast policy, shared between the CPU and GPU devices.
3.  Add the operation lists to the corresponding cast policy for CPU autocast.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D28572219

Pulled By: ezyang

fbshipit-source-id: db3db509973b16a5728ee510b5e1ee716b03a152
2021-05-20 17:48:36 -07:00
b6dcdeacc9 [quant][graphmode][fx] Move qat_swap_modules outside of Quantizer (#58454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58454

Trying to remove Quantizer in the end

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28497966

fbshipit-source-id: 800f8e4afd99918d7330345f8ae7bcf018a5bde7
2021-05-20 17:27:49 -07:00
fdc5dfdd50 [PyTorch] Migrate TI usage in ATen/native/cpu to borrowing (#58303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58303

Borrowing is more efficient, and we can see in all these cases that the TensorIterator doesn't outlive the input & output Tensors.
ghstack-source-id: 129471191

Test Plan: Existing CI

Reviewed By: ezyang

Differential Revision: D28444032

fbshipit-source-id: f6a9e9effb43c273f464ef6ff410274962f3ab23
2021-05-20 17:24:13 -07:00
7c15d3206d [PyTorch] Add TI::borrowing_nullary_op and use it (#58280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58280

All core PyTorch uses of TensorIterator::nullary_op look like they can safely borrow.
ghstack-source-id: 129471193

Test Plan: Existing CI

Reviewed By: bhosmer

Differential Revision: D28429695

fbshipit-source-id: 404cf6db31e45e5cf7ae6d2f113c5a8eff6f7c3d
2021-05-20 17:22:58 -07:00
618be18a41 Enable the quantization on XPU devices (#54857)
Summary:
Enable quantization on XPU devices. Keep the model as-is if it is on an XPU device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54857

Reviewed By: ailzhang

Differential Revision: D28501381

Pulled By: jerryzh168

fbshipit-source-id: 6d3e9b04075393248b30776c69881f957a1a837c
2021-05-20 17:02:13 -07:00
ce3788d6a5 Add #pragma once to CUDA foreach headers (#58209)
Summary:
Per the title, adding `#pragma once` to cuda headers related to foreach functions.

cc: ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58209

Reviewed By: ailzhang

Differential Revision: D28558620

Pulled By: ngimel

fbshipit-source-id: 195f68435999eb7409ba904daf6fc5f0962d375d
2021-05-20 16:35:43 -07:00
f879e70fc1 [quant][fx][graphmode][refactor] Factor out generate_qconfig_map to qconfig_utils.py (#58453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58453

Move the class method generate_qconfig_map to qconfig_utils; more PRs will follow
to move functions out of Quantizer and eventually remove the Quantizer object

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28497965

fbshipit-source-id: 3c78cfe676965d20a8834a859ffed4d8e9ecade4
2021-05-20 16:26:24 -07:00
bf1c936e06 [static runtime] out variant for full_like (#58079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58079

Support full_like

Test Plan:
`buck test mode/dev caffe2/benchmarks/static_runtime:static_runtime_cpptest -- StaticRuntime.IndividualOps_FullLike`

Test on regenerated local inline_cvr model
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/dec_6x/266377643_shrunk.predictor.disagg.local.regenerated.pt --pt_inputs=/data/users/ansha/tmp/adfinder/dec_6x/local_inputs --pt_enable_static_runtime=1 --pt_cleanup_activations=1 --pt_enable_out_variant=1 --compare_results=1 --iters=5000 --warmup_iters=5000 --num_threads=1 --do_profile=0 --do_benchmark=1 --adsfinder_compatibility=1 --v=1
```

`V0511 10:59:57.187054 1911683 impl.cpp:1229] Switch to out variant for node: %5571 : Tensor = aten::full_like(%blob_for_shape.1, %235, %654, %75, %75, %75, %75)`

Reviewed By: hlu1

Differential Revision: D28361997

fbshipit-source-id: 89c41e37ce23d6008cfe4d80536832ee76d3405e
2021-05-20 16:17:40 -07:00
5211eeb22b Support aten::leaky_relu for TE (#58464)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58464

Test Plan:
./bin/test_tensorexpr

python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops

Reviewed By: Krovatkin

Differential Revision: D28499776

fbshipit-source-id: 20094a1bc78aa485f76aec4e065ff69e43d692d7
2021-05-20 16:12:03 -07:00
4668d09ca6 [quant][graphmode][fx] Quantize the output of statically quantized fp16 op in QuantizeHandler (#58445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58445

Previously the output of a statically quantized fp16 operator was not quantized in QuantizeHandler, which is inconsistent with
the behavior of static int8 operators and does not work well with reference functions. This PR
changes the static fp16 QuantizeHandler to quantize (i.e., call to(torch.float16)) in the QuantizeHandler, which also
makes future support for reference functions easier.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28495830

fbshipit-source-id: 2140eab8ab2dd08f6570d9e305485e3029e1f47d
2021-05-20 16:03:42 -07:00
6edd49a8e8 [Android]Removed dependency with AppCompat. (#58527)
Summary:
I build using [Bazel](https://bazel.build/).

When I use `pytorch_android` in latest Android app, I get the following error due to dependencies:

```
$ bazel build //app/src/main:app
WARNING: API level 30 specified by android_ndk_repository 'androidndk' is not available. Using latest known API level 29
INFO: Analyzed target //app/src/main:app (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/H1Gdev/android-bazel-app/app/src/main/BUILD.bazel:3:15: Merging manifest for //app/src/main:app failed: (Exit 1): ResourceProcessorBusyBox failed: error executing command bazel-out/k8-opt-exec-2B5CBBC6/bin/external/bazel_tools/src/tools/android/java/com/google/devtools/build/android/ResourceProcessorBusyBox --tool MERGE_MANIFEST -- --manifest ... (remaining 11 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox ResourceProcessorBusyBox failed: error executing command bazel-out/k8-opt-exec-2B5CBBC6/bin/external/bazel_tools/src/tools/android/java/com/google/devtools/build/android/ResourceProcessorBusyBox --tool MERGE_MANIFEST -- --manifest ... (remaining 11 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
Error: /home/H1Gdev/.cache/bazel/_bazel_H1Gdev/29e18157a4334967491de4cc9a879dc0/sandbox/linux-sandbox/914/execroot/__main__/app/src/main/AndroidManifest.xml:19:18-86 Error:
	Attribute application@appComponentFactory value=(androidx.core.app.CoreComponentFactory) from [maven//:androidx_core_core] AndroidManifest.xml:19:18-86
	is also present at [maven//:com_android_support_support_compat] AndroidManifest.xml:19:18-91 value=(android.support.v4.app.CoreComponentFactory).
	Suggestion: add 'tools:replace="android:appComponentFactory"' to <application> element at AndroidManifest.xml:5:5-19:19 to override.
May 19, 2021 10:45:03 AM com.google.devtools.build.android.ManifestMergerAction main
SEVERE: Error during merging manifests
com.google.devtools.build.android.AndroidManifestProcessor$ManifestProcessingException: Manifest merger failed : Attribute application@appComponentFactory value=(androidx.core.app.CoreComponentFactory) from [maven//:androidx_core_core] AndroidManifest.xml:19:18-86
	is also present at [maven//:com_android_support_support_compat] AndroidManifest.xml:19:18-91 value=(android.support.v4.app.CoreComponentFactory).
	Suggestion: add 'tools:replace="android:appComponentFactory"' to <application> element at AndroidManifest.xml:5:5-19:19 to override.
	at com.google.devtools.build.android.AndroidManifestProcessor.mergeManifest(AndroidManifestProcessor.java:186)
	at com.google.devtools.build.android.ManifestMergerAction.main(ManifestMergerAction.java:217)
	at com.google.devtools.build.android.ResourceProcessorBusyBox$Tool$5.call(ResourceProcessorBusyBox.java:93)
	at com.google.devtools.build.android.ResourceProcessorBusyBox.processRequest(ResourceProcessorBusyBox.java:233)
	at com.google.devtools.build.android.ResourceProcessorBusyBox.main(ResourceProcessorBusyBox.java:177)

Warning:
See http://g.co/androidstudio/manifest-merger for more information about the manifest merger.
Target //app/src/main:app failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2.221s, Critical Path: 1.79s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
```

This is due to a conflict between `AndroidX` and the `Support Library`, on which `pytorch_android_torch` depends.
(In the case of `Gradle`, it is avoided by `android.useAndroidX`.)

I created [Android application](https://github.com/H1Gdev/android-bazel-app) for comparison.

At first, I updated `AppCompat` from the `Support Library` to `AndroidX`, but `pytorch_android` and `pytorch_android_torchvision` didn't seem to need these dependencies, so I removed them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58527

Reviewed By: xta0

Differential Revision: D28585234

Pulled By: IvanKobzarev

fbshipit-source-id: 78aa6b1525543594ae951a6234dd88a3fdbfc062
2021-05-20 15:49:19 -07:00
d84121421e [third-party] Update nccl to 2.9.8 (#58667)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58470

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58667

Reviewed By: ngimel

Differential Revision: D28577042

Pulled By: malfet

fbshipit-source-id: 62f1c67f35bf5a004852806c1a74bb068cefb79b
2021-05-20 15:42:17 -07:00
bbf92e6176 Add missing .to_sparse(ndim) gradient (#58413)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46720, extends PR https://github.com/pytorch/pytorch/issues/46825 by adding test requested in [this comment](https://github.com/pytorch/pytorch/pull/46825#issuecomment-842304079).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58413

Reviewed By: ailzhang

Differential Revision: D28540550

Pulled By: albanD

fbshipit-source-id: d7e292e09b5402336c43844ee233b83b0a095035
2021-05-20 15:08:34 -07:00
8a3d9962e0 Enable ceil, floor, frac, round & trunc for BFloat16 on CUDA (#57910)
Summary:
Enable `ceil`, `floor`, `frac`, `round` & `trunc` for BFloat16 on CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57910

Reviewed By: soulitzer

Differential Revision: D28579486

Pulled By: ngimel

fbshipit-source-id: 2f90354339dbccb69cea7ec9caf9b066ea13a666
2021-05-20 14:52:45 -07:00
034a238bab [jit] Implement ScriptProfile to collect instruction profiles. (#57397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57397

Introduces two main classes in C++ runtime:

ScriptProfile is the implementation for enabling and disabling interpreter
profiling in C++. This should only be used from Python, and we will add a
corresponding Python API in the next diff.

InstructionSpan is a utility class to instrument execution of each single
instruction. A start timestamp is recorded in the constructor, and an end
timestamp is recorded in the destructor. During destruction, this will send
runtime data to all enabled ScriptProfile instances.

Test Plan:
build/bin/test_jit --gtest_filter='ScriptProfileTest.Basic'

Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28133579

fbshipit-source-id: e7e30e96151367022793ab3ad323f01c51ad4a3b
2021-05-20 14:11:03 -07:00
e8c6a65074 Adds grid_sampler to autocast fp32 list for 1.9 (#58679)
Summary:
Temporary fix for https://github.com/pytorch/pytorch/issues/42218.

Numerically, grid_sampler should be fine in fp32 or fp16. So grid_sampler really belongs on the promote list. But performancewise, native grid_sampler backward kernels use gpuAtomicAdd, which is notoriously slow in fp16. So the simplest functionality fix is to put grid_sampler on the fp32 list.

In https://github.com/pytorch/pytorch/pull/58618 I implement the right long-term fix (refactoring kernels to use fp16-friendly fastAtomicAdd and moving grid_sampler to the promote list). But that's more invasive, and for 1.9 ngimel says this simple temporary fix is preferred.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58679

Reviewed By: soulitzer

Differential Revision: D28576559

Pulled By: ngimel

fbshipit-source-id: d653003f37eaedcbb3eaac8d7fec26c343acbc07
2021-05-20 14:05:09 -07:00
691c139144 Do not use TF32 matmul in linalg and DDP tests (#56114)
Summary:
This PR does several things to relax test tolerance

- Do not use TF32 in cuda matmul in test_c10d. See https://github.com/pytorch/pytorch/issues/52941.
- Do not use TF32 in cuda matmul in test_linalg. Increase atol for float and cfloat. See https://github.com/pytorch/pytorch/issues/50453
    The tolerance is increased because most linear algebra operators are not that stable in single precision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56114

Reviewed By: ailzhang

Differential Revision: D28554467

Pulled By: ngimel

fbshipit-source-id: 90416be8e4c048bedb16903b01315584d344ecdf
2021-05-20 14:01:19 -07:00
a7f06e1e55 Added statistic related to out variant nodes
Summary: added more statistics for static runtime

Test Plan:
caffe2/benchmarks/static_runtime:static_runtime_cpptest

Expected output example:

Static runtime ms per iter: 0.939483. Iters per second: 1064.41
Node #0: 0.195671 ms/iter, %wide_offset.1 : Tensor = aten::add(%wide.1, %self._mu, %4)
Node #1: 0.169457 ms/iter, %wide_normalized.1 : Tensor = aten::mul(%wide_offset.1, %self._sigma)
Node #2: 0.118218 ms/iter, %wide_preproc.1 : Tensor = aten::clamp(%wide_normalized.1, %5, %6)
Node #3: 0.038814 ms/iter, %user_emb_t.1 : Tensor = aten::transpose(%user_emb.1, %4, %7)
Node #4: 0.0860747 ms/iter, %dp_unflatten.1 : Tensor = aten::bmm(%ad_emb_packed.1, %user_emb_t.1)
Node #5: 0.0102666 ms/iter, %31 : Tensor = static_runtime::flatten_copy(%dp_unflatten.1, %4, %8)
Node #6: 0.000476333 ms/iter, %19 : Tensor[] = prim::ListConstruct(%31, %wide_preproc.1)
Node #7: 0.0707332 ms/iter, %input.1 : Tensor = aten::cat(%19, %4)
Node #8: 0.123695 ms/iter, %fc1.1 : Tensor = aten::addmm(%self._fc_b, %input.1, %29, %4, %4)
Node #9: 0.0309244 ms/iter, %23 : Tensor = aten::sigmoid(%fc1.1)
Node #10: 0.0046297 ms/iter, %24 : (Tensor) = prim::TupleConstruct(%23)
Time per node type:
       0.195671 ms.    23.0483%. aten::add (1 nodes)
       0.169457 ms.    19.9605%. aten::mul (1 nodes, out variant)
       0.123695 ms.    14.5702%. aten::addmm (1 nodes, out variant)
       0.118218 ms.     13.925%. aten::clamp (1 nodes, out variant)
      0.0860747 ms.    10.1388%. aten::bmm (1 nodes, out variant)
      0.0707332 ms.    8.33175%. aten::cat (1 nodes, out variant)
       0.038814 ms.    4.57195%. aten::transpose (1 nodes)
      0.0309244 ms.    3.64263%. aten::sigmoid (1 nodes, out variant)
      0.0102666 ms.    1.20932%. static_runtime::flatten_copy (1 nodes, out variant)
      0.0046297 ms.   0.545338%. prim::TupleConstruct (1 nodes, out variant)
    0.000476333 ms.  0.0561079%. prim::ListConstruct (1 nodes, out variant)
       0.848959 ms. in Total
StaticRuntime setup time: 0.018925 ms
Memory allocation time: 0.019808 ms
Memory deallocation time: 0.0120445 ms
Outputs deallocation time: 0.0864947 ms
Total memory managed: 19328 bytes
Total number of reused tensors: 3
Total number of 'out' variant nodes/total number of nodes: 9/11 (81.8182%)

Reviewed By: hlu1

Differential Revision: D28553029

fbshipit-source-id: 55e7eab50b4b475ae219896100bdf4f6678875a4
2021-05-20 13:57:07 -07:00
056287aec4 turn off deadline for adagrad test
Summary: Tests are frequently failing with "exceeded the deadline of 1000.00ms"; we expect this to happen, so remove the deadline

Test Plan: N/A: Fix breakages

Reviewed By: robieta

Differential Revision: D28581051

fbshipit-source-id: 4825ada9af151fa5d57c45c549138c15ba613705
2021-05-20 13:47:02 -07:00
9db64e6e56 Revert "Striding for lists Part 2 (#49352)" (#58523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58523

This reverts commit fee7e8b91d4434b976a339330bfa89bd827ab9ec.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28528023

Pulled By: tugsbayasgalan

fbshipit-source-id: 9fa1d86f0c81fcc6fd3798e0d51a712a3c9b3952
2021-05-20 13:20:33 -07:00
9123229684 Cleanup functional.py after lu_unpack was removed (#58669)
Summary:
Remove code in functional.py that became unused after PR c790fd2bf8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58669

Reviewed By: driazati

Differential Revision: D28572377

Pulled By: heitorschueroff

fbshipit-source-id: c90d80ead5f3d69100667488bc6b14ef54b95b54
2021-05-20 13:06:30 -07:00
0e1bed364d [nnc] Use int64 to compute matmul flops heuristic (#58676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58676

We only generate asm for small matmuls, but we were computing the # of
flops using an int32, which is too small.
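
The bug class, in miniature:

```
#include <cstdint>

int64_t matmulFlops(int32_t m, int32_t n, int32_t k) {
  // Widen before multiplying: `2 * m * n * k` evaluates in int32 and
  // overflows for moderately sized matmuls (2048^3 already > 2^31).
  return int64_t{2} * m * n * k;
}
```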

Test Plan:
```
buck test mode/dev //caffe2/test:static_runtime -- --exact 'caffe2/test:static_runtime - test_mlp (test_static_runtime.TestStaticModule)'
```

Reviewed By: navahgar

Differential Revision: D28562157

fbshipit-source-id: a07ceba5209ef6022ead09140380c116994755cf
2021-05-20 13:05:21 -07:00
a60ce98a2e Remove opinfo warning from floor_divide (#58682)
Summary:
This warning causes downstream users of OpInfo to error when they use this OpInfo, unless they actually run the operation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58682

Reviewed By: mruberry

Differential Revision: D28577334

Pulled By: Chillee

fbshipit-source-id: f10e64f8ad3fb50907531d8cb89ce5b0d06ac076
2021-05-20 12:57:58 -07:00
1981904c8d [Static Runtime] Check input container type in aten::__getitem__ (#58639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58639

Fix two tests in `//caffe2/test:static_runtime` that were previously broken.

Reviewed By: ajyu, edvgha

Differential Revision: D28561185

fbshipit-source-id: 3cfb0960666c808523d65da267f70bd51e828313
2021-05-20 12:47:01 -07:00
84500d03d2 .github: Upload /download large artifacts to s3 (#58506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58506

We were experiencing 500 errors when it came to downloading large
artifacts, so let's just use S3 for those larger artifacts just in case

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D28520792

Pulled By: seemethere

fbshipit-source-id: 3aa15c4872fe46c9491ac31dc969bf71175378aa
2021-05-20 11:52:05 -07:00
151ec56311 ENH Adds check for input sizes in cosine_similarity (#58559)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55273

Adds check for input sizes to be consistent with the docstring.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58559

Reviewed By: soulitzer

Differential Revision: D28562376

Pulled By: ailzhang

fbshipit-source-id: f292e8a26f11a40d146fbed94a28025794808216
2021-05-20 11:40:06 -07:00
3c55db8065 Add Deploy to PredictorContainer (#58503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58503

add gflags to force using deploy for torchscript models

Test Plan: Add parametrization to PredictorContainer test to exercise gflag override and test deploy codepath.  Add test case to exercise new torch.package codepath.

Reviewed By: suo

Differential Revision: D28246793

fbshipit-source-id: 88a2c8322c89284e3c8e14fee5f20e9d8a4ef300
2021-05-20 11:29:31 -07:00
1fc3e1e1fb Abladawood patch 1 (#58496)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58496

Reviewed By: soulitzer

Differential Revision: D28562333

Pulled By: ailzhang

fbshipit-source-id: aa9fcc03ba7ffe03db6cc5da353d37d679a0a160
2021-05-20 10:32:18 -07:00
5152cf8647 masked_scatter thrust->cub (#56750)
Summary:
Benchmark:

```python
import torch
import itertools

def run50_sync(f):
    for _ in range(50):
        f()
    torch.cuda.synchronize()

run50_sync(lambda: torch.randperm(1000000, device='cuda'))

def benchmark(M):
    a = torch.randn(M, device='cuda')
    m = torch.randint(1, (M,), dtype=torch.long, device='cuda').bool()
    v = torch.randn(M, device='cuda')

    torch.cuda.synchronize()

    %timeit run50_sync(lambda:a.masked_scatter_(m, v))

for M in (100, 1000, 100000, 10000000):
    print(M)
    benchmark(M)
```

Before:
```
100
8.65 ms ± 80.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1000
8.75 ms ± 72.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000
9.27 ms ± 87.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000
33.6 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

After
```
100
8.04 ms ± 37.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1000
8.09 ms ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000
8.63 ms ± 76.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000
31.9 ms ± 298 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56750

Reviewed By: ailzhang

Differential Revision: D28547564

Pulled By: ngimel

fbshipit-source-id: 83aeddfaf7023f9f9501c6b1e2faf91e8b6277b1
2021-05-20 10:27:58 -07:00
4942fe0290 [DataLoader] Introduce MapMapDataPipe functional datapipe (#58258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58258

As part of https://github.com/pytorch/pytorch/issues/57031, this PR adds the `MapMapDataPipe` functional datapipe for the `MapDataPipe` class.

Usage:
```
def fn(x):
    return x * 10

dp = CountingDataset(n=10)
dp.map(fn)
```
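
For a self-contained flavor of the semantics, a hypothetical minimal stand-in (not the real datapipe classes):
```python
class CountingDataset:
    # Hypothetical stand-in for a MapDataPipe: index -> value.
    def __init__(self, n):
        self.n = n
    def __getitem__(self, i):
        return i
    def __len__(self):
        return self.n

class MapMap:
    # Applies fn lazily on each lookup, mirroring dp.map(fn).
    def __init__(self, source, fn):
        self.source, self.fn = source, fn
    def __getitem__(self, i):
        return self.fn(self.source[i])
    def __len__(self):
        return len(self.source)

dp = MapMap(CountingDataset(10), lambda x: x * 10)
print(dp[3])  # 30
```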

Reviewed By: ejguan

Differential Revision: D28394510

fbshipit-source-id: 8d71b1f5723dff52385c3ce753944304896af678
2021-05-20 09:00:21 -07:00
faa7d3793d [DDP] Support not all outputs used in loss calculation (#57081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57081

Changes in this diff:

1. Enable the passthrough autograd function when find_unused_parameters=True.
2. With the above, move prepare_for_backward, which does the unused-parameter-checking logic, to the beginning of the backwards pass, only when find_unused_parameters=True.
3. Enhance the process of unused parameter checking to account for outputs not being used in the loss.

The way (3) is implemented is by triggering the autograd hook corresponding to parameters that did not participate in loss computation. Since they did not participate, the autograd hook is triggered with a gradient of None, and the reducer handles this appropriately to ensure that the gradient is not touched.

Tested by ensuring that when a model output is not used in loss, the corresponding grad is not modified. Also verified that the grads are the same in local vs DDP training case. Also verified that gradients are not touched in this case, i.e. if grad is originally None, it stays as None, not zero, after.

Note that in this diff we are not enabling the pass through autograd function for regular case find_unused_parameters=False because that has a much bigger blast radius and needs additional careful analysis especially with regard to the performance.
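
As a minimal local sketch of the scenario being supported (no DDP here; the module and shapes are made up for illustration):
```python
import torch
import torch.nn as nn

class TwoHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 4)
        self.unused = nn.Linear(4, 4)

    def forward(self, x):
        return self.used(x), self.unused(x)

model = TwoHead()
out_used, out_unused = model(torch.randn(2, 4))
loss = out_used.sum()  # the second output never participates in the loss
loss.backward()

# The unused head receives no gradient; under DDP with
# find_unused_parameters=True, the reducer now leaves such grads
# untouched (None stays None) rather than zeroing them.
print(model.unused.weight.grad)  # None
```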
ghstack-source-id: 129425139

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28048628

fbshipit-source-id: 71d7b6af8626804710017a4edd753787aa9bba61
2021-05-20 08:34:33 -07:00
abb215e229 Fix dtype inference in sparse_csr_tensor_ctor (#58631)
Summary:
`NULL` return from `PyObject_GetAttrString` should never be ignored without handling the exception, as the behavior of subsequent Python C API calls is undefined until `PyErr_Fetch` or `PyErr_Clear` is called.

This accidentally leads to the `list` type being incorrectly identified as `Tensor`.
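
A minimal sketch of the Python-level symptom (the values here are illustrative; see the linked issue for the original reproducer):
```python
import torch

# Constructing from plain lists should infer the dtype from the values,
# not misidentify the lists as Tensors.
t = torch.sparse_csr_tensor([0, 2, 4], [0, 1, 0, 1], [1, 2, 3, 4])
print(t.dtype)  # torch.int64 on builds with the fix
```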

Fixes https://github.com/pytorch/pytorch/issues/58520

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58631

Reviewed By: albanD

Differential Revision: D28559454

Pulled By: malfet

fbshipit-source-id: 46f044b5f0f94264779a6108474d04a8ba851c53
2021-05-20 08:02:05 -07:00
9ac0bd23a2 Fix bug in test_fx_experimental codegen (#58587)
Summary:
This PR fixes a bug in test_fx_experimental where code generated for ops with kwarg-only Tensor parameters would fail to execute because they would be called as positional parameters.
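
A self-contained sketch of the bug class, using a hypothetical op rather than the actual generated code:
```python
import torch

def fake_op(x, *, weight):  # `weight` is a keyword-only Tensor parameter
    return x * weight

t = torch.ones(2)
# fake_op(t, t)  # TypeError: fake_op() takes 1 positional argument but 2 were given
print(fake_op(t, weight=t))  # generated code must pass kwarg-only params by keyword
```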

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58587

Reviewed By: ailzhang

Differential Revision: D28548365

Pulled By: heitorschueroff

fbshipit-source-id: 8f1746053cbad1b11e817b0099db545d8dd22232
2021-05-20 07:49:08 -07:00
bf00d26deb Enables builds with Compute Library backend for oneDNN (#55913)
Summary:
Since v1.7, oneDNN (MKL-DNN) has supported the use of Compute Library
for the Arm architecture to provide optimised convolution primitives
on AArch64.

This change enables the use of Compute Library in the PyTorch build.
Following the approach used to enable the use of CBLAS in MKLDNN,
it is enabled by setting the env vars USE_MKLDNN and USE_MKLDNN_ACL.
The location of the Compute Library build must be set using `ACL_ROOT_DIR`.

This is an extension of the work in https://github.com/pytorch/pytorch/pull/50400
which added support for the oneDNN/MKL-DNN backend on AArch64.

_Note: this assumes that Compute Library has been built and installed at
ACL_ROOT_DIR. Compute library can be downloaded here:
`https://github.com/ARM-software/ComputeLibrary`_

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55913

Reviewed By: ailzhang

Differential Revision: D28559516

Pulled By: malfet

fbshipit-source-id: 29d24996097d0a54efc9ab754fb3f0bded290005
2021-05-20 07:43:56 -07:00
145a6f7985 DOC Adds code comment to clarify nn.Linear.reset_parameters (#58487)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57109

Adds comment to clarify `a=sqrt(5)` in `nn.Linear.reset_parameters`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58487

Reviewed By: ailzhang

Differential Revision: D28548391

Pulled By: jbschlosser

fbshipit-source-id: 2d5910b2576a04f19edbd8b8515cdb55fc249ce5
2021-05-20 06:15:47 -07:00
5caccbe39e [pkg] Catch exceptions where dependency resolution gets invalid imports (#58573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58573

Users can create invalid imports, like:
```
# in a top-level package
if False:
  from .. import foo
```

Since this code is never executed, it will not cause the module to fail to
load. But our dependency analysis walks every `import` statement in the AST,
and will attempt to resolve the (incorrectly formed) import, throwing an exception.

For posterity, the code that triggered this: https://git.io/JsCgM

Differential Revision: D28543980

Test Plan: Added a unit test

Reviewed By: Chillee

Pulled By: suo

fbshipit-source-id: 03b7e274633945b186500fab6f974973ef8c7c7d
2021-05-19 23:04:21 -07:00
703f24397b [pkg] simplifications to broken dependency handling (#58572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58572

Right now, we have three categories of error (broken, denied, unhandled). This
PR unifies them into a single "error" field in the node, with optional context.
It also generalizes how formatting of the error in PackagingError occurs.

Differential Revision: D28543982

Test Plan: sandcastle

Reviewed By: Chillee

Pulled By: suo

fbshipit-source-id: d99d37699ec2e172e3798763e60aafe9a66ed6f4
2021-05-19 23:03:12 -07:00
c4f0c5ee50 Quote in setup-ci-env (#58637)
Summary:
Do not put quotes around arguments that do not have spaces in them in add_to_env_file.

The ENV file is used both by bash and by Docker; Docker does not strip the
quotes when they are present.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58637

Reviewed By: wconstab

Differential Revision: D28561159

Pulled By: malfet

fbshipit-source-id: 0843aad22703b6c3adebeb76175de1cfc1a974b5
2021-05-19 22:20:13 -07:00
8615fd65e3 Fix GIL issue when acquiring multiple sessions. (#58584)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58584

Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy

Reviewed By: wconstab

Differential Revision: D28545314

fbshipit-source-id: 45cb0e4d80d4766ec1aed6a51679af3424cb0878
2021-05-19 22:05:52 -07:00
24786bd6ef Make torch::deploy work with or without cuda (#58493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58493

In fbcode, we want torch::deploy to be a target that works with or without cuda, depending only on whether cuda is linked in the final binary.  To enable this, we build both flavors of libinterpreter,  and choose which to load at runtime depending on whether cuda is available in the application.  This comes at a cost to binary size, as it includes two copies of libinterpreter instead of one.  However, it does not require _loading_ two copies of libinterpreter into memory at runtime, so the memory footprint of the interpreter (which we make N copies of) is not impacted.

In oss/cmake, this change is a no-op.  cuda is already handled there by building just one libinterpreter, but building cuda or not for the whole pytorch build based on a global cmake flag.

Test Plan: test in fbcode with new gpu mode unit tests, verify existing oss CI passes

Reviewed By: suo

Differential Revision: D28512178

fbshipit-source-id: 61354bf78b1932605a841388fcbc4bafc0c4bbb4
2021-05-19 21:44:23 -07:00
fbc235c226 port sgn to structured (#58197)
Summary:
https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58197

Reviewed By: ejguan

Differential Revision: D28416538

Pulled By: ezyang

fbshipit-source-id: bd78172ff4b11bfc69304c426d5817a47bcbb567
2021-05-19 20:10:01 -07:00
b5e39bceec Port fmax & fmin to structured kernel (#58458)
Summary:
Port fmax & fmin to structured kernel
Related https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58458

Reviewed By: ailzhang

Differential Revision: D28509263

Pulled By: ezyang

fbshipit-source-id: 3fccb46746e5c0695fe8fa498ce32f8ab4609f04
2021-05-19 20:06:06 -07:00
e179a56839 [FX Splitter] dump final graph and print operator stats via to_glow API
Summary:
- dump final graph in glow
- print operator stats via to_glow API
   - 1) node stats for final glow graph
   - 2) operator stats in TorchGlowBackend for torch::jit::graph to lower

Reviewed By: khabinov

Differential Revision: D28444501

fbshipit-source-id: 743755c320071edc4c045ad004adeb16b4a9c323
2021-05-19 19:16:19 -07:00
9a622f4cd9 refactor ASGD to use functional API (#58410)
Summary:
Functional API is used in large scale distributed training to enable multithreaded training instead of multiprocess, as it gives more optimal resource utilization and efficiency.

In this PR, we provide code migration and refactoring for functional API for ASGD algorithm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58410

Reviewed By: ailzhang

Differential Revision: D28546702

Pulled By: iramazanli

fbshipit-source-id: 4f62b6037d53f35b19f98340e88af2ebb6243a4f
2021-05-19 18:55:52 -07:00
208b36f109 remove redundant getDispatchKeySetUnboxed(eligibleKeys) (#58535)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58535

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28531377

Pulled By: bhosmer

fbshipit-source-id: ade1427c8c9ada10ecdc69ef80c5d90be23f5787
2021-05-19 17:08:03 -07:00
47c566ebb1 Rename namespace vec256 to vec, struct Vec256 to Vectorized (and other related classes/structs) (#58438)
Summary:
To make it more convenient for maintainers to review the ATen AVX512 implementation, this PR renames the namespace `vec256` to `vec`. Modifying 77 files & creating 2 new files only took a few minutes, as these changes aren't significant, and fewer files will have to be reviewed while reviewing https://github.com/pytorch/pytorch/issues/56992.
The struct `Vec256` is being renamed to `Vectorized` rather than `Vec`, because there are some `using Vec=` statements in the codebase, which makes `Vectorized` the more convenient choice. However, I can still rename it to `Vec`, if required.

### Changes made in this PR -
Created `aten/src/ATen/cpu/vec` with subdirectory `vec256` (vec512 would be added via https://github.com/pytorch/pytorch/issues/56992).
The changes were made in this manner -

1. First, a script was run to rename `vec256` to `vec` & `Vec` to `Vectorized` -
```
# Ref: https://stackoverflow.com/a/20721292
cd aten/src
grep -rli 'vec256\/vec256\.h' * | xargs -i@ sed -i 's/vec256\/vec256\.h/vec\/vec\.h/g' @
grep -rli 'vec256\/functional\.h' * | xargs -i@ sed -i 's/vec256\/functional\.h/vec\/functional\.h/g' @
grep -rli 'vec256\/intrinsics\.h' * | xargs -i@ sed -i 's/vec256\/intrinsics\.h/vec\/vec256\/intrinsics\.h/g' @
grep -rli 'namespace vec256' * | xargs -i@ sed -i 's/namespace vec256/namespace vec/g' @
grep -rli 'Vec256' * | xargs -i@ sed -i 's/Vec256/Vectorized/g' @
grep -rli 'vec256\:\:' * | xargs -i@ sed -i 's/vec256\:\:/vec\:\:/g' @
grep -rli 'at\:\:vec256' * | xargs -i@ sed -i 's/at\:\:vec256/at\:\:vec/g' @
cd ATen/cpu
mkdir vec
mv vec256 vec
cd vec/vec256
grep -rli 'cpu\/vec256\/' * | xargs -i@ sed -i 's/cpu\/vec256\//cpu\/vec\/vec256\//g' @
grep -rli 'vec\/vec\.h' * | xargs -i@ sed -i 's/vec\/vec\.h/vec\/vec256\.h/g' @
```

2. `vec256` & `VEC256` were replaced with `vec` & `VEC` respectively in 4 CMake files.

3. In `pytorch_vec/aten/src/ATen/test/`, `vec256_test_all_types.h` & `vec256_test_all_types.cpp` were renamed.

4. `pytorch_vec/aten/src/ATen/cpu/vec/vec.h` & `pytorch_vec/aten/src/ATen/cpu/vec/functional.h` were created.
Both currently have one line each & would have 5 when AVX512 support would be added for ATen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58438

Reviewed By: malfet

Differential Revision: D28509615

Pulled By: ezyang

fbshipit-source-id: 63840df5f23b3b59e203d25816e2977c6a901780
2021-05-19 16:04:36 -07:00
a6b358d53b Revert D28461013: [nnc] Enable CPU fusion inside Facebook, take 2
Test Plan: revert-hammer

Differential Revision:
D28461013 (c76405d3b1)

Original commit changeset: 79a80b6ffb65

fbshipit-source-id: d9cc5c512542153f39664635fb080d797a9de7d0
2021-05-19 15:27:38 -07:00
36adc3f04d [FX] Add APIs to mutate specific args/kwargs (#58571)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58571

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D28543359

Pulled By: jamesr66a

fbshipit-source-id: 44812d04886e653b5439c880dd831ecbc893fe23
2021-05-19 14:54:16 -07:00
296d2a4399 [THC] Rename THCTensorMathMagma from cu to cpp (#58521)
Summary:
This is supposed to be a no-op (as the .cu file does not contain any CUDA code)
that reduces compilation time 2.5x:
```
$ time /usr/local/cuda/bin/nvcc /home/nshulga/git/pytorch/aten/src/THC/THCTensorMathMagma.cu -c ...
real	0m7.701s
$ time /usr/local/cuda/bin/nvcc /home/nshulga/git/pytorch/aten/src/THC/THCTensorMathMagma.cpp -c ...
real	0m2.657s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58521

Reviewed By: ngimel

Differential Revision: D28526946

Pulled By: malfet

fbshipit-source-id: ed42a9db3349654b75dcf63605bb4256154f01ff
2021-05-19 14:26:21 -07:00
ae99640a78 Added publishing of test results and minor fixes to Az DevOps Build Logic (#58436)
Summary:
This PR adds the ability to publish the xml test data of custom PyTorch PR tests. This PR also adds a few fixes to the custom PyTorch PR tests logic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58436

Reviewed By: seemethere, mruberry

Differential Revision: D28512958

Pulled By: malfet

fbshipit-source-id: d3a1a251d3d126c923d5f733dccfb31a4b701b7e
2021-05-19 14:17:48 -07:00
b9b8522e00 [profile] fix recorded data type (#58531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58531

fix data type of alltoall(v) when recording communication metadata via DebugInfo in NCCL PG

Reviewed By: chaekit

Differential Revision: D28529372

fbshipit-source-id: 2917653f73f5fe4f6dc901803235994ca042bba2
2021-05-19 14:14:54 -07:00
8de8b492f7 Revert "Move Azure MultiGPU tests back to nightly (#58242)" (#58451)
Summary:
This reverts commit 2afcb7e8fde0476db2e32feae9a80e36f23c1b19.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58451

Reviewed By: ailzhang

Differential Revision: D28497920

Pulled By: malfet

fbshipit-source-id: 7e9e4f1e3e6e46d8d2a4cba2e6147e0b50d27f6d
2021-05-19 13:55:26 -07:00
3113a1de4a Fix some tensor operators to return NotImplemented for invalid inputs (#58216)
Summary:
Same as https://github.com/pytorch/pytorch/issues/57934. (cc/ albanD)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58216

Reviewed By: ailzhang

Differential Revision: D28494886

Pulled By: albanD

fbshipit-source-id: 380205867ee1cde90e1c6fcfe2a31749e1243530
2021-05-19 13:09:57 -07:00
6c70cbedb6 step 0 of cuDNN v8 convolution API integration (#51390)
Summary:
This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using the cuDNN v8 API. It is supposed to have faster release cycles, so that, for example, if people find a specific kernel has a bug, they can report it, that kernel will be blocked in the cuDNN frontend, and frameworks can just update the submodule without waiting for a whole cuDNN release.

The work is not complete, and this PR is only step 0.

**What this PR does:**
- Add cudnn-frontend as a submodule.
- Modify cmake to build that submodule.
- Add bindings for convolution forward in `Conv_v8.cpp`, which is disabled by a macro by default.
- Tested manually by enabling the macro and running `test_nn.py`. All tests pass except those mentioned below.

**What this PR doesn't:**
- Only convolution forward, no backward. The backward will use v7 API.
- No 64-bit-indexing support for some configurations. This is a known issue of cuDNN and will be fixed in a later cuDNN version. PyTorch will not implement any workaround for this issue; instead, the v8 API should be disabled on problematic cuDNN versions.
- No test beyond PyTorch's unit tests.
  - Not tested for correctness on real models.
  - Not benchmarked for performance.
- Benchmark cache is not thread-safe. (This is marked as `FIXME` in the code, and will be fixed in a follow-up PR)
- cuDNN benchmark is not supported.
- There are failing tests, which will be resolved later:
  ```
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in...
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (...
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9
  FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an...
  FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet
  FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  ```

Although this is not a complete implementation of cuDNN v8 API binding, I still want to merge this first. This would allow me to do small and incremental work, for the ease of development and review.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51390

Reviewed By: malfet

Differential Revision: D28513167

Pulled By: ngimel

fbshipit-source-id: 9cc20c9dec5bbbcb1f94ac9e0f59b10c34f62740
2021-05-19 12:54:09 -07:00
954d39ba38 [ATen][Quant] Pass at::Tensor by reference (#58284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58284

- Passing at::Tensor by value can incur a lot of refcount-bump overhead. Passing by reference is much more efficient.
- Use Tensor::expect_contiguous() where possible to remove refcount bump overhead when input tensor is already contiguous.

Reviewed By: supriyar, swolchok

Differential Revision: D28432300

fbshipit-source-id: 089ceed08f0d54f109e441f8a1314d726e8481ce
2021-05-19 12:36:50 -07:00
a91375432a model_dump: Accept variable-length debug info (#57660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57660

Ignore trailing elements so we're compatible with both old and new
models.

Test Plan: Dumped and old model.  Unit test.

Reviewed By: malfet

Differential Revision: D28531391

Pulled By: dreiss

fbshipit-source-id: 197a55ab0e6a7d8e25cbee83852e194afacc988e
2021-05-19 12:25:27 -07:00
ab1fdbefe1 model_dump: Use DumpUnpickler.load instead of .dump (#57659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57659

Faster since we don't do an automatic pprint, and shorter, simpler code.

Test Plan: Dumped some models.

Reviewed By: malfet

Differential Revision: D28531398

Pulled By: dreiss

fbshipit-source-id: 47f1f646d4576af9f7e680933e0512f616dab5c0
2021-05-19 12:25:25 -07:00
53078924ad model_dump: Add a section that summarizes tensor memory usage (#57658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57658

Since there is no Python change here and we only do the analysis when
rendering the open section, this should have no impact on page size or
load time!  (Well, a constant impact on page size due to the added
code.)  Before I made it lazy, I observed that it increased load time by
over 100ms for a large model.

Test Plan: Dumped a CUDA model and saw the size summary.

Reviewed By: malfet

Differential Revision: D28531394

Pulled By: dreiss

fbshipit-source-id: f77012b7bab069de861a4ba23486c665e1306aa0
2021-05-19 12:25:23 -07:00
ef4e6036bc model_dump: Handle dict rendering (#57657)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57657

Test Plan: Clicked around a model with some dicts in it.

Reviewed By: malfet

Differential Revision: D28531397

Pulled By: dreiss

fbshipit-source-id: 069690f147e91eadd76fec5f5ca4eec057abcb98
2021-05-19 12:25:21 -07:00
72ff3163bd model_dump: Handle torch.device objects (#57656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57656

This came up when dumping a CUDA model.

Test Plan: Dumped a CUDA model.

Reviewed By: malfet

Differential Revision: D28531396

Pulled By: dreiss

fbshipit-source-id: fe0e94248c8085a8b760d253ba0b517f153b3442
2021-05-19 12:25:19 -07:00
a380575f5b model_dump: Refactor renderTensor into a helper method (#57655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57655

Now lots of code is shared between tensor and qtensor rendering.  Net
lines of code is actually +1, but it should result in a savings if/when
we implement some of those todos.

Test Plan: Clicked around in Chrome.

Reviewed By: malfet

Differential Revision: D28531395

Pulled By: dreiss

fbshipit-source-id: 190a04ed587b54d27f3410246763cd636c0634be
2021-05-19 12:25:17 -07:00
3ff76af23c model_dump: Implement "Hider" properly (#57654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57654

I learned how to use children in React/Preact. :)  Now it's not
necessary to give every hidable section its own id and synchonize the
"shown=false" with "style='display:none;'".

This also means that the hidden elements aren't rendered to the DOM
unless the hider is open.

Test Plan: Clicked around in Chrome.

Reviewed By: malfet

Differential Revision: D28531393

Pulled By: dreiss

fbshipit-source-id: bc86c823ae4b7e80c000f50c5429d89dff6ae64d
2021-05-19 12:23:59 -07:00
3f0b081636 move code to Blas.cpp, clean up THC magma (#58526)
Summary:
To improve compilation times

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58526

Reviewed By: malfet

Differential Revision: D28540035

Pulled By: ngimel

fbshipit-source-id: 01a6b1e2b12aa246c5ecfa810ad4e87bde040553
2021-05-19 12:04:18 -07:00
703cfdc9ed [JIT] improve documentation (#57991)
Summary:
* Fix lots of links.
* Minor improvements for consistency, clarity or grammar.
* Update jit_python_reference to note the limitations on __exit__.
  (Related to https://github.com/pytorch/pytorch/issues/41420).
* Fix a comment in exit_transforms.cpp: removed the word "not" which
  made the comment say the opposite of the truth.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57991

Reviewed By: malfet

Differential Revision: D28522247

Pulled By: SplitInfinity

fbshipit-source-id: fc63a59d19ea6c89f957c9f7d451be17d1c5fc91
2021-05-19 11:47:32 -07:00
79a258f448 s/foward/forward/g (#58497)
Summary:
Annoying typo.

Prompted by these profiling results: https://github.com/pytorch/pytorch/issues/56419#issuecomment-825787828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58497

Reviewed By: malfet

Differential Revision: D28521081

Pulled By: Chillee

fbshipit-source-id: ab91a2e167dd7d3387fd56106a6cff81f7a32f10
2021-05-19 11:42:42 -07:00
ccad77aa22 Added OperatorMap for mapping Operator to any template <T> (#58060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58060

A generic way to check whether an Operator belongs to a predefined map and, if so, to access the map value via public method(s). In general, the value can be anything, for example the Operator's schema.

Test Plan: buck test caffe2/test/cpp/jit:jit -- OperatorMap

Reviewed By: Krovatkin

Differential Revision: D28357933

fbshipit-source-id: ba3248cf06c07f16aebafccb7ae71c1245afb083
2021-05-19 11:38:49 -07:00
1ba05efd26 [Reducer] Remove some unused variables (#58524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58524

Per title
ghstack-source-id: 129311600

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D28528223

fbshipit-source-id: 239a15de4b602e35ed9b15b8a4bea3c28b61de12
2021-05-19 09:55:04 -07:00
4cf9b11022 Fix issues regarding binary_checkout (#58558)
Summary:
Cherry-pick of https://github.com/pytorch/pytorch/issues/58495 back to master

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Fixes https://github.com/pytorch/pytorch/issues/58557

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58558

Reviewed By: albanD

Differential Revision: D28538867

Pulled By: malfet

fbshipit-source-id: 3517d8729df7c0c0a221d26f6966c8dcef2f3076
2021-05-19 08:24:34 -07:00
baf05c3f5e Split CUDA SpectralOp (#58459)
Summary:
Move all cuFFT related parts to SpectralOps.cpp
Leave only _fft_fill_with_conjugate_symmetry_cuda_ in SpectralOps.cu

Keep `CUDAHooks.cpp` in torch_cuda_cpp by introducing `at::cuda::detail::THCMagma_init` functor and registering it from global constructor in `THCTensorMathMagma.cu`

Move entire detail folder to torch_cuda_cpp library.

This is a no-op that helps greatly reduce binary size for CUDA-11.x builds by avoiding cufft/cudnn symbol duplication between torch_cuda_cpp(that makes most of cuFFT calls) and torch_cuda_cu (that only needed it to compile SpectralOps.cu)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58459

Reviewed By: ngimel

Differential Revision: D28499001

Pulled By: malfet

fbshipit-source-id: 425a981beb383c18a79d4fbd9b49ddb4e5133291
2021-05-19 07:59:03 -07:00
029bec4505 [lint] Fix uninitialized variable lint error in Module.cpp (#58499)
Summary:
This PR fixes two uninitialized variable lint warnings in `Module.cpp` by initializing them to `nullptr`s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58499

Reviewed By: driazati, samestep

Differential Revision: D28519192

Pulled By: 1ntEgr8

fbshipit-source-id: 293cd4b296eea70b72adf02cd73f354063b124c6
2021-05-19 07:55:24 -07:00
b45a105acb Automated submodule update: tensorpipe (#58477)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: a0c6aa1422

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58477

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D28506522

fbshipit-source-id: 2da92feae212a568cfe441d33e4966ffe6c182e5
2021-05-19 05:49:29 -07:00
4d7abdbdad [Quant] Add out variant for int8 quantized::linear (#58282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58282

Reviewed By: ajyu

Differential Revision: D28428734

fbshipit-source-id: f25243cdbc220e59659605a3a29e2b161dd7c1f2
2021-05-19 00:24:23 -07:00
c76405d3b1 [nnc] Enable CPU fusion inside Facebook, take 2 (#58347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58347

Back out "Revert D27652484 (ac04cc775b): [nnc] Enable CPU fusion inside Facebook"
Original commit changeset: ecfef3ee1e71
ghstack-source-id: 129279584

Test Plan: Tests for bugfix included in this stack

Reviewed By: navahgar

Differential Revision: D28461013

fbshipit-source-id: 79a80b6ffb653ab952ff5efaa143d3362bb7d966
2021-05-18 21:45:48 -07:00
dcfc2050bd VaryingShape<Strides>::isComplete() needs to consider whether each Stride is complete (#58510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58510

In a case that I don't fully understand, we're getting a stride that is:
```
{2:1, 1:1, 0:*}
```
(in this debug output, M:N means stride index M, stride value N).  This shape
should be considered incomplete, since we don't actually know the values of the
stride, but VaryingShape::isComplete considers it complete because it only
checks the presence of elements in the vector, not whether those elements are
themselves complete.
ghstack-source-id: 129279583

Test Plan:
new unit test in test/cpp/jit

To see the failure in the context of a real model:
```
./fblearner/predictor/loadgen/download-requests.sh 272478342_0 10 ~/local/requests/272478342_0.recordio

buck-out/gen/fblearner/predictor/loadgen/replay_model_requests --model_id=272478342_0 --replay_record_source=recordio:/data/users/bertrand/requests/272478342_0.recordio --remote_port=9119 --output_file=/data/users/bertrand/responses/272478342_0_actual.recordio --output_type=recordio

buck-out/gen/fblearner/predictor/loadgen/replay_model_requests --model_id=272478342_0 --replay_record_source=recordio:/data/users/bertrand/requests/272478342_0.recordio --remote_port=9119 --output_file=/data/users/bertrand/responses/272478342_0_actual.recordio --output_type=recordio
```

Reviewed By: Krovatkin

Differential Revision: D28520062

fbshipit-source-id: 3ca900337d86480a40fbd90349a698cbb2fa5f11
2021-05-18 21:45:46 -07:00
3d20ddfe92 [nnc] Do not fuse unsqueeze with variable dim (#58346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58346

If `dim` is a variable, NNC doesn't know how to translate the result,
since the shape is unknown.  This issue manifested as a `bad_variant_access`
when we try to pull an int constant out of that arg.

Note that, while the PE will pick up the resultant shape, it won't set guards accordingly.
ghstack-source-id: 129078971

Test Plan: new fuser test

Reviewed By: navahgar

Differential Revision: D28460956

fbshipit-source-id: 57ef918ef309ee57bfdf86717b910b6549750454
2021-05-18 21:44:37 -07:00
2ddd841635 [nnc] Make the pretty printer prettier (#57874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57874

Before:
```
{
  for (int v = 0; v < 100; v++) {
    aten_sin[v] = sin(x_1[v]);
  }
{
    sum = float(0);
    for (int v_1 = 0; v_1 < 100; v_1++) {
      sum = ReduceOp((sum) + float(aten_sin[v_1]), reduce_args={});
    }
  }  for (int v_2 = 0; v_2 < 100; v_2++) {
    aten_cos[v_2] = cos(x_1[v_2]);
  }
  for (int v_3 = 0; v_3 < 100; v_3++) {
    aten_mul[v_3] = (_tensor_constant0[v_3]) * (aten_cos[v_3]);
  }
}
```

After:
```
{
  for (int v = 0; v < 100; v++) {
    aten_sin[v] = sin(x_1[v]);
  }
  {
    sum = float(0);
    for (int v_1 = 0; v_1 < 100; v_1++) {
      sum = ReduceOp((sum) + float(aten_sin[v_1]), reduce_args={});
    }
  }
  for (int v_2 = 0; v_2 < 100; v_2++) {
    aten_cos[v_2] = cos(x_1[v_2]);
  }
  for (int v_3 = 0; v_3 < 100; v_3++) {
    aten_mul[v_3] = (_tensor_constant0[v_3]) * (aten_cos[v_3]);
  }
}
```

Test Plan: Imported from OSS

Reviewed By: navahgar, malfet

Differential Revision: D28455842

Pulled By: bertmaher

fbshipit-source-id: 6d5ca9be12afd66a9ba32c129a3f4d618247cd35
2021-05-18 18:26:58 -07:00
3a3959d253 [jit] Add a utility class SourceRef to represent Source as keys (#57396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57396

A new type SourceRef is introduced to represent a unique identifier to source
text. The type holds refcount to underlying source, and supports comparators
and hash functions, such that it can be used in C++ and Python maps. In later
diffs we will use this to aggregate and print profiling information.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D28133578

fbshipit-source-id: c3d5199a8269c5006c85a145b281bcaaf3e2dc1c
2021-05-18 18:20:53 -07:00
0362b753db [BE] Use __func__ as checkAllSameGPU() 1st arg (#58502)
Summary:
Hardcoded names often get out of date; for example, in AdaptiveAveragePooling those names contained a cudnn_ prefix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58502

Reviewed By: samestep

Differential Revision: D28518917

Pulled By: malfet

fbshipit-source-id: 9b16adae85a179e335da4facb4e769b9f67824bc
2021-05-18 16:45:54 -07:00
ea0f7c4720 move unused parameters to end of bucket orders when rebuild buckets for static graph (#58097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58097

Move unused parameters to the end of the bucket order when rebuilding buckets for static graph.

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D28366689

fbshipit-source-id: fbd224aeb761d5aa3bab35a00d64974eb4455b2e
2021-05-18 16:36:40 -07:00
a7b62abeb0 [PyTorch Edge] bytecode version bump to v5 and enable share constant table (#57888)
Summary:
As title, main change:
1. Enable sharing of the constant table, reducing model size by up to 50%.
2. Bump bytecode version from v4 to v5.
3. Add the unit test back. (It was partially removed because the `script_module_v5.ptl` bytecode version is v5; when the current runtime is v4 and tries to load a v5 model, it raises an error because the version is not within the supported range.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57888

As title
ghstack-source-id: 129255867

Test Plan:
CI
```
buck test papaya/toolkit/frontend/torch/...
buck test mode/opt papaya/integration/service/test/smartkeyboard:smartkeyboard_system_test
```

Reviewed By: raziel, iseeyuan

Differential Revision: D28309381

fbshipit-source-id: 6f5cf4296eaadde913d55f27d5bfb9d1dea2fbaf
2021-05-18 16:17:13 -07:00
9eee782cb6 [nnc][scripts] Add a script for bisecting the TE fuser pass (#58357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58357

Finding a miscompilation in a large program can be tedious; this
script automates the process of bisecting based on the number of fused
instructions.  Since fusing aten::cat without the corresponding
prim::ListConstruct will cause an assertion failure, we treat that case as a
"skip" and ignore it for the purpose of bisection.
ghstack-source-id: 129079484

Test Plan:
Tried it on some failing testcases, plus I wrote a simple bash
script to simulate "failure" and "skip" and verified a few different cases.

Reviewed By: huiguoo

Differential Revision: D28463808

fbshipit-source-id: 64836f1d37a573549179410316ea7168e3dc1f23
2021-05-18 16:10:20 -07:00
7d78d72d7b removing old comment (#56430)
Summary:
Removing a comment which is no longer relevant after
https://github.com/pytorch/pytorch/pull/56089

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56430

Reviewed By: desertfire

Differential Revision: D28515547

Pulled By: Krovatkin

fbshipit-source-id: c4e62741a872fef015248cd7ab1b3213d35109ee
2021-05-18 14:56:22 -07:00
a07cd22efb Comment why render_test_results is its own step (#58505)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58505

Reviewed By: seemethere

Differential Revision: D28520332

Pulled By: samestep

fbshipit-source-id: 6637b58b399caf6019d6fd8bfab21646cbd219b6
2021-05-18 14:40:32 -07:00
8efaab1b83 Add long tensor type to AddFakeFp16 Op (#58504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58504

.. to support QRT inline_CVR models and avoid the failure
```
[DataPreproc] User preprocessing error: c10::Error: [enforce fail at operator.h:1307] . Unsupported type of tensor: long (Error from operator:
input: "sparse_nn_2/HistogramBinningCalibrationByFeature_2/cast_22/cast_22_5" input: "sparse_nn_2/HistogramBinningCalibrationByFeature_2/mul_5/Mul" output: "sparse_nn_2/HistogramBinningCalibrationByFeature_2/add_7/Add_2" name: "" type: "AddFakeFp16" arg { name: "broadcast" i: 1 } device_option { extra_info: "inference_split:force_merge" extra_info: "inference_split:force_merge" })
```
f273407515

Test Plan: f273692411

Reviewed By: hx89

Differential Revision: D28513550

fbshipit-source-id: 86892e1a98b5219cd187731018ce2692b231fb58
2021-05-18 14:25:56 -07:00
4b859cbca1 [NNC] Do not optimize conditionals when the corresponding loop is not normalized (#57675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57675

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231375

Pulled By: navahgar

fbshipit-source-id: bcbcebca25577744c7190a0aa9fa376f76dea77d
2021-05-18 14:25:53 -07:00
a71b99b50d [NNC] Add a method to check if a loop is normalized (#57674)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57674

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231377

Pulled By: navahgar

fbshipit-source-id: 3d92d532f1e1f78c9d94619980340622b73f99ec
2021-05-18 14:25:50 -07:00
3fe72d30dc [NNC] Optimize conditionals that correspond to the form generated for aten::cat op. (#57673)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57673

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231374

Pulled By: navahgar

fbshipit-source-id: 1777a63df4e5ebed6d515683bd772a88be465b3a
2021-05-18 14:23:48 -07:00
db42ec4297 [Pytorch Sparsity] Add sparse sources to build target
Summary:
This adds to internal build target and makes it ready for selective build
workflow.

Test Plan: CI builds

Reviewed By: z-a-f

Differential Revision: D28103697

fbshipit-source-id: 19c8b27aae4de1cece8d88d13ea51ca4ac7d79b6
2021-05-18 14:19:14 -07:00
ad97fd8031 Support symbolic diff for leaky_relu (#58337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58337

Supports symbolic differentiation for leaky_relu.
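
The gradient being encoded, as a minimal numeric sketch: d/dx leaky_relu(x) is 1 where x > 0 and negative_slope elsewhere.
```python
import torch
import torch.nn.functional as F

def leaky_relu_grad(x, negative_slope=0.01):
    # Gradient of leaky_relu w.r.t. its input.
    return torch.where(x > 0, torch.ones_like(x),
                       torch.full_like(x, negative_slope))

x = torch.randn(5, requires_grad=True)
F.leaky_relu(x, 0.01).sum().backward()
print(torch.allclose(x.grad, leaky_relu_grad(x.detach())))  # True
```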

Test Plan:
test/test_jit.py
test/test_ops.py

Reviewed By: Krovatkin

Differential Revision: D28458898

fbshipit-source-id: bdde74d689d2c2ea1f59507456c2efa4e38de1cc
2021-05-18 14:13:40 -07:00
e1551f1678 Clarify .github/scripts/generate_ci_workflows.py (#58498)
Summary:
Followup to https://github.com/pytorch/pytorch/issues/58491:

- use f-string to remove the literal `generated` string from the generator script, so Phabricator no longer thinks it is a generated file
- remove the special logic for `test_runner_type` and instead explicitly specify for every workflow

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58498

Test Plan:
```
make generate-gha-workflows
```
Also, check that Phabricator doesn't classify `.github/scripts/generate_ci_workflows.py` as "Generated changes" in this diff.

Reviewed By: seemethere

Differential Revision: D28516291

Pulled By: samestep

fbshipit-source-id: 8736eaad5d28082490be0a9b2e271c9493c2ba9d
2021-05-18 12:50:00 -07:00
5fcf49f596 [PyTorch] Add a guard rail to TensorIterator::add_borrowed_{in,out}put (#58279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58279

See comment in source code.
ghstack-source-id: 129002040

Test Plan: CI

Reviewed By: wenleix

Differential Revision: D28428962

fbshipit-source-id: e011819e5579396f3ca2d87978c84965260adb1b
2021-05-18 12:46:33 -07:00
03f2f0f88f [PyTorch] Migrate remaining CUDA TI usage to borrowing where possible (#58278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58278

Borrowing is more efficient, and we can see in all these cases that the TensorIterator doesn't outlive the input & output Tensors.
ghstack-source-id: 129002042

Test Plan: Existing CI

Reviewed By: ezyang

Differential Revision: D28428809

fbshipit-source-id: 23ccf508c4413371a88085271f11c7d0cc861a9e
2021-05-18 12:46:32 -07:00
1fd256dc3b [PyTorch] Migrate CUDA indexing TI usage to borrowing (#58277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58277

Borrowing is more efficient, and we can see in all these cases that the TensorIterator doesn't outlive the input & output Tensors.
ghstack-source-id: 129002044

Test Plan: Existing CI

Reviewed By: ngimel

Differential Revision: D28428441

fbshipit-source-id: 243b746aeb5fdf8b95c8e591c066c5eab140deb6
2021-05-18 12:46:30 -07:00
029289bd6c [PyTorch] Migrate TensorAdvancedIndexing TI usage to borrowing where possible (#58276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58276

Borrowing is more efficient, and we can see in all these cases that the TensorIterator doesn't outlive the input & output Tensors.
ghstack-source-id: 129002045

Test Plan: Existing CI

Reviewed By: ngimel

Differential Revision: D28428234

fbshipit-source-id: 9eada7725a070799b55e6683509e359505a2b80a
2021-05-18 12:46:28 -07:00
439ba27dea [PyTorch] Migrate all extant uses of build_binary_float_op to build_borrowing_binary_float_op (#58273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58273

Borrowing is more efficient, and structured kernels can always borrow.
ghstack-source-id: 129002041

Test Plan: Existing CI

Reviewed By: ezyang

Differential Revision: D28427914

fbshipit-source-id: eed27a10603b412af5357d3554477ba407abba73
2021-05-18 12:46:26 -07:00
8a4a511ff5 [PyTorch] Migrate all extant uses of build_binary_op to build_borrowing_binary_op (#58272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58272

Borrowing is more efficient, and structured kernels can always borrow.
ghstack-source-id: 129002046

Test Plan: Existing CI

Reviewed By: ezyang

Differential Revision: D28427768

fbshipit-source-id: 6314a682556c6914c843aaacf2d75b2adb164e9a
2021-05-18 12:44:50 -07:00
07da584dbd Fix KeyError returned by _maybe_get_last_node_only_observer (#58443)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58443

Test Plan: arc lint

Reviewed By: vkuzo

Differential Revision: D28494119

fbshipit-source-id: 05abf4e12051afc237096812fb0ee08a8b9447f9
2021-05-18 12:41:19 -07:00
46484e8dfe Simplify .github/scripts/generate_ci_workflows.py (#58491)
Summary:
This PR simplifies `.github/scripts/generate_ci_workflows.py` by using the same strategy as https://github.com/pytorch/pytorch/issues/54344, representing workflows as plain data to avoid duplicating the definition of the `generate_workflow_file` function. This will make the script easier to maintain if/when that function is modified and/or more workflow types are added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58491

Test Plan:
The Lint job in CI; specifically:
```
make generate-gha-workflows
mypy --config mypy-strict.ini
```

Reviewed By: malfet, seemethere

Differential Revision: D28511918

Pulled By: samestep

fbshipit-source-id: aaf415a954d938a29aee7c9367c9bc2b9f44bb01
2021-05-18 11:49:51 -07:00
f7c15610aa Collect kernel version (#58485)
Summary:
Collect env should collect kernel and glibc version

Fixes https://github.com/pytorch/pytorch/issues/58387

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58485

Reviewed By: walterddr

Differential Revision: D28510564

Pulled By: malfet

fbshipit-source-id: ad3d4b93f51db052720bfaa4322138c55816921b
2021-05-18 10:57:59 -07:00
92e36240f5 fix nonzero perf regression (#58468)
Summary:
https://github.com/pytorch/pytorch/issues/55292 introduced a perf regression for nonzero on CUDA; this fixes it. nvcc is still pretty bad at unrolling loops whose boundaries are not known at compile time, which made the `write_indices` kernels ~5x slower than they should be.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58468

Reviewed By: mruberry

Differential Revision: D28511147

Pulled By: ngimel

fbshipit-source-id: fe7303ec77da1abbe5e874093eca247b3919616f
2021-05-18 10:33:10 -07:00
4ce8378ec5 [local lint] Remove success checks in tests (#58490)
Summary:
Testing for both that a lint job ran and that it was successful depends
on having lint pass for the PR, which can create confusion if it doesn't
(i.e. a flake8 failure also causes this job to fail, and it's not
immediately clear why). With this PR we just check for the presence of
job names to see that something ran.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58490

Reviewed By: samestep

Differential Revision: D28511229

Pulled By: driazati

fbshipit-source-id: 3036deff9f9d0ef2e78b44a9a43b342acdcfa296
2021-05-18 09:31:13 -07:00
afe23b8f8b Fix alpine image (#58462)
Summary:
Fixes dockerhub rate limiting issue, use the ECR image instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58462

Reviewed By: malfet

Differential Revision: D28510603

Pulled By: zhouzhuojie

fbshipit-source-id: 2cac59da1d1efdf31df71e9f76d802f8e9a0bfd5
2021-05-18 09:22:28 -07:00
821a97595b fx quant: improve performance of all_node_args_have_no_tensors (#58461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58461

Improves the logic which calculates whether a node has any tensors
in its arguments by terminating the recursion early when possible.
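
A generic sketch of the pattern (a hypothetical helper, not the actual FX quant code):
```python
import torch

def any_tensor_arg(args):
    # Walk nested args, but return as soon as one Tensor is found
    # instead of exhaustively visiting every element.
    for a in args:
        if isinstance(a, torch.Tensor):
            return True  # early exit
        if isinstance(a, (list, tuple)) and any_tensor_arg(a):
            return True
    return False
```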

In a future PR, we should probably ditch this entire approach and switch to
using dtype propagation.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28499455

fbshipit-source-id: bedd844022b90e1fcb7d7a3cb4cc65440dc9cc59
2021-05-18 07:19:59 -07:00
e059fd40a8 Remove master documentation from being indexable by search engines (#58056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58056

This PR addresses an action item in #3428: disabling search engine
indexing of master documentation. This is desireable because we want to
direct users to our stable documentation (instead of master
documentation) because they are more likely to have a stable version of
PyTorch installed.

Test Plan:
1. run `make html`, check that the noindex tags are there
2. run `make html-stable`, check that the noindex tags aren't there

Reviewed By: bdhirsh

Differential Revision: D28490504

Pulled By: zou3519

fbshipit-source-id: 695c944c4962b2bd484dd7a5e298914a37abe787
2021-05-18 06:20:09 -07:00
52b45b7655 Revert D28494073: [Gradient Compression] Do not skip the comm hook tests for Gloo/MPI backends
Test Plan: revert-hammer

Differential Revision:
D28494073 (df44f015fe)

Original commit changeset: 6ba14082f986

fbshipit-source-id: 0e094f09b59c93f5ee13a667aacfb3ccf608547e
2021-05-18 05:39:09 -07:00
34d6618386 [NNC] Fixing a bug in simplifier (#58291)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58291

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28435393

Pulled By: navahgar

fbshipit-source-id: 517e47385a93a43d2ddf054382adc81c18484066
2021-05-18 01:28:33 -07:00
df44f015fe [Gradient Compression] Do not skip the comm hook tests for Gloo/MPI backends (#58444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58444

DDP communication hooks are already supported on Gloo and MPI backends. No longer need to skip these tests on Gloo/MPI backends.

TODO: `test_ddp_hook_parity_powerSGD` failes on Gloo backend. Filed a bug #58467.
ghstack-source-id: 129209528

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_ddp_comm_hook_logging
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_ddp_hook_parity_allreduce
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_ddp_hook_parity_allreduce_process_group
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_ddp_hook_parity_powerSGD

Reviewed By: rohan-varma

Differential Revision: D28494073

fbshipit-source-id: 6ba14082f98696bc4bd8c02395cb58b9c1795015
2021-05-17 23:05:01 -07:00
c38616491f Conservatively move all suitable prim ops from full-jit to mobile, and make them selective. (#58353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58353

There are long-tail operators in register_prim_ops_fulljit.cpp that may be used in the mobile build. In this PR:
1. All of the ops that are likely to be used on mobile are moved to register_prim_ops.cpp.
2. Note that this move is conservative. If an op is likely to have a full-jit dependency, or cannot be made selective, it is kept. If it is later needed on mobile (rare), it will be adapted and moved case by case.
3. All the moved ops are marked selective. The registration function is changed from `Operator()` to `OperatorGenerator()`. Size regression is not expected.

Test Plan:
* Internal size tests
* CI

Reviewed By: dhruvbird

Differential Revision: D28463158

Pulled By: iseeyuan

fbshipit-source-id: 34536b8a569f1274329ccf1dac809fe9b891b4ff
2021-05-17 23:01:22 -07:00
b5a834a739 [Pytorch] Build lite interpreter as default for iOS
Summary:
Two changes:
1. Build lite interpreter as default for iOS
2. Switch the previous lite interpreter test to full jit build test

Test Plan: Imported from OSS

Differential Revision: D27698039

Reviewed By: xta0

Pulled By: cccclai

fbshipit-source-id: 022b554f4997ae577681f2b79a9ebe9236ca4f7d
2021-05-17 22:36:05 -07:00
8a3fb2689f Wrap torch::deploy API functions in safe rethrow macros (#58412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58412

Second try: avoid ctor/dtor handling this time, as it is kind of
pointless if the rethrow will still terminate(), and upsets -Werror=terminate
Original commit changeset: 1775bed18269

Test Plan: existing unit tests and CI

Reviewed By: suo

Differential Revision: D28478588

fbshipit-source-id: 84191cecc3ef52e23f11bfea07bbb9773ebc5df4
2021-05-17 22:09:19 -07:00
7b73fdf597 [FX] Fix retracing wrapped functions (#58061)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58061

Test Plan: Imported from OSS

Reviewed By: yuhc

Differential Revision: D28358801

Pulled By: jamesr66a

fbshipit-source-id: c7c9a8a80e5bfe1eb1f6d2cf858ac7e57153a860
2021-05-17 19:50:16 -07:00
5fa4541c65 Make new_ones an operator (#58405)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58394
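
A quick usage reminder of the method being promoted to an operator:
```python
import torch

x = torch.zeros(2, 3, dtype=torch.int32)
y = x.new_ones(4)  # inherits dtype and device from x
print(y)  # tensor([1, 1, 1, 1], dtype=torch.int32)
```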

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58405

Reviewed By: HDCharles

Differential Revision: D28480075

Pulled By: Chillee

fbshipit-source-id: bd29399867e2a002a2f395554621761d3c701f68
2021-05-17 19:24:34 -07:00
0547a3be63 Change link order for BUILD_SPLIT_CUDA option (#58437)
Summary:
torch_cuda_cu depends on torch_cuda_cpp, so it should be linked first.
Otherwise the linker keeps lots of cudnn symbols for no good reason.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58437

Reviewed By: janeyx99

Differential Revision: D28496472

Pulled By: malfet

fbshipit-source-id: 338605ff755591476070c172a6ea0a0dcd0beb23
2021-05-17 18:38:04 -07:00
af463d2235 Add shape documentation for CosineEmbeddingLoss (#58403)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52732

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58403

Reviewed By: HDCharles

Differential Revision: D28480076

Pulled By: jbschlosser

fbshipit-source-id: c2c51e9da86e274e80126bbcabebb27270f2d2d0
2021-05-17 18:14:16 -07:00
e24dee00d4 add kernel launch checks after each kernel launch to silence the check (#58432)
Summary:
T90898552

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58432

Reviewed By: r-barnes

Differential Revision: D28487446

Pulled By: ngimel

fbshipit-source-id: 3a756ffa3cd68720e132af27cd5ae36f7fd4a2d8
2021-05-17 18:03:19 -07:00
7dd08504f6 [package] fix persistent_load error (#58439)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58439

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D28494250

Pulled By: Lilyjjo

fbshipit-source-id: c068760db9c25dcbf5a88ea9343eab11f0e7736a
2021-05-17 17:38:53 -07:00
314a578154 Clang format distributed_c10d.py (#58435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58435

Prepare for #53962

ghstack-source-id: 129171617

Test Plan: N/A

Reviewed By: zhaojuanmao

Differential Revision: D28490326

fbshipit-source-id: 2ed3c5850788b9702a8020f6ee6d0b579625bf89
2021-05-17 16:47:35 -07:00
b6d3929b51 [ATen] Use MaybeOwned<T> in at::argmin/argmax (#58338)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58338

Test Plan: CI

Reviewed By: swolchok

Differential Revision: D28458968

fbshipit-source-id: 2c759bdb9fbdbef32d804f6d8efb09fb1d2bb30a
2021-05-17 16:42:52 -07:00
6989eb60e5 Remove timeouts for C2 tests
Summary: When run on very heavily loaded machines, some of these tests are timing out. It's not an issue with the test, it's an issue with the environment. I've removed the timeout so we at least keep unit test coverage.

Test Plan: N/A: Fix breakages

Reviewed By: ngimel

Differential Revision: D28492334

fbshipit-source-id: aed3ee371763161aab2d356f5623c7df053fda6f
2021-05-17 16:39:30 -07:00
4310decfbf .github: Add initial Windows CPU GHA workflow (#58199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58199

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D28465272

Pulled By: seemethere

fbshipit-source-id: d221ad71d160088883896e018c58800dae85ff2c
2021-05-17 15:04:16 -07:00
c156a4ffaa fx quant: fix crash on output dicts and lists (#58416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58416

https://github.com/pytorch/pytorch/pull/57519 had a regression not
caught by CI: it added an assertion which failed on various model
output types.

This PR removes the assertion and adds the logic to observe graph
outputs in a way that supports arbitrary output formats.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_output_lists_and_dicts
```

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D28479946

fbshipit-source-id: bcce301f98a057b134c0cd34ab0ca96ba457863f
2021-05-17 15:02:09 -07:00
a1cacf3b5d fx quant: remove test debug logs (#58415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58415

Removes test debugging logs which were committed; probably
someone forgot to remove them before landing.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D28479947

fbshipit-source-id: 3adba87c51652e3353f455b293abc90debe3dd7d
2021-05-17 15:01:03 -07:00
3d12ab452e [ONNX] Fix split export in opset13 (#56277) (#57605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57605

Fix split export in opset13

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D28393522

Pulled By: SplitInfinity

fbshipit-source-id: 4de83345ec7bc9bafe778fe534d9a8760ce16ab3

Co-authored-by: Ksenija Stanojevic <ksenija.stanojevic@gmail.com>
Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-05-17 14:50:33 -07:00
0c3db1cb33 [Pytorch] Build lite interpreter as default for Android
Summary:
Build lite interpreter as default for android, should wait until https://github.com/pytorch/pytorch/pull/56002 lands
Mainly two changes:
1. Use lite interpreter as default for Android
2. Switch the lite interpreter build test to full jit build test

Test Plan: Imported from OSS

Differential Revision: D27695530

Reviewed By: IvanKobzarev

Pulled By: cccclai

fbshipit-source-id: e1b2c70fee6590accc22c7404b9dd52c7d7c36e2
2021-05-17 14:12:48 -07:00
d645088f2f [torch] Format repeat_interleave op files (#58313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58313

Same as title.

I am planning to send a follow-up diff to this op, so sending formatting diff ahead to keep PR simple.

Test Plan: Rely on existing signals since this is simple formatting diff.

Reviewed By: ngimel

Differential Revision: D28447685

fbshipit-source-id: c7cd473b61e40e6f50178aca88b9af197a759099
2021-05-17 13:51:53 -07:00
06c1094ea0 Merge CreationMeta MULTI_OUTPUT_SAFE with MULTI_OUTPUT_NODE (#58285)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57679

##### Release Notes
This is part of the end of the deprecation of inplace/view:
- `detach_` will now raise an error when invoked on any view created by `split`, `split_with_sizes`, or `chunk`. You should use the non-inplace `detach` instead (see the sketch after this list).
- The error message for when an in-place operation (that is not detach) is performed on a view created by `split`, `split_with_sizes`, or `chunk` has been changed from "This view is **an** output of a function..." to "This view is **the** output of a function...".
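
A minimal sketch of the new behavior (tensor values are illustrative):

```
import torch

x = torch.randn(4, requires_grad=True)
a, b = x.chunk(2)

# a.detach_()   # after this change: raises an error on views from chunk/split
c = a.detach()  # the supported, non-inplace replacement
print(c.requires_grad)  # False
```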

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58285

Reviewed By: bdhirsh

Differential Revision: D28441980

Pulled By: soulitzer

fbshipit-source-id: e2301d7b8cbc3dcdd328c46f24bcb9eb7f3c0d87
2021-05-17 13:48:39 -07:00
3507ca320b Remove unused python2 shebang (#58409)
Summary:
This is the only line (not in `third_party`) matching the regex `^#!.*python2`, and [it is not the first line of its file](https://github.com/koalaman/shellcheck/wiki/SC1128), so it has no effect. As a followup to https://github.com/pytorch/pytorch/issues/58275, this PR removes that shebang to reduce confusion, so now all Python shebangs in this repo are `python3`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58409

Reviewed By: walterddr

Differential Revision: D28478469

Pulled By: samestep

fbshipit-source-id: c17684c8651e45d3fc383cbbc04a31192d10f52f
2021-05-17 13:19:32 -07:00
98cc0aa6b0 Use torch.allclose to check tensor equality (#58429)
Summary:
This fixes test_lkj_cholesky_log_prob when the default codepath is used, i.e. when the test is executed as follows:
```
 ATEN_CPU_CAPABILITY=default python3 distributions/test_distributions.py -v -k test_lkj_cholesky_log_prob
```
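
For context, a sketch of the difference between exact comparison and `torch.allclose` (values chosen to mimic the small drift a different CPU codepath can introduce):

```
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = a + 1e-6  # tiny float drift, e.g. from a different vectorized codepath

print(torch.equal(a, b))     # False: exact elementwise comparison fails
print(torch.allclose(a, b))  # True: within default rtol=1e-5, atol=1e-8
```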

Fixes https://github.com/pytorch/pytorch/issues/58381

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58429

Reviewed By: neerajprad

Differential Revision: D28484340

Pulled By: malfet

fbshipit-source-id: 32afcc75e5250f5a11d66b4fa194ea1c784454a6
2021-05-17 13:16:35 -07:00
50f9a1812e Enable NNAPI in internal build (#58324)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58324

Test Plan: Build Size Bot.  Segmentation in Spark Player.

Reviewed By: axitkhurana

Differential Revision: D28435176

fbshipit-source-id: f2fb25e3cd331433e7a3156a528811abd3bcbf3a
2021-05-17 12:52:56 -07:00
532632ca26 Don't bind Android NNAPI on Apple platforms (#58323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58323

Currently there is no way to run NNAPI on Apple platforms.
Disabling the binding with the preprocessor makes it easier
to enable NNAPI in the internal build without affecting iOS size.

This should be reverted soon and migrated to selective build.

Test Plan: Build Size Bot on later diff.

Reviewed By: axitkhurana

Differential Revision: D28435179

fbshipit-source-id: 040eeb74532752630d329b15d5f95c538c2e3f9e
2021-05-17 12:51:46 -07:00
1891e4bf1e [Pytorch] Remove run_on_bundled_input (#58344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58344

Remove a helper function that's more trouble than it's worth.

ghstack-source-id: 129131889

Test Plan: ci and {P414950111}

Reviewed By: dhruvbird

Differential Revision: D28460607

fbshipit-source-id: 31bd6c1cc169785bb360e3113d258b612cad47fc
2021-05-17 12:44:00 -07:00
443ce1e8a1 Improve error message when Proxy object is iterated (#58302)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58302
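
For context, a hedged sketch of the kind of code that triggers the improved message (the module is illustrative, not from the PR):

```
import torch
import torch.fx

class Loops(torch.nn.Module):
    def forward(self, x):
        total = 0
        for row in x:  # iterating a Proxy is not symbolically traceable
            total = total + row.sum()
        return total

try:
    torch.fx.symbolic_trace(Loops())
except Exception as e:  # a TraceError explaining that Proxy can't be iterated
    print(type(e).__name__, ":", e)
```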

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D28444030

Pulled By: ansley

fbshipit-source-id: ee29b0f7b2199f8590de4c5945b0d4ce59230ce2
2021-05-17 12:42:23 -07:00
a4ce85ad68 Chown workspace in calculate-docker-image (#58398)
Summary:
Since https://github.com/pytorch/pytorch/issues/58299 changed the calculate-docker-image job from `ubuntu-18.04` to `linux.2xlarge`, it has sometimes been failing with this message:

```
Warning: Unable to clean or reset the repository. The repository will be recreated instead.
Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
Error: Command failed: rm -rf "/home/ec2-user/actions-runner/_work/pytorch/pytorch/.azure_pipelines"
```

- https://github.com/pytorch/pytorch/runs/2587348894
- https://github.com/pytorch/pytorch/runs/2592943274
- https://github.com/pytorch/pytorch/runs/2600707737

This PR hopes to fix that issue by adding the "Chown workspace" step that we already use for the other jobs in the Linux CI workflow.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58398

Reviewed By: seemethere

Differential Revision: D28476902

Pulled By: samestep

fbshipit-source-id: a7dbf0ad9c18ac44cc1a3cef7647f56489958fe6
2021-05-17 12:40:55 -07:00
e8981e7c5d Improve CONTRIBUTING.md (#58396)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58396

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D28476510

Pulled By: ansley

fbshipit-source-id: 3f45bee93dfeda06a44570305f9699bcafc45d2e
2021-05-17 12:36:38 -07:00
9afe9fba29 Reland OpInfo support for forward AD (#58304)
Summary:
Third attempt to land this.
Trying the ci-all label to ensure we test everything.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58304

Reviewed By: heitorschueroff

Differential Revision: D28474343

Pulled By: albanD

fbshipit-source-id: 8230fa3c0a8d3633f09999e7c2f47dbdc5fe57e9
2021-05-17 12:33:27 -07:00
1a9efbbc92 generate inplace/out kernels for xla (#57510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57510

This is a re-write of https://github.com/pytorch/pytorch/pull/56835, which is significantly shorter thanks to the data model change in the PR below this one in the stack. See the original description in the linked PR for details.

The functional changes in this PR are the same as in the above linked one, so the description is the same with a few small changes:
- I don't bother generating `at::xla::{op}` entries for CPU fallbacks (sketched below). After looking around, I see precedent for that. For example, we don't have `at::cpu::{op}` entries for composite ops: if you really want to bypass the dispatcher you need to call `at::compositeimplicitautograd::{op}`. Maybe we should revisit that later if we find an important use case for having full namespace coverage, but that doesn't seem worth half-fixing for external backends in this PR.
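
As a hedged sketch of that generation decision (the helper name and kernel table are hypothetical, not the PR's code):

```
def should_emit_namespaced_entry(op_name, xla_kernels):
    # Only emit an at::xla::{op} wrapper when the backend has its own
    # kernel; ops that hit the CPU fallback get no namespaced entry,
    # mirroring how composite ops get no at::cpu::{op} entry.
    return op_name in xla_kernels
```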

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28474364

Pulled By: bdhirsh

fbshipit-source-id: 4d58b60e5debad6f1ff06420597d8df8505b2876
2021-05-17 12:25:38 -07:00
9354a68e7d [codegen] split out backend-specific information from NativeFunction in the model (#57361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57361

Data model change in the codegen, which splits backend-specific information out of `NativeFunction`

### Overview
Currently in the codegen, native_functions.yaml has backend-specific information about each operator that is encoded directly into the data model, in the `NativeFunction` object. That's reasonable, since the native_functions.yaml is the source of truth for information about an operator, and the data model encodes that information into types.

Now that external backends can use the codegen though, that information is technically incomplete/inaccurate. In another PR, I tried patching the information on the `NativeFunction` object with the additional external information, by updating the `dispatch` entry to contain the external backend kernel name and dispatch key.

Instead, this PR tries to split out that information. The `NativeFunction` class contains all information about an operator from native_functions.yaml that's backend-independent and is known never to change regardless of what extra information backends provide. We also build up a backend "index", which is basically a mapping from [backend] -> [backend-specific-metadata]. Reading in an external backend yaml just involves updating that index with the new backend.

There were a few places where `NativeFunction` used the dispatch table directly, which I encoded as properties directly on the NativeFunction object (e.g. `is_abstract`). They were mostly about whether or not the operator has a composite kernel, which isn't something that's going to change for any external backends.

This has a few advantages:
- We can more easily re-use the existing logic in `native_function.py` and `register_dispatch_key.py` for both native and external backends, since they both involve a NativeFunction + a particular backend index
- The data in the data model will be the same regardless of how the codegen is run. Running the codegen with a new external backend doesn't change the data inside of NativeFunction or an existing backend index. It just adds a new index for that backend.
- There are several codegen areas that don't care about backend-specific information: mostly the tracing and autograd codegen. We can reason about the codegen there more easily, knowing that backend-specific info is entirely uninvolved.

An alternative to this split would be to augment the NativeFunction objects with external backend information at the time that we create them. So the external codegen could read both native_functions.yaml and the external backend's yaml at the same time, and construct a NativeObject with a full dispatch table (including the XLA entry), and the correct setting of structured (taking into account both yamls). One disadvantage to this approach is that NativeFunction objects now contain different stuff depending on how you ran the codegen, and you have to make sure that any changes to the codegen can properly handle all the different variants.

### Data Model Changes
Removed 3 classes, which are used by the external codegen:
- ExternalBackendFunction
- ExternalBackendFunctionsGroup
- ExternalBackendMetadata

And added two new ones:
- BackendIndex
- BackendMetadata

`BackendIndex` contains any info that's specific to that backend, plus a mapping from operator names to backend specific metadata about the operator. One example of backend-specific info that's not operator-dependent is the fact that XLA prefers to implement functional kernels instead of out kernels (and so when they eventually mark an op as structured, they're going to mark the functional op and not the out op).

`BackendMetadata` contains info specific to an (operator, backend) pair. Right now, that's just (a) the name of the kernel, and (b) whether or not that operator is structured.
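
A condensed sketch of that data model (field names follow this description; everything beyond them is an assumption):

```
from dataclasses import dataclass, field
from typing import Dict

@dataclass(frozen=True)
class BackendMetadata:
    # info specific to an (operator, backend) pair
    kernel: str       # name of the kernel for this backend
    structured: bool  # whether this backend implements the op as structured

@dataclass
class BackendIndex:
    # info specific to one backend, plus per-operator metadata
    dispatch_key: str
    external: bool
    # operator name -> backend-specific metadata
    index: Dict[str, BackendMetadata] = field(default_factory=dict)
```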

### Questions
I wanted to get this PR up earlier so I could get feedback, but there are a few things I want to call out:

**Dealing with `structured`.**
This PR separates out the notion of `structured` into two bits of information:
- Does [operator] have a meta() function. This is backend-agnostic, and is represented by the `structured` property on `NativeFunction`, same as before. This is used, e.g., to decide what signatures to add to `MetaFunctions.h`.
- Does [operator, backend] have an impl() function. This is backend-dependent; even though technically all in-tree backends are forced to write impl() functions for an operator when we port the op to structured in native_functions.yaml, out-of-tree backends can decide to opt in independently. This is represented as a property on `BackendMetadata`. This is used in most other cases, e.g. in `RegisterDispatchKey` when we're deciding whether to generate a structured or unstructured wrapper.

I also baked `is_structured_dispatch_key` directly into each BackendIndex. So for operators marked "structured" in native_functions.yaml, their corresponding CPU/CUDA BackendIndex entries will be marked structured, and all others (except for potentially external backends) will not.

I ended up trying to deal with `structured` in this change since it's technically backend-dependent (XLA can opt kernels into structured separately from in-tree ops), but that may have been too ambitious: it's not actually relevant until we add support for structured external kernels. If it's not clear that this is the right path for dealing with structured and we want to push that off, I'm fine with backing out the bits of this PR that make `structured` backend-dependent. I don't see anything *too* controversial related to structured in the change, but I tried to call out any areas in the comments.

**Localizing the fact that external backends follow Dispatcher convention.**
Another thing that's sort of backend-specific that I didn't totally address in this PR is the fact that in-tree backends follow the Native API while external backends follow the Dispatcher API. I painted over that in `native_functions.py` by adding a helper, `kernel_signature`, that takes in a native function and gives you the "correct" signature for the specified backend: NativeSignature for in-tree backends, and DispatcherSignature for out-of-tree backends. In order to make that fully usable though, we'll need `NativeSignature` and `DispatcherSignature` to have matching interfaces. I didn't bother with that in this PR, which is why `gen_external_aten_fallbacks.py` still has a bunch of direct references to the dispatcher API. Thinking of adding it in a later PR but wanted to see if anyone has other opinions.
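
A sketch of the `kernel_signature` helper as described (the stub signature classes stand in for the real codegen types and are assumptions):

```
class NativeSignature:      # stand-in for the real codegen class
    def __init__(self, func):
        self.func = func

class DispatcherSignature:  # stand-in for the real codegen class
    def __init__(self, func):
        self.func = func

def kernel_signature(f, backend_index):
    # In-tree backends follow the native convention; external backends
    # follow the dispatcher convention.
    if backend_index.external:
        return DispatcherSignature(f.func)
    return NativeSignature(f.func)
```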

Maybe `is_external()` shouldn't even be a property on the BackendMetadata, and anything the codegen does that requires asking for that information should just be better abstracted away.

**Thoughts on the `BackendIndex` / `BackendMetadata` breakdown.**
One thing that's annoying right now is that to query for various pieces of metadata, you call helper functions like `backend_index.structured(f)`, which queries that particular backend and tells you if that specific NativeFunctionGroup is structured for that backend. It has to return an `Optional[bool]` though, since you have to handle the case where that operator doesn't have a kernel for that backend at all. So users of those helpers end up with a bunch of optionals that they need to unpack, even if they know at some point that the result isn't None. I think it would be easier instead to just store the NativeFunction object as a field directly on the BackendMetadata. Curious if there are any other opinions on a better way to model it though.
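
Reusing the `BackendIndex` sketch above, the `Optional[bool]` shape looks roughly like this (the exact helper is an assumption):

```
from typing import Optional

def backend_structured(backend, op_name) -> Optional[bool]:
    # None means "this backend has no kernel for the op at all", which is
    # why callers end up unpacking Optionals even when they know the answer.
    metadata = backend.index.get(op_name)
    return None if metadata is None else metadata.structured
```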

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28474362

Pulled By: bdhirsh

fbshipit-source-id: 41a00821acf172467d764cb41e771e096542f661
2021-05-17 12:25:35 -07:00
0db33eda2a remove bridge API from codegen (#55796)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55796

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28474361

Pulled By: bdhirsh

fbshipit-source-id: c7f5ce35097f8eaa514f3df8f8559548188b265b
2021-05-17 12:25:32 -07:00
3d9f10f530 [external codegen] better yaml error messaging, added explicit error message tests (#56597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56597

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them (see the sketch after this list).

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!
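
A sketch of the (yaml input, expected error message) harness from point 2, with a toy validator standing in for the real `gen_backend_stubs` pipeline (field checks and messages are illustrative assumptions):

```
import unittest
import yaml

def check_backend_yaml(text):
    # Toy stand-in for the real pipeline: validate two required fields.
    data = yaml.safe_load(text)
    assert "backend" in data, 'You must provide a value for "backend"'
    assert "cpp_namespace" in data, 'You must provide a value for "cpp_namespace"'

class TestYamlErrorMessages(unittest.TestCase):
    CASES = [  # (yaml input, expected error message) pairs
        ("cpp_namespace: torch_xla\nsupported: [abs]", 'value for "backend"'),
        ("backend: XLA\nsupported: [abs]", 'value for "cpp_namespace"'),
    ]

    def test_error_messages(self):
        for text, expected in self.CASES:
            with self.assertRaisesRegex(AssertionError, expected):
                check_backend_yaml(text)
```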

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28474363

Pulled By: bdhirsh

fbshipit-source-id: 8b5ec804b388dbbc0350a20c053da657fad0474f
2021-05-17 12:25:29 -07:00
4dc1b8e06b add _to_cpu() operator (#55795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55795

description coming soon

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28474365

Pulled By: bdhirsh

fbshipit-source-id: 0704d7ce354308601a0af9ab48851459f34ce7a0
2021-05-17 12:23:35 -07:00
6168 changed files with 579678 additions and 227783 deletions

View File

@ -44,7 +44,7 @@ jobs:
is_official_build: ${{ parameters.is_official_build}}
# Sync and update PyTorch submodules
- bash: git submodule update --init --recursive
- bash: git submodule update --init --recursive --jobs 0
displayName: Update PyTorch submodules
# Build PyTorch and run unit tests - no packaging

View File

@ -47,7 +47,7 @@ jobs:
is_official_build: ${{ parameters.is_official_build}}
# Sync and update PyTorch submodules
- script: git submodule update --init --recursive
- script: git submodule update --init --recursive --jobs 0
displayName: Update PyTorch submodules
# Build PyTorch and run unit tests - no packaging

View File

@ -0,0 +1,26 @@
parameters:
name: ''
pool: ''
customMatrixes: ''
jobs:
- job: ${{parameters.name}}
timeoutInMinutes: 600
strategy:
matrix:
${{ insert }}: ${{parameters.customMatrixes}}
pool:
name: ${{ parameters.pool}}
steps:
# Clone PyTorch Tests repository
- bash: |
B64_PAT=$(echo -n ":$_ADOTOKEN" | base64)
git -c http.extraHeader="Authorization: Basic ${B64_PAT}" clone $(AZURE_DEVOPS_PYTORCH_TESTS_REPO_URL)
cd pytorch_tests
git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)
env:
_ADOTOKEN: $(AZURE_DEVOPS_CLI_PAT)
displayName: Clone PyTorch Tests repo
- bash: |
bash $(Build.SourcesDirectory)/pytorch_tests/webapp/notify_webapp.sh
displayName: Notify Webapp

View File

@ -46,7 +46,7 @@ steps:
curl -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output .\tmp_bin\sccache.exe
curl -k https://s3.amazonaws.com/ossci-windows/sccache-cl.exe --output .\tmp_bin\sccache-cl.exe
copy .\tmp_bin\sccache.exe .\tmp_bin\nvcc.exe
curl -kL https://github.com/peterjc123/randomtemp-rust/releases/download/v0.3/randomtemp.exe --output .\tmp_bin\randomtemp.exe
curl -kL https://github.com/peterjc123/randomtemp-rust/releases/download/v0.4/randomtemp.exe --output .\tmp_bin\randomtemp.exe
displayName: Install sccache and randomtemp
condition: not(eq(variables.CUDA_VERSION, ''))

View File

@ -33,7 +33,7 @@ jobs:
# Clone PyTorch Tests repository
- bash: |
B64_PAT=$(printf "%s"":$_ADOTOKEN" | base64)
B64_PAT=$(echo -n ":$_ADOTOKEN" | base64)
git -c http.extraHeader="Authorization: Basic ${B64_PAT}" clone $(AZURE_DEVOPS_PYTORCH_TESTS_REPO_URL)
cd pytorch_tests
git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)
@ -48,4 +48,14 @@ jobs:
_TS_CLONE_P: $(TS_CLONE_PASSWORD)
_TS_P: $(TS_PAT)
_TS_SM_P: $(TS_SM_PAT)
_AZUREML_CLONE_PASSWORD: $(AZUREML_CLONE_PASSWORD)
_SPPASSWORD: $(SPPASSWORD)
displayName: Run PyTorch Unit Tests
# Tests results are available outside the docker container since
# the current directory is mounted as a volume of the container.
- task: PublishTestResults@2
condition: always()
inputs:
testResultsFiles: '**/test-*.xml'
testRunTitle: 'Publish test results for Python'

View File

@ -47,3 +47,11 @@ jobs:
_TS_P: $(TS_PAT)
_TS_SM_P: $(TS_SM_PAT)
displayName: Run PyTorch Unit Tests
# Tests results are available outside the docker container since
# the current directory is mounted as a volume of the container.
- task: PublishTestResults@2
condition: always()
inputs:
testResultsFiles: '**\test-*.xml'
testRunTitle: 'Publish test results for Python'

View File

@ -120,9 +120,7 @@ steps:
Write-Host "##vso[task.setvariable variable=CMAKE_LIBRARY_PATH;]$(Build.SourcesDirectory)\mkl\lib;$env:CMAKE_LIBRARY_PATH"
Write-Host "##vso[task.setvariable variable=ADDITIONAL_PATH;]$(Build.SourcesDirectory)\tmp_bin"
Write-Host "##vso[task.setvariable variable=SCCACHE_IDLE_TIMEOUT;]1500"
Write-Host "##vso[task.setvariable variable=RANDOMTEMP_EXECUTABLE;]$(Build.SourcesDirectory)\tmp_bin\nvcc.exe"
Write-Host "##vso[task.setvariable variable=CUDA_NVCC_EXECUTABLE;]$(Build.SourcesDirectory)\tmp_bin\randomtemp.exe"
Write-Host "##vso[task.setvariable variable=RANDOMTEMP_BASEDIR;]$(Build.SourcesDirectory)\tmp_bin"
Write-Host "##vso[task.setvariable variable=CMAKE_CUDA_COMPILER_LAUNCHER;]$(Build.SourcesDirectory)/tmp_bin/randomtemp.exe;$(Build.SourcesDirectory)/tmp_bin/sccache.exe"
displayName: Set MKL, sccache and randomtemp environment variables
# View current environment variables

View File

@ -8,7 +8,7 @@ steps:
connectionType: 'connectedServiceName'
serviceConnection: circleciconn
method: 'POST'
headers: '{"Content-Type":"application/json", "BranchName":"$(TARGET_BRANCH_TO_CHECK_PR)", "JobName":"$(TARGET_CIRCLECI_PR)", "PlanUrl":"$(System.CollectionUri)", "ProjectId":"$(System.TeamProjectId)", "HubName":"$(System.HostType)", "PlanId":"$(System.PlanId)", "JobId":"$(System.JobId)", "TimelineId":"$(System.TimelineId)", "TaskInstanceId":"$(System.TaskInstanceId)", "AuthToken":"$(System.AccessToken)"}'
headers: '{"Content-Type":"application/json", "BranchName":"$(_TARGET_BRANCH_TO_CHECK)", "JobName":"$(TARGET_CIRCLECI_BUILD_PR)", "PRNumber":"$(_TARGET_PR_NUMBER)", "TargetCommit":"$(_TARGET_COMMIT)", "PlanUrl":"$(System.CollectionUri)", "ProjectId":"$(System.TeamProjectId)", "HubName":"$(System.HostType)", "PlanId":"$(System.PlanId)", "JobId":"$(System.JobId)", "TimelineId":"$(System.TimelineId)", "TaskInstanceId":"$(System.TaskInstanceId)", "AuthToken":"$(System.AccessToken)"}'
body: ''
urlSuffix: 'api/JobStatus'
waitForCompletion: true

View File

@ -1,6 +1,6 @@
# Initiate 5 agentless-server waiting jobs to check on the
# status of PR artifact builds, for a maximum wait time of
# 5 * 60 min =300 minutes. These jobs will pass immediately
# 11*60 min=660 mins. These jobs will pass immediately
# once targeted CircleCI build is ready.
jobs:
@ -8,7 +8,6 @@ jobs:
pool: server
timeoutInMinutes: 60
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
@ -17,7 +16,6 @@ jobs:
timeoutInMinutes: 60
dependsOn: checkjob1
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
@ -26,7 +24,6 @@ jobs:
timeoutInMinutes: 60
dependsOn: checkjob2
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
@ -35,7 +32,6 @@ jobs:
timeoutInMinutes: 60
dependsOn: checkjob3
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
@ -44,6 +40,53 @@ jobs:
timeoutInMinutes: 60
dependsOn: checkjob4
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob6
pool: server
timeoutInMinutes: 60
dependsOn: checkjob5
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob7
pool: server
timeoutInMinutes: 60
dependsOn: checkjob6
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob8
pool: server
timeoutInMinutes: 60
dependsOn: checkjob7
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob9
pool: server
timeoutInMinutes: 60
dependsOn: checkjob8
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob10
pool: server
timeoutInMinutes: 60
dependsOn: checkjob9
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob11
pool: server
timeoutInMinutes: 60
dependsOn: checkjob10
continueOnError: true
steps:
- template: wheel-wait-job-template.yml

View File

@ -48,3 +48,13 @@ stages:
_PYTHON_VERSION: $(PYTHON_VERSION_WIN_2)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_WIN_2)
_RUN_TESTS: $(RUN_TESTS_WIN)
- stage: 'NotifyWebapp'
displayName: 'Notify Webapp that pipeline is finished'
dependsOn: NightlyCustomTests
condition: succeededOrFailed()
jobs:
- template: job_templates/notify-webapp-template.yml
parameters:
name: ubuntu_1804_CPU
pool: $(BUILD_POOL_LIN_1)

View File

@ -7,14 +7,28 @@
# 2) runs custom PyTorch unit-tests on PyTorch
# wheels generated during PR builds.
resources:
webhooks:
- webhook: GitHubPyTorchPRTrigger
connection: GitHubPyTorchPRTriggerConnection
filters:
- path: repositoryName
value: pytorch_tests
stages:
- stage: 'EnsureArtifactsReady'
displayName: 'Ensure PyTorch PR Artifacts are ready'
jobs:
- template: job_templates/wheel-wait-template.yml
variables:
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}
- stage: 'PRCustomTests'
displayName: 'Run custom unit tests on PyTorch wheels'
dependsOn: EnsureArtifactsReady
condition: succeeded()
jobs:
- template: job_templates/pytorch-template-unix.yml
parameters:
@ -24,7 +38,25 @@ stages:
PR_Custom_Tests:
_PYTHON_VERSION: $(PYTHON_VERSION_PR)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_PR)
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_PR)
_TARGET_BRANCH_TO_CHECK: $(TARGET_BRANCH_TO_CHECK_PR)
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_BUILD_PR)
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}
_DOCKER_IMAGE: $(DOCKER_IMAGE_PR)
_RUN_TESTS: $(RUN_TESTS_PR)
- stage: 'NotifyWebapp'
displayName: 'Notify Webapp that pipeline is finished'
dependsOn: PRCustomTests
condition: succeededOrFailed()
jobs:
- template: job_templates/notify-webapp-template.yml
parameters:
name: ubuntu_1804_CPU
pool: $(BUILD_POOL_LIN_1)
customMatrixes:
PR_Notify_WebApp:
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_BUILD_PR)
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}

View File

@ -1,3 +1,19 @@
build --copt=--std=c++14
build --copt=-I.
build --copt=-isystem --copt bazel-out/k8-fastbuild/bin
build --experimental_ui_max_stdouterr_bytes=2048576
# Configuration to disable tty features for environments like CI
build:no-tty --curses no
build:no-tty --progress_report_interval 10
build:no-tty --show_progress_rate_limit 10
# Configuration to build with GPU support
build:gpu --define=cuda=true
# define a separate build folder for faster switching between configs
build:gpu --platform_suffix=-gpu
# rules_cuda configuration
build:gpu --@rules_cuda//cuda:enable_cuda
build:gpu --@rules_cuda//cuda:cuda_targets=sm_52
build:gpu --@rules_cuda//cuda:compiler=nvcc
build:gpu --repo_env=CUDA_PATH=/usr/local/cuda

View File

@ -1 +1 @@
3.1.0
4.2.1

View File

@ -343,7 +343,6 @@ All linux builds occur in docker images. The docker images are
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* pytorch/manylinux-cuda90
* pytorch/manylinux-cuda92
* pytorch/manylinux-cuda100
* Also used for cpu builds

View File

@ -30,21 +30,7 @@ def get_processor_arch_name(gpu_version):
"cu" + gpu_version.strip("cuda") if gpu_version.startswith("cuda") else gpu_version
)
LINUX_PACKAGE_VARIANTS = OrderedDict(
manywheel=[
"3.6m",
"3.7m",
"3.8m",
"3.9m"
],
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"3.7m",
],
)
CONFIG_TREE_DATA = OrderedDict(
linux=(dimensions.GPU_VERSIONS, LINUX_PACKAGE_VARIANTS),
macos=([None], OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,
@ -63,7 +49,8 @@ CONFIG_TREE_DATA = OrderedDict(
],
)),
windows=(
[v for v in dimensions.GPU_VERSIONS if v not in dimensions.ROCM_VERSION_LABELS],
# Stop building Win+CU102, see https://github.com/pytorch/pytorch/issues/65648
[v for v in dimensions.GPU_VERSIONS if v not in dimensions.ROCM_VERSION_LABELS and v != "cuda102"],
OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,
@ -126,6 +113,7 @@ class PackageFormatConfigNode(ConfigNode):
self.props["python_versions"] = python_versions
self.props["package_format"] = package_format
def get_children(self):
if self.find_prop("os_name") == "linux":
return [LinuxGccConfigNode(self, v) for v in LINUX_GCC_CONFIG_VARIANTS[self.find_prop("package_format")]]

View File

@ -124,9 +124,9 @@ class Conf(object):
Output looks similar to:
- binary_upload:
name: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_upload
name: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_upload
context: org-member
requires: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_test
requires: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_test
filters:
branches:
only:
@ -134,7 +134,7 @@ class Conf(object):
tags:
only: /v[0-9]+(\\.[0-9]+)*-rc[0-9]+/
package_type: manywheel
upload_subfolder: cu92
upload_subfolder: cu113
"""
return {
"binary_upload": OrderedDict({

View File

@ -3,12 +3,13 @@ PHASES = ["build", "test"]
CUDA_VERSIONS = [
"102",
"111",
"113",
"115",
]
ROCM_VERSIONS = [
"4.0.1",
"4.1",
"4.2",
"4.3.1",
"4.5.2",
]
ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
@ -16,8 +17,8 @@ ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
GPU_VERSIONS = [None] + ["cuda" + v for v in CUDA_VERSIONS] + ROCM_VERSION_LABELS
STANDARD_PYTHON_VERSIONS = [
"3.6",
"3.7",
"3.8",
"3.9"
"3.9",
"3.10"
]

View File

@ -1,99 +1,7 @@
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
from cimodel.lib.conf_tree import ConfigNode
CONFIG_TREE_DATA = [
("xenial", [
("gcc", [
("5.4", [ # All this subtree rebases to master and then build
("3.6", [
("important", [X(True)]),
("parallel_tbb", [X(True)]),
("parallel_native", [X(True)]),
("pure_torch", [X(True)]),
]),
]),
# TODO: bring back libtorch test
("7", [X("3.6")]),
]),
("clang", [
("5", [
("3.6", [
("asan", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
]),
]),
("7", [
("3.6", [
("onnx", [XImportant(True)]),
]),
]),
]),
("cuda", [
("10.2", [
("3.6", [
("shard_test", [X(True)]),
("libtorch", [
(True, [
('build_only', [X(True)]),
]),
]),
]),
]),
("11.1", [
("3.8", [
("shard_test", [XImportant(True)]),
("libtorch", [
(True, [
('build_only', [X(True)]),
]),
]),
]),
]),
]),
]),
("bionic", [
("clang", [
("9", [
("3.6", [
("noarch", [XImportant(True)]),
]),
]),
("9", [
("3.6", [
("xla", [XImportant(True)]),
("vulkan", [XImportant(True)]),
]),
]),
]),
("cuda", [
("10.2", [
("3.9", [
("shard_test", [XImportant(True)]),
]),
]),
]),
("gcc", [
("9", [
("3.8", [
("coverage", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
]),
]),
]),
("rocm", [
("3.9", [
("3.6", [
('build_only', [XImportant(True)]),
]),
]),
]),
]),
]
@ -174,12 +82,19 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
"build_only": BuildOnlyConfigNode,
"shard_test": ShardTestConfigNode,
"cuda_gcc_override": CudaGccOverrideConfigNode,
"coverage": CoverageConfigNode,
"pure_torch": PureTorchConfigNode,
"slow_gradcheck": SlowGradcheckConfigNode,
}
return next_nodes[experimental_feature]
class SlowGradcheckConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_slow_gradcheck"] = True
def child_constructor(self):
return ExperimentalFeatureConfigNode
class PureTorchConfigNode(TreeConfigNode):
def modify_label(self, label):
return "PURE_TORCH=" + str(label)
@ -310,14 +225,6 @@ class ShardTestConfigNode(TreeConfigNode):
return ImportantConfigNode
class CoverageConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_coverage"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
class ImportantConfigNode(TreeConfigNode):
def modify_label(self, label):
return "IMPORTANT=" + str(label)

View File

@ -31,6 +31,7 @@ class Conf:
is_libtorch: bool = False
is_important: bool = False
parallel_backend: Optional[str] = None
build_only: bool = False
@staticmethod
def is_test_phase(phase):
@ -112,6 +113,8 @@ class Conf:
parameters["resource_class"] = "xlarge"
if hasattr(self, 'filters'):
parameters['filters'] = self.filters
if self.build_only:
parameters['build_only'] = miniutils.quote(str(int(True)))
return parameters
def gen_workflow_job(self, phase):
@ -175,35 +178,6 @@ class DocPushConf(object):
}
}
# TODO Convert these to graph nodes
def gen_dependent_configs(xenial_parent_config):
extra_parms = [
(["multigpu"], "large"),
(["nogpu", "NO_AVX2"], None),
(["nogpu", "NO_AVX"], None),
(["slow"], "medium"),
]
configs = []
for parms, gpu in extra_parms:
c = Conf(
xenial_parent_config.distro,
["py3"] + parms,
pyver=xenial_parent_config.pyver,
cuda_version=xenial_parent_config.cuda_version,
restrict_phases=["test"],
gpu_resource=gpu,
parent_build=xenial_parent_config,
is_important=False,
)
configs.append(c)
return configs
def gen_docs_configs(xenial_parent_config):
configs = []
@ -211,7 +185,7 @@ def gen_docs_configs(xenial_parent_config):
HiddenConf(
"pytorch_python_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=r"/.*/",
filters=gen_filter_dict(branches_list=["master", "nightly"],
tags_list=RC_PATTERN),
)
)
@ -227,7 +201,7 @@ def gen_docs_configs(xenial_parent_config):
HiddenConf(
"pytorch_cpp_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=r"/.*/",
filters=gen_filter_dict(branches_list=["master", "nightly"],
tags_list=RC_PATTERN),
)
)
@ -238,13 +212,6 @@ def gen_docs_configs(xenial_parent_config):
branch="master",
)
)
configs.append(
HiddenConf(
"pytorch_doc_test",
parent_build=xenial_parent_config
)
)
return configs
@ -258,7 +225,7 @@ def gen_tree():
return configs_list
def instantiate_configs():
def instantiate_configs(only_slow_gradcheck):
config_list = []
@ -272,13 +239,16 @@ def instantiate_configs():
compiler_version = fc.find_prop("compiler_version")
is_xla = fc.find_prop("is_xla") or False
is_asan = fc.find_prop("is_asan") or False
is_coverage = fc.find_prop("is_coverage") or False
is_noarch = fc.find_prop("is_noarch") or False
is_onnx = fc.find_prop("is_onnx") or False
is_pure_torch = fc.find_prop("is_pure_torch") or False
is_vulkan = fc.find_prop("is_vulkan") or False
is_slow_gradcheck = fc.find_prop("is_slow_gradcheck") or False
parms_list_ignored_for_docker_image = []
if only_slow_gradcheck ^ is_slow_gradcheck:
continue
python_version = None
if compiler_name == "cuda" or compiler_name == "android":
python_version = fc.find_prop("pyver")
@ -313,10 +283,6 @@ def instantiate_configs():
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
if is_coverage:
parms_list_ignored_for_docker_image.append("coverage")
python_version = fc.find_prop("pyver")
if is_noarch:
parms_list_ignored_for_docker_image.append("noarch")
@ -342,6 +308,10 @@ def instantiate_configs():
if build_only or is_pure_torch:
restrict_phases = ["build"]
if is_slow_gradcheck:
parms_list_ignored_for_docker_image.append("old")
parms_list_ignored_for_docker_image.append("gradcheck")
gpu_resource = None
if cuda_version and cuda_version != "10":
gpu_resource = "medium"
@ -361,6 +331,7 @@ def instantiate_configs():
is_libtorch=is_libtorch,
is_important=is_important,
parallel_backend=parallel_backend,
build_only=build_only,
)
# run docs builds on "pytorch-linux-xenial-py3.6-gcc5.4". Docs builds
@ -381,36 +352,14 @@ def instantiate_configs():
tags_list=RC_PATTERN)
c.dependent_tests = gen_docs_configs(c)
if cuda_version == "10.2" and python_version == "3.6" and not is_libtorch:
c.dependent_tests = gen_dependent_configs(c)
if (
compiler_name == "gcc"
and compiler_version == "5.4"
and not is_libtorch
and not is_vulkan
and not is_pure_torch
and parallel_backend is None
):
bc_breaking_check = Conf(
"backward-compatibility-check",
[],
is_xla=False,
restrict_phases=["test"],
is_libtorch=False,
is_important=True,
parent_build=c,
)
c.dependent_tests.append(bc_breaking_check)
config_list.append(c)
return config_list
def get_workflow_jobs():
def get_workflow_jobs(only_slow_gradcheck=False):
config_list = instantiate_configs()
config_list = instantiate_configs(only_slow_gradcheck)
x = []
for conf_options in config_list:

View File

@ -1,119 +0,0 @@
import cimodel.data.simple.util.branch_filters as branch_filters
from cimodel.data.simple.util.docker_constants import (
DOCKER_IMAGE_NDK, DOCKER_REQUIREMENT_NDK
)
import cimodel.lib.miniutils as miniutils
class AndroidJob:
def __init__(self,
variant,
template_name,
is_master_only=True):
self.variant = variant
self.template_name = template_name
self.is_master_only = is_master_only
def gen_tree(self):
base_name_parts = [
"pytorch",
"linux",
"xenial",
"py3",
"clang5",
"android",
"ndk",
"r19c",
] + self.variant + [
"build",
]
full_job_name = "_".join(base_name_parts)
build_env_name = "-".join(base_name_parts)
props_dict = {
"name": full_job_name,
"build_environment": "\"{}\"".format(build_env_name),
"docker_image": "\"{}\"".format(DOCKER_IMAGE_NDK),
"requires": [DOCKER_REQUIREMENT_NDK]
}
if self.is_master_only:
props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.NON_PR_BRANCH_LIST)
return [{self.template_name: props_dict}]
class AndroidGradleJob:
def __init__(self,
job_name,
template_name,
dependencies,
is_master_only=True,
is_pr_only=False,
extra_props=tuple()):
self.job_name = job_name
self.template_name = template_name
self.dependencies = dependencies
self.is_master_only = is_master_only
self.is_pr_only = is_pr_only
self.extra_props = dict(extra_props)
def gen_tree(self):
props_dict = {
"name": self.job_name,
"requires": self.dependencies,
}
if self.is_master_only:
props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.NON_PR_BRANCH_LIST)
elif self.is_pr_only:
props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.PR_BRANCH_LIST)
if self.extra_props:
props_dict.update(self.extra_props)
return [{self.template_name: props_dict}]
WORKFLOW_DATA = [
AndroidJob(["x86_32"], "pytorch_linux_build", is_master_only=False),
AndroidJob(["x86_64"], "pytorch_linux_build"),
AndroidJob(["arm", "v7a"], "pytorch_linux_build"),
AndroidJob(["arm", "v8a"], "pytorch_linux_build"),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32",
"pytorch_android_gradle_build-x86_32",
["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build"],
is_master_only=False,
is_pr_only=True),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch_android_gradle_custom_build_single",
[DOCKER_REQUIREMENT_NDK],
is_master_only=False,
is_pr_only=True),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
"pytorch_android_gradle_custom_build_single",
[DOCKER_REQUIREMENT_NDK],
is_master_only=False,
is_pr_only=True,
extra_props=tuple({
"lite_interpreter": miniutils.quote(str(int(False)))
}.items())),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build",
"pytorch_android_gradle_build",
["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build",
"pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build",
"pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build",
"pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build"]),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]

View File

@ -120,9 +120,9 @@ WORKFLOW_DATA = [
),
SmoketestJob(
"binary_windows_build",
["wheel", "3.7", "cu102"],
["wheel", "3.7", "cu113"],
None,
"binary_windows_wheel_3_7_cu102_build",
"binary_windows_wheel_3_7_cu113_build",
is_master_only=True,
),
@ -144,11 +144,11 @@ WORKFLOW_DATA = [
),
SmoketestJob(
"binary_windows_test",
["wheel", "3.7", "cu102"],
["wheel", "3.7", "cu113"],
None,
"binary_windows_wheel_3_7_cu102_test",
"binary_windows_wheel_3_7_cu113_test",
is_master_only=True,
requires=["binary_windows_wheel_3_7_cu102_build"],
requires=["binary_windows_wheel_3_7_cu113_build"],
extra_props={
"executor": "windows-with-nvidia-gpu",
},

View File

@ -4,37 +4,24 @@ from cimodel.lib.miniutils import quote
from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
# TODO: make this generated from a matrix rather than just a static list
# NOTE: All hardcoded docker image builds have been migrated to GHA
IMAGE_NAMES = [
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
"pytorch-linux-bionic-py3.6-clang9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9",
"pytorch-linux-bionic-py3.8-gcc9",
"pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
"pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
"pytorch-linux-xenial-py3-clang5-asan",
"pytorch-linux-xenial-py3-clang7-onnx",
"pytorch-linux-xenial-py3.8",
"pytorch-linux-xenial-py3.6-clang7",
"pytorch-linux-xenial-py3.6-gcc5.4", # this one is used in doc builds
"pytorch-linux-xenial-py3.6-gcc7.2",
"pytorch-linux-xenial-py3.6-gcc7",
"pytorch-linux-bionic-rocm3.9-py3.6",
"pytorch-linux-bionic-rocm4.0.1-py3.6",
"pytorch-linux-bionic-rocm4.1-py3.6",
"pytorch-linux-bionic-rocm4.2-py3.6",
]
# This entry should be an element from the list above
# This should contain the image matching the "slow_gradcheck" entry in
# pytorch_build_data.py
SLOW_GRADCHECK_IMAGE_NAME = "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
def get_workflow_jobs():
def get_workflow_jobs(images=IMAGE_NAMES, only_slow_gradcheck=False):
"""Generates a list of docker image build definitions"""
ret = []
for image_name in IMAGE_NAMES:
for image_name in images:
if image_name.startswith('docker-'):
image_name = image_name.lstrip('docker-')
if only_slow_gradcheck and image_name is not SLOW_GRADCHECK_IMAGE_NAME:
continue
parameters = OrderedDict({
"name": quote(f"docker-{image_name}"),
"image_name": quote(image_name),

View File

@ -1,78 +0,0 @@
import cimodel.lib.miniutils as miniutils
from cimodel.data.simple.util.versions import MultiPartVersion, CudaVersion
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_BASIC, DOCKER_IMAGE_CUDA_10_2
class GeConfigTestJob:
def __init__(self,
py_version,
gcc_version,
cuda_version,
variant_parts,
extra_requires,
use_cuda_docker=False,
build_env_override=None):
self.py_version = py_version
self.gcc_version = gcc_version
self.cuda_version = cuda_version
self.variant_parts = variant_parts
self.extra_requires = extra_requires
self.use_cuda_docker = use_cuda_docker
self.build_env_override = build_env_override
def get_all_parts(self, with_dots):
maybe_py_version = self.py_version.render_dots_or_parts(with_dots) if self.py_version else []
maybe_gcc_version = self.gcc_version.render_dots_or_parts(with_dots) if self.gcc_version else []
maybe_cuda_version = self.cuda_version.render_dots_or_parts(with_dots) if self.cuda_version else []
common_parts = [
"pytorch",
"linux",
"xenial",
] + maybe_cuda_version + maybe_py_version + maybe_gcc_version
return common_parts + self.variant_parts
def gen_tree(self):
resource_class = "gpu.medium" if self.use_cuda_docker else "large"
docker_image = DOCKER_IMAGE_CUDA_10_2 if self.use_cuda_docker else DOCKER_IMAGE_BASIC
full_name = "_".join(self.get_all_parts(False))
build_env = self.build_env_override or "-".join(self.get_all_parts(True))
props_dict = {
"name": full_name,
"build_environment": build_env,
"requires": self.extra_requires,
"resource_class": resource_class,
"docker_image": docker_image,
}
if self.use_cuda_docker:
props_dict["use_cuda_docker_runtime"] = miniutils.quote(str(1))
return [{"pytorch_linux_test": props_dict}]
WORKFLOW_DATA = [
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["jit_legacy", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
GeConfigTestJob(
None,
None,
CudaVersion(10, 2),
["cudnn7", "py3", "jit_legacy", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]

View File

@ -1,7 +1,7 @@
from cimodel.data.simple.util.versions import MultiPartVersion
import cimodel.lib.miniutils as miniutils
XCODE_VERSION = MultiPartVersion([12, 0, 0])
XCODE_VERSION = MultiPartVersion([12, 5, 1])
class ArchVariant:
@ -75,6 +75,12 @@ WORKFLOW_DATA = [
IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom"), extra_props={
"op_list": "mobilenetv2.yaml",
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("x86_64", "coreml"), is_org_member_context=False, extra_props={
"use_coreml": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "coreml"), extra_props={
"use_coreml": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
]

View File

@ -4,12 +4,6 @@ PyTorch Mobile PR builds (use linux host toolchain + mobile build options)
import cimodel.lib.miniutils as miniutils
import cimodel.data.simple.util.branch_filters
from cimodel.data.simple.util.docker_constants import (
DOCKER_IMAGE_ASAN,
DOCKER_REQUIREMENT_ASAN,
DOCKER_IMAGE_NDK,
DOCKER_REQUIREMENT_NDK
)
class MobileJob:
@ -52,33 +46,6 @@ class MobileJob:
WORKFLOW_DATA = [
MobileJob(
DOCKER_IMAGE_ASAN,
[DOCKER_REQUIREMENT_ASAN],
["build"]
),
# Use LLVM-DEV toolchain in android-ndk-r19c docker image
MobileJob(
DOCKER_IMAGE_NDK,
[DOCKER_REQUIREMENT_NDK],
["custom", "build", "dynamic"]
),
MobileJob(
DOCKER_IMAGE_NDK,
[DOCKER_REQUIREMENT_NDK],
["custom", "build", "static"]
),
# Use LLVM-DEV toolchain in android-ndk-r19c docker image
# Most of this CI is already covered by "mobile-custom-build-dynamic" job
MobileJob(
DOCKER_IMAGE_NDK,
[DOCKER_REQUIREMENT_NDK],
["code", "analysis"],
True
),
]

View File

@ -1,77 +0,0 @@
from cimodel.data.simple.util.docker_constants import (
DOCKER_IMAGE_NDK,
DOCKER_REQUIREMENT_NDK
)
class AndroidNightlyJob:
def __init__(self,
variant,
template_name,
extra_props=None,
with_docker=True,
requires=None,
no_build_suffix=False):
self.variant = variant
self.template_name = template_name
self.extra_props = extra_props or {}
self.with_docker = with_docker
self.requires = requires
self.no_build_suffix = no_build_suffix
def gen_tree(self):
base_name_parts = [
"pytorch",
"linux",
"xenial",
"py3",
"clang5",
"android",
"ndk",
"r19c",
] + self.variant
build_suffix = [] if self.no_build_suffix else ["build"]
full_job_name = "_".join(["nightly"] + base_name_parts + build_suffix)
build_env_name = "-".join(base_name_parts)
props_dict = {
"name": full_job_name,
"requires": self.requires,
"filters": {"branches": {"only": "nightly"}},
}
props_dict.update(self.extra_props)
if self.with_docker:
props_dict["docker_image"] = DOCKER_IMAGE_NDK
props_dict["build_environment"] = build_env_name
return [{self.template_name: props_dict}]
BASE_REQUIRES = [DOCKER_REQUIREMENT_NDK]
WORKFLOW_DATA = [
AndroidNightlyJob(["x86_32"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["x86_64"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["arm", "v7a"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["arm", "v8a"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["android_gradle"], "pytorch_android_gradle_build",
with_docker=False,
requires=[
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build",
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build",
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build",
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build"]),
AndroidNightlyJob(["x86_32_android_publish_snapshot"], "pytorch_android_publish_snapshot",
extra_props={"context": "org-member"},
with_docker=False,
requires=["nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_android_gradle_build"],
no_build_suffix=True),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]

View File

@ -1,12 +1,15 @@
import cimodel.data.simple.ios_definitions as ios_definitions
import cimodel.lib.miniutils as miniutils
class IOSNightlyJob:
def __init__(self,
variant,
is_full_jit=False,
is_upload=False):
self.variant = variant
self.is_full_jit = is_full_jit
self.is_upload = is_upload
def get_phase_name(self):
@ -16,8 +19,11 @@ class IOSNightlyJob:
extra_name_suffix = [self.get_phase_name()] if self.is_upload else []
extra_name = ["full_jit"] if self.is_full_jit else []
common_name_pieces = [
"ios",
] + extra_name + [
] + ios_definitions.XCODE_VERSION.render_dots_or_parts(with_version_dots) + [
"nightly",
self.variant,
@ -30,7 +36,8 @@ class IOSNightlyJob:
return "_".join(["pytorch"] + self.get_common_name_pieces(False))
def gen_tree(self):
extra_requires = [x.gen_job_name() for x in BUILD_CONFIGS] if self.is_upload else []
build_configs = BUILD_CONFIGS_FULL_JIT if self.is_full_jit else BUILD_CONFIGS
extra_requires = [x.gen_job_name() for x in build_configs] if self.is_upload else []
props_dict = {
"build_environment": "-".join(["libtorch"] + self.get_common_name_pieces(True)),
@ -43,6 +50,11 @@ class IOSNightlyJob:
props_dict["ios_arch"] = self.variant
props_dict["ios_platform"] = ios_definitions.get_platform(self.variant)
props_dict["name"] = self.gen_job_name()
props_dict["use_metal"] = miniutils.quote(str(int(True)))
props_dict["use_coreml"] = miniutils.quote(str(int(True)))
if self.is_full_jit:
props_dict["lite_interpreter"] = miniutils.quote(str(int(False)))
template_name = "_".join([
"binary",
@ -58,9 +70,14 @@ BUILD_CONFIGS = [
IOSNightlyJob("arm64"),
]
BUILD_CONFIGS_FULL_JIT = [
IOSNightlyJob("x86_64", is_full_jit=True),
IOSNightlyJob("arm64", is_full_jit=True),
]
WORKFLOW_DATA = BUILD_CONFIGS + [
IOSNightlyJob("binary", is_upload=True),
WORKFLOW_DATA = BUILD_CONFIGS + BUILD_CONFIGS_FULL_JIT + [
IOSNightlyJob("binary", is_full_jit=False, is_upload=True),
IOSNightlyJob("binary", is_full_jit=True, is_upload=True),
]

View File

@ -11,7 +11,7 @@ def gen_docker_image_requires(image_name):
DOCKER_IMAGE_BASIC, DOCKER_REQUIREMENT_BASE = gen_docker_image(
"pytorch-linux-xenial-py3.6-gcc5.4"
"pytorch-linux-xenial-py3.7-gcc5.4"
)
DOCKER_IMAGE_CUDA_10_2, DOCKER_REQUIREMENT_CUDA_10_2 = gen_docker_image(
@ -19,7 +19,7 @@ DOCKER_IMAGE_CUDA_10_2, DOCKER_REQUIREMENT_CUDA_10_2 = gen_docker_image(
)
DOCKER_IMAGE_GCC7, DOCKER_REQUIREMENT_GCC7 = gen_docker_image(
"pytorch-linux-xenial-py3.6-gcc7"
"pytorch-linux-xenial-py3.7-gcc7"
)

View File

@ -1,164 +0,0 @@
import cimodel.lib.miniutils as miniutils
from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN, NON_PR_BRANCH_LIST
from cimodel.data.simple.util.versions import CudaVersion
class WindowsJob:
def __init__(
self,
test_index,
vscode_spec,
cuda_version,
force_on_cpu=False,
multi_gpu=False,
master_only=False,
nightly_only=False,
master_and_nightly=False
):
self.test_index = test_index
self.vscode_spec = vscode_spec
self.cuda_version = cuda_version
self.force_on_cpu = force_on_cpu
self.multi_gpu = multi_gpu
self.master_only = master_only
self.nightly_only = nightly_only
self.master_and_nightly = master_and_nightly
def gen_tree(self):
base_phase = "build" if self.test_index is None else "test"
numbered_phase = (
base_phase if self.test_index is None else base_phase + str(self.test_index)
)
key_parts = ["pytorch", "windows", base_phase]
if self.multi_gpu:
key_parts.append('multigpu')
key_name = "_".join(key_parts)
cpu_forcing_name_parts = ["on", "cpu"] if self.force_on_cpu else []
target_arch = self.cuda_version.render_dots() if self.cuda_version else "cpu"
base_name_parts = [
"pytorch",
"windows",
self.vscode_spec.render(),
"py36",
target_arch,
]
prerequisite_jobs = []
if base_phase == "test":
prerequisite_jobs.append("_".join(base_name_parts + ["build"]))
if self.cuda_version:
self.cudnn_version = 8 if self.cuda_version.major == 11 else 7
arch_env_elements = (
["cuda" + str(self.cuda_version.major), "cudnn" + str(self.cudnn_version)]
if self.cuda_version
else ["cpu"]
)
build_environment_string = "-".join(
["pytorch", "win"]
+ self.vscode_spec.get_elements()
+ arch_env_elements
+ ["py3"]
)
is_running_on_cuda = bool(self.cuda_version) and not self.force_on_cpu
if self.multi_gpu:
props_dict = {"requires": prerequisite_jobs}
else:
props_dict = {
"build_environment": build_environment_string,
"python_version": miniutils.quote("3.6"),
"vc_version": miniutils.quote(self.vscode_spec.dotted_version()),
"vc_year": miniutils.quote(str(self.vscode_spec.year)),
"vc_product": self.vscode_spec.get_product(),
"use_cuda": miniutils.quote(str(int(is_running_on_cuda))),
"requires": prerequisite_jobs,
}
if self.master_only:
props_dict[
"filters"
] = gen_filter_dict()
elif self.nightly_only:
props_dict[
"filters"
] = gen_filter_dict(branches_list=["nightly"], tags_list=RC_PATTERN)
elif self.master_and_nightly:
props_dict[
"filters"
] = gen_filter_dict(branches_list=NON_PR_BRANCH_LIST + ["nightly"], tags_list=RC_PATTERN)
name_parts = base_name_parts + cpu_forcing_name_parts + [numbered_phase]
if not self.multi_gpu:
if base_phase == "test":
test_name = "-".join(["pytorch", "windows", numbered_phase])
props_dict["test_name"] = test_name
if is_running_on_cuda:
props_dict["executor"] = "windows-with-nvidia-gpu"
props_dict["cuda_version"] = (
miniutils.quote(str(self.cuda_version))
if self.cuda_version
else "cpu"
)
props_dict["name"] = "_".join(name_parts)
return [{key_name: props_dict}]
class VcSpec:
def __init__(self, year, version_elements=None, hide_version=False):
self.year = year
self.version_elements = version_elements or []
self.hide_version = hide_version
def get_elements(self):
if self.hide_version:
return [self.prefixed_year()]
return [self.prefixed_year()] + self.version_elements
def get_product(self):
return "BuildTools"
def dotted_version(self):
return ".".join(self.version_elements)
def prefixed_year(self):
return "vs" + str(self.year)
def render(self):
return "_".join(self.get_elements())
_VC2019 = VcSpec(2019)
WORKFLOW_DATA = [
# VS2019 CUDA-10.1
WindowsJob(None, _VC2019, CudaVersion(10, 1), master_only=True),
WindowsJob(1, _VC2019, CudaVersion(10, 1), master_only=True),
WindowsJob(2, _VC2019, CudaVersion(10, 1), master_only=True),
# VS2019 CUDA-11.1
WindowsJob(None, _VC2019, CudaVersion(11, 1)),
WindowsJob(1, _VC2019, CudaVersion(11, 1), master_only=True),
WindowsJob(2, _VC2019, CudaVersion(11, 1), master_only=True),
WindowsJob('_azure_multi_gpu', _VC2019, CudaVersion(11, 1), multi_gpu=True, nightly_only=True),
# VS2019 CPU-only
WindowsJob(None, _VC2019, None),
WindowsJob(1, _VC2019, None),
WindowsJob(2, _VC2019, None),
WindowsJob(1, _VC2019, CudaVersion(10, 1), force_on_cpu=True, master_only=True),
]
def get_windows_workflows():
return [item.gen_tree() for item in WORKFLOW_DATA]

.circleci/config.yml generated

File diff suppressed because it is too large

View File

@ -27,5 +27,5 @@ Docker builds are now defined with `.circleci/cimodel/data/simple/docker_definit
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
# Set flags (see build.sh) and build image
sudo bash -c 'BREAKPAD=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
sudo bash -c 'PROTOBUF=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
```

View File

@ -51,9 +51,9 @@ android {
dependencies {
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'androidx.appcompat:appcompat:1.0.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
implementation 'com.facebook.fbjni:fbjni-java-only:0.2.2'
implementation 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
implementation 'com.facebook.soloader:nativeloader:0.10.1'
implementation 'junit:junit:' + rootProject.junitVersion
implementation 'androidx.test:core:' + rootProject.coreVersion

View File

@ -78,127 +78,127 @@ TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/u
case "$image" in
pytorch-linux-xenial-py3.8)
ANACONDA_PYTHON_VERSION=3.8
CMAKE_VERSION=3.10.3
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc5.4)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-xenial-py3.7-gcc5.4)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=5
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3.6-gcc7.2)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-xenial-py3.7-gcc7.2)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc7)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-xenial-py3.7-gcc7)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7)
CUDA_VERSION=10.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7)
CUDA_VERSION=10.1
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)
CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
TENSORRT_VERSION=8.0.1.6
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7)
CUDA_VERSION=11.5.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=5.0
CMAKE_VERSION=3.13.5
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang7-onnx)
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=5.0
CMAKE_VERSION=3.13.5
LLVMDEV=yes
PROTOBUF=yes
ANDROID=yes
ANDROID_NDK_VERSION=r19c
GRADLE_VERSION=6.8.3
CMAKE_VERSION=3.7.0
NINJA_VERSION=1.9.0
;;
pytorch-linux-xenial-py3.6-clang7)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-xenial-py3.7-clang7)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-py3.6-clang9)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-bionic-py3.7-clang9)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
;;
@ -208,28 +208,15 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9)
pytorch-linux-bionic-cuda10.2-cudnn7-py3.7-clang9)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7)
CUDA_VERSION=10.2
@ -239,53 +226,42 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9)
pytorch-linux-bionic-cuda11.0-cudnn8-py3.7-gcc9)
CUDA_VERSION=11.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=3.9
;;
pytorch-linux-bionic-rocm4.0.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-bionic-rocm4.3.1-py3.7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.0.1
ROCM_VERSION=4.3.1
;;
pytorch-linux-bionic-rocm4.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-bionic-rocm4.5-py3.7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.1
;;
pytorch-linux-bionic-rocm4.2-py3.6)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.2
ROCM_VERSION=4.5.2
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
echo "image '$image' did not match an existing build configuration"
if [[ "$image" == *xenial* ]]; then
CMAKE_VERSION=3.10.3
fi
if [[ "$image" == *py* ]]; then
extract_version_from_image_name py ANACONDA_PYTHON_VERSION
fi
@ -320,7 +296,17 @@ if [ -n "${JENKINS:-}" ]; then
JENKINS_GID=$(id -g jenkins)
fi
tmp_tag="tmp-$(cat /dev/urandom | tr -dc 'a-z' | head -c 32)"
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
# If we are trying to use nvidia cuda image make sure it exists, otherwise use IMAGE from ghcr.io
# this logic currently only exists for ubuntu
if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
if ! DOCKER_CLI_EXPERIMENTAL=enabled docker manifest inspect "${IMAGE_NAME}" >/dev/null 2>/dev/null; then
IMAGE_NAME="ghcr.io/pytorch/nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
INSTALL_CUDNN="True"
fi
fi
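# `docker manifest inspect` exits non-zero when a tag is not published, which is
# what makes it usable above as an existence probe without pulling the image.
# A minimal sketch of that check; the tag below is illustrative only:
```
if DOCKER_CLI_EXPERIMENTAL=enabled \
   docker manifest inspect "nvidia/cuda:11.5.0-cudnn8-devel-ubuntu18.04" >/dev/null 2>&1; then
  echo "upstream tag exists, use it directly"
else
  echo "tag missing upstream, fall back to the ghcr.io mirror"
fi
```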
# Build image
# TODO: build-arg THRIFT is not turned on for any image, remove it once we confirm
@ -348,7 +334,7 @@ docker build \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "BREAKPAD=${BREAKPAD}" \
--build-arg "TENSORRT_VERSION=${TENSORRT_VERSION}" \
--build-arg "ANDROID=${ANDROID}" \
--build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \
@ -358,6 +344,9 @@ docker build \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx900;gfx906}" \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "INSTALL_CUDNN=${INSTALL_CUDNN}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \
@ -376,6 +365,7 @@ function drun() {
}
if [[ "$OS" == "ubuntu" ]]; then
if !(drun lsb_release -a 2>&1 | grep -qF Ubuntu); then
echo "OS=ubuntu, but:"
drun lsb_release -a


@ -26,11 +26,14 @@ login() {
docker login -u AWS --password-stdin "$1"
}
# Retry on timeouts (can happen on job stampede).
retry login "${registry}"
# Logout on exit
trap "docker logout ${registry}" EXIT
# Only run these steps if not on github actions
if [[ -z "${GITHUB_ACTIONS}" ]]; then
# Retry on timeouts (can happen on job stampede).
retry login "${registry}"
# Logout on exit
trap "docker logout ${registry}" EXIT
fi
# export EC2=1
# export JENKINS=1
@ -45,8 +48,8 @@ trap "docker logout ${registry}" EXIT
docker push "${image}:${tag}"
docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
if [ -z "${DOCKER_SKIP_S3_UPLOAD:-}" ]; then
trap "rm -rf ${IMAGE_NAME}:${tag}.tar" EXIT
docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
aws s3 cp "${IMAGE_NAME}:${tag}.tar" "s3://ossci-linux-build/pytorch/base/${IMAGE_NAME}:${tag}.tar" --acl public-read
fi


@ -4,6 +4,10 @@ FROM centos:${CENTOS_VERSION}
ARG CENTOS_VERSION
# Set AMD gpu targets to build for
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install required packages to build Caffe2
# Install common dependencies (so that this step can be cached separately)
@ -11,6 +15,12 @@ ARG EC2
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Update CentOS git version
RUN yum -y remove git
RUN yum -y remove git-*
RUN yum -y install https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm
RUN yum install -y git
# Install devtoolset
ARG DEVTOOLSET_VERSION
ADD ./common/install_devtoolset.sh install_devtoolset.sh
@ -27,7 +37,7 @@ RUN rm install_glibc.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh


@ -11,8 +11,13 @@ install_ubuntu() {
# "$UBUNTU_VERSION" == "18.04"
if [[ "$UBUNTU_VERSION" == "18.04"* ]]; then
cmake3="cmake=3.10*"
maybe_libiomp_dev="libiomp-dev"
elif [[ "$UBUNTU_VERSION" == "20.04"* ]]; then
cmake3="cmake=3.16*"
maybe_libiomp_dev=""
else
cmake3="cmake=3.5*"
maybe_libiomp_dev="libiomp-dev"
fi
# Install common dependencies
@ -33,7 +38,7 @@ install_ubuntu() {
git \
libatlas-base-dev \
libc6-dbg \
libiomp-dev \
${maybe_libiomp_dev} \
libyaml-dev \
libz-dev \
libjpeg-dev \
@ -44,6 +49,10 @@ install_ubuntu() {
wget \
vim
# Should resolve issues related to various apt package repository cert issues
# see: https://github.com/pytorch/pytorch/issues/65931
apt-get install -y libgnutls30
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
@ -109,14 +118,11 @@ esac
# Install Valgrind separately since the apt-get version is too old.
mkdir valgrind_build && cd valgrind_build
VALGRIND_VERSION=3.16.1
if ! wget http://valgrind.org/downloads/valgrind-${VALGRIND_VERSION}.tar.bz2
then
wget https://sourceware.org/ftp/valgrind/valgrind-${VALGRIND_VERSION}.tar.bz2
fi
wget https://ossci-linux.s3.amazonaws.com/valgrind-${VALGRIND_VERSION}.tar.bz2
tar -xjf valgrind-${VALGRIND_VERSION}.tar.bz2
cd valgrind-${VALGRIND_VERSION}
./configure --prefix=/usr/local
make -j 4
make -j6
sudo make install
cd ../../
rm -rf valgrind_build


@ -1,19 +0,0 @@
#!/bin/bash
set -ex
git clone https://github.com/malfet/breakpad.git -b pytorch/release-1.9
pushd breakpad
git clone https://chromium.googlesource.com/linux-syscall-support src/third_party/lss
pushd src/third_party/lss
# same as with breakpad, there are no real releases for this repo so use a
# commit as the pin
git checkout e1e7b0ad8ee99a875b272c8e33e308472e897660
popd
./configure
make
make install
popd
rm -rf breakpad


@ -4,6 +4,9 @@ set -ex
[ -n "$CMAKE_VERSION" ]
# Remove system cmake install so it won't get used instead
apt-get remove cmake -y
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"


@ -13,7 +13,12 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
CONDA_FILE="Miniconda2-latest-Linux-x86_64.sh"
;;
3)
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
if [ "$ANACONDA_PYTHON_VERSION" = "3.6" ]; then
# Latest release of Conda that still supports python-3.6
CONDA_FILE="Miniconda3-py37_4.10.3-Linux-x86_64.sh"
else
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
fi
;;
*)
echo "Unsupported ANACONDA_PYTHON_VERSION: $ANACONDA_PYTHON_VERSION"
@ -56,7 +61,9 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
pushd /opt/conda
# Track latest conda update
as_jenkins conda update -y -n base conda
if [ "$ANACONDA_PYTHON_VERSION" != "3.6" ]; then
as_jenkins conda update -y -n base conda
fi
# Install correct Python version
as_jenkins conda install -y python="$ANACONDA_PYTHON_VERSION"
@ -69,8 +76,8 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
}
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
# DO NOT install cmake here as it would install a version newer than 3.5, but
# we want to pin to version 3.5.
# DO NOT install cmake here as it would install a version newer than 3.10, but
# we want to pin to version 3.10.
SCIPY_VERSION=1.1.0
if [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
@ -86,18 +93,10 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
fi
if [[ "$CUDA_VERSION" == 10.0* ]]; then
conda_install magma-cuda100 -c pytorch
elif [[ "$CUDA_VERSION" == 10.1* ]]; then
conda_install magma-cuda101 -c pytorch
elif [[ "$CUDA_VERSION" == 10.2* ]]; then
conda_install magma-cuda102 -c pytorch
elif [[ "$CUDA_VERSION" == 11.0* ]]; then
conda_install magma-cuda110 -c pytorch
elif [[ "$CUDA_VERSION" == 11.1* ]]; then
conda_install magma-cuda111 -c pytorch
elif [[ "$CUDA_VERSION" == 11.3* ]]; then
conda_install magma-cuda113 -c pytorch
# Magma package names are concatenation of CUDA major and minor ignoring revision
# I.e. magma-cuda102 package corresponds to CUDA_VERSION=10.2 and CUDA_VERSION=10.2.89
if [ -n "$CUDA_VERSION" ]; then
conda_install magma-cuda$(TMP=${CUDA_VERSION/./};echo ${TMP%.*[0-9]}) -c pytorch
fi
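# The nested parameter expansions above are dense; a minimal sketch of the
# derivation, assuming the magma-cudaXY package naming the comment describes:
```
# ${CUDA_VERSION/./} drops the first dot; ${TMP%.*[0-9]} strips a trailing
# ".<revision>" if one is present
for CUDA_VERSION in 10.2 10.2.89 11.3.0; do
  TMP=${CUDA_VERSION/./}
  echo "$CUDA_VERSION -> magma-cuda${TMP%.*[0-9]}"
done
# 10.2    -> magma-cuda102
# 10.2.89 -> magma-cuda102
# 11.3.0  -> magma-cuda113
```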
# TODO: This isn't working atm
@ -107,22 +106,21 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# TODO: Why is scipy pinned
# Pin MyPy version because new errors are likely to appear with each release
# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
# Pin coverage so we can use COVERAGE_RCFILE
as_jenkins pip install --progress-bar off pytest \
scipy==$SCIPY_VERSION \
scikit-image \
psutil \
unittest-xml-reporting \
boto3==1.16.34 \
coverage==5.5 \
hypothesis==4.53.2 \
expecttest==0.1.3 \
mypy==0.812 \
tb-nightly
# Install numba only on python-3.8 or below
# For numba issue see https://github.com/pytorch/pytorch/issues/51511
if [[ $(python -c "import sys; print(int(sys.version_info < (3, 9)))") == "1" ]]; then
as_jenkins pip install --progress-bar off numba librosa>=0.6.2
as_jenkins pip install --progress-bar off numba==0.54.1 librosa>=0.6.2
else
as_jenkins pip install --progress-bar off numba==0.49.0 librosa>=0.6.2
fi


@ -0,0 +1,10 @@
#!/bin/bash
sudo apt-get update
# also install ssh to avoid error of:
# --------------------------------------------------------------------------
# The value of the MCA parameter "plm_rsh_agent" was set to a path
# that could not be found:
# plm_rsh_agent: ssh : rsh
sudo apt-get install -y ssh
sudo apt-get update && apt-get install -y --no-install-recommends libcudnn8=8.2.0.53-1+cuda11.3 libcudnn8-dev=8.2.0.53-1+cuda11.3 && apt-mark hold libcudnn8


@ -2,23 +2,6 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \


@ -7,15 +7,18 @@ if [ -n "$GCC_VERSION" ]; then
# Need the official toolchain repo to get alternate packages
add-apt-repository ppa:ubuntu-toolchain-r/test
apt-get update
if [ "$UBUNTU_VERSION" = "16.04" -a "$GCC_VERSION" = "5" ]; then
if [[ "$UBUNTU_VERSION" == "16.04" && "${GCC_VERSION:0:1}" == "5" ]]; then
apt-get install -y g++-5=5.4.0-6ubuntu1~16.04.12
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-5 50
else
apt-get install -y g++-$GCC_VERSION
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50
fi
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50
# Cleanup package manager
apt-get autoclean && apt-get clean


@ -1,4 +0,0 @@
#!/bin/bash
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1


@ -1,4 +1,10 @@
#!/bin/bash
sudo apt-get update
# also install ssh to avoid error of:
# --------------------------------------------------------------------------
# The value of the MCA parameter "plm_rsh_agent" was set to a path
# that could not be found:
# plm_rsh_agent: ssh : rsh
sudo apt-get install -y ssh
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev


@ -4,11 +4,11 @@ set -ex
OPENSSL=openssl-1.1.1k
wget -q -O "${OPENSSL}.tar.gz" "https://www.openssl.org/source/${OPENSSL}.tar.gz"
wget -q -O "${OPENSSL}.tar.gz" "https://ossci-linux.s3.amazonaws.com/${OPENSSL}.tar.gz"
tar xf "${OPENSSL}.tar.gz"
cd "${OPENSSL}"
./config --prefix=/opt/openssl -d '-Wl,--enable-new-dtags,-rpath,$(LIBRPATH)'
# NOTE: opensl errors out when built with the -j option
make install_sw
# NOTE: openssl install errors out when built with the -j option
make -j6; make install_sw
cd ..
rm -rf "${OPENSSL}"


@ -2,8 +2,8 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
# This function installs protobuf 3.17
install_protobuf_317() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
@ -12,37 +12,32 @@ install_protobuf_26() {
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
# -j6 to balance memory usage and speed.
# naked `-j` seems to use too much memory.
pushd "$pb_dir" && ./configure && make -j6 && make -j6 check && sudo make -j6 install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
# Ubuntu 14.04 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we install that here if on 14.04
# Ubuntu 14.04 also has cmake 2.8.12 as the default option, so we will
# Ubuntu 14.04 has cmake 2.8.12 as the default option, so we will
# install cmake3 here and use cmake3.
apt-get update
if [[ "$UBUNTU_VERSION" == 14.04 ]]; then
apt-get install -y --no-install-recommends cmake3
install_protobuf_26
else
apt-get install -y --no-install-recommends \
libprotobuf-dev \
protobuf-compiler
fi
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
install_protobuf_317
}
install_centos() {
# Centos7 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we always install that here
install_protobuf_26
install_protobuf_317
}
# Install base packages depending on the base OS


@ -6,14 +6,23 @@ install_magma() {
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
git checkout 878b1ce02e9cfe4a829be22c8f911e9c0b6bd88f
# fix for magma_queue memory leak issue
git checkout c62d700d880c7283b33fb1d615d62fc9c7f7ca21
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
echo 'DEVCCFLAGS += --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 --gpu-max-threads-per-block=256' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
export PATH="${PATH}:/opt/rocm/bin"
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
@ -25,12 +34,19 @@ ver() {
printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ');
}
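# ver() zero-pads each dot-separated component into one integer so plain
# numeric -ge/-lt comparison orders versions correctly. A sketch of the values
# it produces, assuming at most four components (extra components would make
# printf cycle its format and break the scheme):
```
ver() { printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' '); }
ver 4.2; echo      #   4002000000
ver 4.5; echo      #   4005000000
ver 4.5.2; echo    #   4005002000
[[ $(ver 4.5.2) -ge $(ver 4.5) ]] && echo "4.5.2 >= 4.5"
```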
# Map ROCm version to AMDGPU version
declare -A AMDGPU_VERSIONS=( ["4.5.2"]="21.40.2" )
install_ubuntu() {
apt-get update
if [[ $UBUNTU_VERSION == 18.04 ]]; then
# gpg-agent is not available by default on 18.04
apt-get install -y --no-install-recommends gpg-agent
fi
if [[ $UBUNTU_VERSION == 20.04 ]]; then
# gpg-agent is not available by default on 20.04
apt-get install -y --no-install-recommends gpg-agent
fi
apt-get install -y kmod
apt-get install -y wget
@ -38,6 +54,13 @@ install_ubuntu() {
apt-get install -y libc++1
apt-get install -y libc++abi1
if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then
# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
local amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/ubuntu"
echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
fi
ROCM_REPO="ubuntu"
if [[ $(ver $ROCM_VERSION) -lt $(ver 4.2) ]]; then
ROCM_REPO="xenial"
@ -45,7 +68,8 @@ install_ubuntu() {
# Add rocm repository
wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
echo "deb [arch=amd64] http://repo.radeon.com/rocm/apt/${ROCM_VERSION} ${ROCM_REPO} main" > /etc/apt/sources.list.d/rocm.list
local rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
echo "deb [arch=amd64] ${rocm_baseurl} ${ROCM_REPO} main" > /etc/apt/sources.list.d/rocm.list
apt-get update --allow-insecure-repositories
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
@ -82,11 +106,24 @@ install_centos() {
yum install -y epel-release
yum install -y dkms kernel-headers-`uname -r` kernel-devel-`uname -r`
if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then
# Add amdgpu repository
local amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/7.9/main/x86_64"
echo "[AMDGPU]" > /etc/yum.repos.d/amdgpu.repo
echo "name=AMDGPU" >> /etc/yum.repos.d/amdgpu.repo
echo "baseurl=${amdgpu_baseurl}" >> /etc/yum.repos.d/amdgpu.repo
echo "enabled=1" >> /etc/yum.repos.d/amdgpu.repo
echo "gpgcheck=1" >> /etc/yum.repos.d/amdgpu.repo
echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/amdgpu.repo
fi
local rocm_baseurl="http://repo.radeon.com/rocm/yum/${ROCM_VERSION}"
echo "[ROCm]" > /etc/yum.repos.d/rocm.repo
echo "name=ROCm" >> /etc/yum.repos.d/rocm.repo
echo "baseurl=http://repo.radeon.com/rocm/yum/${ROCM_VERSION}" >> /etc/yum.repos.d/rocm.repo
echo "baseurl=${rocm_baseurl}" >> /etc/yum.repos.d/rocm.repo
echo "enabled=1" >> /etc/yum.repos.d/rocm.repo
echo "gpgcheck=0" >> /etc/yum.repos.d/rocm.repo
echo "gpgcheck=1" >> /etc/yum.repos.d/rocm.repo
echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/rocm.repo
yum update -y


@ -0,0 +1,7 @@
#!/bin/bash
if [ -n "$TENSORRT_VERSION" ]; then
python3 -m pip install --upgrade setuptools pip
python3 -m pip install nvidia-pyindex
python3 -m pip install nvidia-tensorrt==${TENSORRT_VERSION} --extra-index-url https://pypi.ngc.nvidia.com
fi


@ -2,23 +2,6 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \


@ -1,13 +1,15 @@
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG CUDNN_VERSION
ARG IMAGE_NAME
FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}
FROM ${IMAGE_NAME}
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG CUDNN_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
@ -24,7 +26,7 @@ ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
@ -65,22 +67,29 @@ ADD ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
# (optional) Install TensorRT
ARG TENSORRT_VERSION
ADD ./common/install_tensorrt.sh install_tensorrt.sh
RUN if [ -n "${TENSORRT_VERSION}" ]; then bash ./install_tensorrt.sh; fi
RUN rm install_tensorrt.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
ENV CUDA_NVCC_EXECUTABLE=/opt/cache/lib/nvcc
ENV CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache
# Add jni.h for java host build
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install NCCL for when CUDA is version 10.1
ADD ./common/install_nccl.sh install_nccl.sh
RUN if [ "${CUDA_VERSION}" = 10.1 ]; then bash ./install_nccl.sh; fi
RUN rm install_nccl.sh
# Install Open MPI for CUDA
ADD ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
@ -93,9 +102,17 @@ ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# AWS specific CUDA build guidance
ENV TORCH_CUDA_ARCH_LIST Maxwell
ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
ENV CUDA_PATH /usr/local/cuda
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
# Hack for CUDA 11.5.0 image to install cudnn8 since cudnn8 is not included with CUDA 11.5 image
# Also note cudnn 8.2.0.53 is labeled for cuda 11.3
ARG INSTALL_CUDNN
ADD ./common/install_cudnn8.sh install_cudnn8.sh
RUN if [ -n "${INSTALL_CUDNN}" ]; then bash install_cudnn8.sh; fi
RUN rm install_cudnn8.sh
USER jenkins
CMD ["bash"]


@ -6,6 +6,10 @@ ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Set AMD gpu targets to build for
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install common dependencies (so that this step can be cached separately)
ARG EC2
ADD ./common/install_base.sh install_base.sh
@ -21,7 +25,7 @@ RUN bash ./install_clang.sh && rm install_clang.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh


@ -33,7 +33,7 @@ ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
@ -82,13 +82,6 @@ RUN rm AndroidManifest.xml
RUN rm build.gradle
ENV INSTALLED_ANDROID ${ANDROID}
# (optional) Install breakpad
ARG BREAKPAD
ADD ./common/install_breakpad.sh install_breakpad.sh
RUN if [ -n "${BREAKPAD}" ]; then bash ./install_breakpad.sh; fi
RUN rm install_breakpad.sh
ENV INSTALLED_BREAKPAD ${BREAKPAD}
# (optional) Install Vulkan SDK
ARG VULKAN_SDK_VERSION
ADD ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh


@ -1,13 +0,0 @@
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y python3-pip git && rm -rf /var/lib/apt/lists/* /var/log/dpkg.log
ADD requirements.txt /requirements.txt
RUN pip3 install -r /requirements.txt
ADD gc.py /usr/bin/gc.py
ADD docker_hub.py /usr/bin/docker_hub.py
ENTRYPOINT ["/usr/bin/gc.py"]


@ -1,125 +0,0 @@
#!/usr/bin/env python3
from collections import namedtuple
import boto3
import requests
import os
IMAGE_INFO = namedtuple(
"IMAGE_INFO", ("repo", "tag", "size", "last_updated_at", "last_updated_by")
)
def build_access_token(username, password):
r = requests.post(
"https://hub.docker.com/v2/users/login/",
data={"username": username, "password": password},
)
r.raise_for_status()
token = r.json().get("token")
return {"Authorization": "JWT " + token}
def list_repos(user, token):
r = requests.get("https://hub.docker.com/v2/repositories/" + user, headers=token)
r.raise_for_status()
ret = sorted(
repo["user"] + "/" + repo["name"] for repo in r.json().get("results", [])
)
if ret:
print("repos found:")
print("".join("\n\t" + r for r in ret))
return ret
def list_tags(repo, token):
r = requests.get(
"https://hub.docker.com/v2/repositories/" + repo + "/tags", headers=token
)
r.raise_for_status()
return [
IMAGE_INFO(
repo=repo,
tag=t["name"],
size=t["full_size"],
last_updated_at=t["last_updated"],
last_updated_by=t["last_updater_username"],
)
for t in r.json().get("results", [])
]
def save_to_s3(tags):
table_content = ""
client = boto3.client("s3")
for t in tags:
table_content += (
"<tr><td>{repo}</td><td>{tag}</td><td>{size}</td>"
"<td>{last_updated_at}</td><td>{last_updated_by}</td></tr>"
).format(
repo=t.repo,
tag=t.tag,
size=t.size,
last_updated_at=t.last_updated_at,
last_updated_by=t.last_updated_by,
)
html_body = """
<html>
<head>
<link rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
crossorigin="anonymous">
<link rel="stylesheet" type="text/css"
href="https://cdn.datatables.net/1.10.20/css/jquery.dataTables.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">
</script>
<script type="text/javascript" charset="utf8"
src="https://cdn.datatables.net/1.10.20/js/jquery.dataTables.js"></script>
<title> docker image info</title>
</head>
<body>
<table class="table table-striped table-hover" id="docker">
<caption>Docker images on docker hub</caption>
<thead class="thead-dark">
<tr>
<th scope="col">repo</th>
<th scope="col">tag</th>
<th scope="col">size</th>
<th scope="col">last_updated_at</th>
<th scope="col">last_updated_by</th>
</tr>
</thead>
<tbody>
{table_content}
</tbody>
</table>
</body>
<script>
$(document).ready( function () {{
$('#docker').DataTable({{paging: false}});
}} );
</script>
</html>
""".format(
table_content=table_content
)
client.put_object(
Bucket="docker.pytorch.org",
ACL="public-read",
Key="docker_hub.html",
Body=html_body,
ContentType="text/html",
)
if __name__ == "__main__":
username = os.environ.get("DOCKER_HUB_USERNAME")
password = os.environ.get("DOCKER_HUB_PASSWORD")
token = build_access_token(username, password)
tags = []
for repo in list_repos("pytorch", token):
tags.extend(list_tags(repo, token))
save_to_s3(tags)


@ -1,218 +0,0 @@
#!/usr/bin/env python3
import argparse
import boto3
import datetime
import pytz
import re
import sys
def save_to_s3(project, data):
table_content = ""
client = boto3.client("s3")
for repo, tag, window, age, pushed in data:
table_content += "<tr><td>{repo}</td><td>{tag}</td><td>{window}</td><td>{age}</td><td>{pushed}</td></tr>".format(
repo=repo, tag=tag, window=window, age=age, pushed=pushed
)
html_body = """
<html>
<head>
<link rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
crossorigin="anonymous">
<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/1.10.20/css/jquery.dataTables.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script type="text/javascript" charset="utf8" src="https://cdn.datatables.net/1.10.20/js/jquery.dataTables.js"></script>
<title>{project} nightly and permanent docker image info</title>
</head>
<body>
<table class="table table-striped table-hover" id="docker">
<thead class="thead-dark">
<tr>
<th scope="col">repo</th>
<th scope="col">tag</th>
<th scope="col">keep window</th>
<th scope="col">age</th>
<th scope="col">pushed at</th>
</tr>
</thead>
<tbody>
{table_content}
</tbody>
</table>
</body>
<script>
$(document).ready( function () {{
$('#docker').DataTable({{paging: false}});
}} );
</script>
</html>
""".format(
project=project, table_content=table_content
)
# for pytorch, file can be found at
# http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html
# and later on we can configure docker.pytorch.org to point to the location
client.put_object(
Bucket="docker.pytorch.org",
ACL="public-read",
Key="{project}.html".format(project=project),
Body=html_body,
ContentType="text/html",
)
def repos(client):
paginator = client.get_paginator("describe_repositories")
pages = paginator.paginate(registryId="308535385114")
for page in pages:
for repo in page["repositories"]:
yield repo
def images(client, repository):
paginator = client.get_paginator("describe_images")
pages = paginator.paginate(
registryId="308535385114", repositoryName=repository["repositoryName"]
)
for page in pages:
for image in page["imageDetails"]:
yield image
parser = argparse.ArgumentParser(description="Delete old Docker tags from registry")
parser.add_argument(
"--dry-run", action="store_true", help="Dry run; print tags that would be deleted"
)
parser.add_argument(
"--debug", action="store_true", help="Debug, print ignored / saved tags"
)
parser.add_argument(
"--keep-stable-days",
type=int,
default=14,
help="Days of stable Docker tags to keep (non per-build images)",
)
parser.add_argument(
"--keep-unstable-days",
type=int,
default=1,
help="Days of unstable Docker tags to keep (per-build images)",
)
parser.add_argument(
"--filter-prefix",
type=str,
default="",
help="Only run cleanup for repositories with this prefix",
)
parser.add_argument(
"--ignore-tags",
type=str,
default="",
help="Never cleanup these tags (comma separated)",
)
args = parser.parse_args()
if not args.ignore_tags or not args.filter_prefix:
print(
"""
Missing required arguments --ignore-tags and --filter-prefix
You must specify --ignore-tags and --filter-prefix to avoid accidentally
pruning a stable Docker tag which is being actively used. This will
make you VERY SAD. So pay attention.
First, which filter-prefix do you want? The list of valid prefixes
is in jobs/private.groovy under the 'docker-registry-cleanup' job.
You probably want either pytorch or caffe2.
Second, which ignore-tags do you want? It should be whatever the most
up-to-date DockerVersion for the repository in question is. Follow
the imports of jobs/pytorch.groovy to find them.
"""
)
sys.exit(1)
client = boto3.client("ecr", region_name="us-east-1")
stable_window = datetime.timedelta(days=args.keep_stable_days)
unstable_window = datetime.timedelta(days=args.keep_unstable_days)
now = datetime.datetime.now(pytz.UTC)
ignore_tags = args.ignore_tags.split(",")
def chunks(chunkable, n):
""" Yield successive n-sized chunks from l.
"""
for i in range(0, len(chunkable), n):
yield chunkable[i: i + n]
SHA_PATTERN = re.compile(r'^[0-9a-f]{40}$')
def looks_like_git_sha(tag):
"""Returns a boolean to check if a tag looks like a git sha
For reference a sha1 is 40 characters with only 0-9a-f and contains no
"-" characters
"""
return re.match(SHA_PATTERN, tag) is not None
stable_window_tags = []
for repo in repos(client):
repositoryName = repo["repositoryName"]
if not repositoryName.startswith(args.filter_prefix):
continue
# Keep list of image digests to delete for this repository
digest_to_delete = []
for image in images(client, repo):
tags = image.get("imageTags")
if not isinstance(tags, (list,)) or len(tags) == 0:
continue
created = image["imagePushedAt"]
age = now - created
for tag in tags:
if any([
looks_like_git_sha(tag),
tag.isdigit(),
tag.count("-") == 4, # TODO: Remove, this no longer applies as tags are now built using a SHA1
tag in ignore_tags]):
window = stable_window
if tag in ignore_tags:
stable_window_tags.append((repositoryName, tag, "", age, created))
elif age < window:
stable_window_tags.append((repositoryName, tag, window, age, created))
else:
window = unstable_window
if tag in ignore_tags or age < window:
if args.debug:
print("Ignoring {}:{} (age: {})".format(repositoryName, tag, age))
break
else:
for tag in tags:
print("{}Deleting {}:{} (age: {})".format("(dry run) " if args.dry_run else "", repositoryName, tag, age))
digest_to_delete.append(image["imageDigest"])
if args.dry_run:
if args.debug:
print("Skipping actual deletion, moving on...")
else:
# Issue batch delete for all images to delete for this repository
# Note that as of 2018-07-25, the maximum number of images you can
# delete in a single batch is 100, so chunk our list into batches of
# 100
for c in chunks(digest_to_delete, 100):
client.batch_delete_image(
registryId="308535385114",
repositoryName=repositoryName,
imageIds=[{"imageDigest": digest} for digest in c],
)
save_to_s3(args.filter_prefix, stable_window_tags)


@ -1,3 +0,0 @@
boto3
pytz
requests


@ -11,19 +11,11 @@ import sys
from collections import namedtuple
import cimodel.data.binary_build_definitions as binary_build_definitions
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.simple.android_definitions
import cimodel.data.simple.bazel_definitions
import cimodel.data.simple.binary_smoketest
import cimodel.data.simple.docker_definitions
import cimodel.data.simple.ge_config_tests
import cimodel.data.simple.ios_definitions
import cimodel.data.simple.macos_definitions
import cimodel.data.simple.mobile_definitions
import cimodel.data.simple.nightly_android
import cimodel.data.simple.nightly_ios
import cimodel.data.simple.anaconda_prune_defintions
import cimodel.data.windows_build_definitions as windows_build_definitions
import cimodel.lib.miniutils as miniutils
import cimodel.lib.miniyaml as miniyaml
@ -80,15 +72,15 @@ class Header(object):
for line in filter(None, lines):
output_filehandle.write(line + "\n")
def filter_master_only_jobs(items):
def _for_all_items(items, functor) -> None:
if isinstance(items, list):
for item in items:
_for_all_items(item, functor)
if isinstance(items, dict) and len(items) == 1:
item_type, item = next(iter(items.items()))
functor(item_type, item)
def _for_all_items(items, functor) -> None:
if isinstance(items, list):
for item in items:
_for_all_items(item, functor)
if isinstance(items, dict) and len(items) == 1:
item_type, item = next(iter(items.items()))
functor(item_type, item)
def filter_master_only_jobs(items):
def _is_master_item(item):
filters = item.get('filters', None)
branches = filters.get('branches', None) if filters is not None else None
@ -126,33 +118,45 @@ def filter_master_only_jobs(items):
_for_all_items(items, _save_requires_if_master)
return _do_filtering(items)
def generate_required_docker_images(items):
required_docker_images = set()
def _requires_docker_image(item_type, item):
requires = item.get('requires', None)
if not isinstance(requires, list):
return
for requirement in requires:
requirement = requirement.replace('"', '')
if requirement.startswith('docker-'):
required_docker_images.add(requirement)
_for_all_items(items, _requires_docker_image)
return required_docker_images
def gen_build_workflows_tree():
build_workflows_functions = [
cimodel.data.simple.docker_definitions.get_workflow_jobs,
pytorch_build_definitions.get_workflow_jobs,
cimodel.data.simple.macos_definitions.get_workflow_jobs,
cimodel.data.simple.android_definitions.get_workflow_jobs,
cimodel.data.simple.ios_definitions.get_workflow_jobs,
cimodel.data.simple.mobile_definitions.get_workflow_jobs,
cimodel.data.simple.ge_config_tests.get_workflow_jobs,
cimodel.data.simple.bazel_definitions.get_workflow_jobs,
cimodel.data.simple.binary_smoketest.get_workflow_jobs,
cimodel.data.simple.nightly_ios.get_workflow_jobs,
cimodel.data.simple.nightly_android.get_workflow_jobs,
cimodel.data.simple.anaconda_prune_defintions.get_workflow_jobs,
windows_build_definitions.get_windows_workflows,
binary_build_definitions.get_post_upload_jobs,
binary_build_definitions.get_binary_smoke_test_jobs,
]
build_jobs = [f() for f in build_workflows_functions]
build_jobs.extend(
cimodel.data.simple.docker_definitions.get_workflow_jobs(
# sort for consistency
sorted(generate_required_docker_images(build_jobs))
)
)
master_build_jobs = filter_master_only_jobs(build_jobs)
binary_build_functions = [
binary_build_definitions.get_binary_build_jobs,
binary_build_definitions.get_nightly_tests,
binary_build_definitions.get_nightly_uploads,
]
build_jobs = [f() for f in build_workflows_functions]
master_build_jobs = filter_master_only_jobs(build_jobs)
return {
"workflows": {
"binary_builds": {
@ -181,7 +185,6 @@ YAML_SOURCES = [
File("build-parameters/binary-build-params.yml"),
File("build-parameters/promote-build-params.yml"),
Header("Job specs"),
File("job-specs/pytorch-job-specs.yml"),
File("job-specs/binary-job-specs.yml"),
File("job-specs/job-specs-custom.yml"),
File("job-specs/job-specs-promote.yml"),
@ -190,8 +193,6 @@ YAML_SOURCES = [
File("job-specs/docker_jobs.yml"),
Header("Workflows"),
Treegen(gen_build_workflows_tree, 0),
File("workflows/workflows-scheduled-ci.yml"),
File("workflows/workflows-ecr-gc.yml"),
File("workflows/workflows-promote.yml"),
]


@ -55,7 +55,7 @@ else
echo "Can't tell what to checkout"
exit 1
fi
retry git submodule update --init --recursive
retry git submodule update --init --recursive --jobs 0
echo "Using Pytorch from "
git --no-pager log --max-count 1
popd
@ -63,7 +63,6 @@ popd
# Clone the Builder master repo
retry git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
git checkout release/1.9
echo "Using builder from "
git --no-pager log --max-count 1
popd


@ -22,7 +22,7 @@ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive
git submodule update --init --recursive --jobs 0
# run build script
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
@ -31,8 +31,12 @@ cat ${PROJ_ROOT}/scripts/build_ios.sh
echo "########################################################"
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
echo "USE_PYTORCH_METAL: ${USE_PYTORCH_METAL}"
echo "USE_COREML_DELEGATE: ${USE_COREML_DELEGATE}"
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
export USE_PYTORCH_METAL=${USE_PYTORCH_METAL}
export USE_COREML_DELEGATE=${USE_COREML_DELEGATE}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
#store the binary


@ -8,16 +8,17 @@ cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY}" >> cert.txt
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2021.mobileprovision
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY}" >> cert.txt
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
# run the ruby build script
@ -25,5 +26,5 @@ if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
PROFILE=PyTorch_CI_2021
PROFILE=PyTorch_CI_2022
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}


@ -23,15 +23,27 @@ do
fi
done
lipo -i ${ZIP_DIR}/install/lib/*.a
echo "BUILD_LITE_INTERPRETER: ${BUILD_LITE_INTERPRETER}"
# copy the umbrella header and license
cp ${PROJ_ROOT}/ios/LibTorch.h ${ZIP_DIR}/src/
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
cp ${PROJ_ROOT}/ios/LibTorch-Lite.h ${ZIP_DIR}/src/
else
cp ${PROJ_ROOT}/ios/LibTorch.h ${ZIP_DIR}/src/
fi
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
ZIPFILE=libtorch_ios_nightly_build.zip
export DATE="$(date -u +%Y%m%d)"
export IOS_NIGHTLY_BUILD_VERSION="1.11.0.${DATE}"
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
# libtorch_lite_ios_nightly_1.11.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"
else
ZIPFILE="libtorch_ios_nightly_build.zip"
fi
cd ${ZIP_DIR}
#for testing
touch version.txt
echo $(date +%s) > version.txt
echo "${IOS_NIGHTLY_BUILD_VERSION}" > version.txt
zip -r ${ZIPFILE} install src version.txt LICENSE
# upload to aws
# Install conda then 'conda install' awscli
@ -48,3 +60,16 @@ set +x
# echo "AWS KEY: ${AWS_ACCESS_KEY_ID}"
# echo "AWS SECRET: ${AWS_SECRET_ACCESS_KEY}"
aws s3 cp ${ZIPFILE} s3://ossci-ios-build/ --acl public-read
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
# create a new LibTorch-Lite-Nightly.podspec from the template
echo "cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec"
cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# update pod version
sed -i '' -e "s/IOS_NIGHTLY_BUILD_VERSION/${IOS_NIGHTLY_BUILD_VERSION}/g" ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
cat ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# push the new LibTorch-Lite-Nightly.podspec to CocoaPods
pod trunk push --verbose --allow-warnings --use-libraries --skip-import-validation ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
fi


@ -4,10 +4,14 @@ echo "RUNNING ON $(uname -a) WITH $(nproc) CPUS AND $(free -m)"
set -eux -o pipefail
source /env
# Defaults here so they can be changed in one place
export MAX_JOBS=${MAX_JOBS:-$(( $(nproc) - 2 ))}
# Because most Circle executors only have 20 CPUs, using more causes OOMs w/ Ninja and nvcc parallelization
MEMORY_LIMIT_MAX_JOBS=18
NUM_CPUS=$(( $(nproc) - 2 ))
if [[ "${DESIRED_CUDA}" == "cu111" ]]; then
# Defaults here for **binary** linux builds so they can be changed in one place
export MAX_JOBS=${MAX_JOBS:-$(( ${NUM_CPUS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${NUM_CPUS} ))}
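# The ternary above clamps build parallelism to min(nproc - 2, 18), while a
# caller-provided MAX_JOBS still wins via the ${MAX_JOBS:-...} default. A
# sketch of the arithmetic with illustrative CPU counts:
```
MEMORY_LIMIT_MAX_JOBS=18
for NUM_CPUS in 8 20 36; do
  echo "$NUM_CPUS cpus -> $(( NUM_CPUS > MEMORY_LIMIT_MAX_JOBS ? MEMORY_LIMIT_MAX_JOBS : NUM_CPUS )) jobs"
done
# 8 cpus -> 8 jobs, 20 cpus -> 18 jobs, 36 cpus -> 18 jobs
```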
if [[ "${DESIRED_CUDA}" =~ cu11[0-9] ]]; then
export BUILD_SPLIT_CUDA="ON"
fi
@ -22,5 +26,9 @@ else
build_script='manywheel/build.sh'
fi
if [[ "$CIRCLE_BRANCH" == "master" ]] || [[ "$CIRCLE_BRANCH" == release/* ]]; then
export BUILD_DEBUG_INFO=1
fi
# Build the package
SKIP_ALL_TESTS=1 "/builder/$build_script"


@ -1,10 +1,24 @@
#!/bin/bash
source /home/circleci/project/env
cat >/home/circleci/project/ci_test_script.sh <<EOL
OUTPUT_SCRIPT=${OUTPUT_SCRIPT:-/home/circleci/project/ci_test_script.sh}
# only source if file exists
if [[ -f /home/circleci/project/env ]]; then
source /home/circleci/project/env
fi
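# In the heredoc below, unescaped variables (e.g. $DESIRED_PYTHON) expand when
# the script is generated, while escaped ones (\$@, \$python_nodot) survive
# into the generated file and expand only when it later runs inside the Docker
# container. A minimal sketch of that split, with made-up variable names:
```
NOW="host-value"
cat >/tmp/generated.sh <<EOL
echo "baked in: $NOW"
echo "deferred: \$LATER"
EOL
LATER="container-value" bash /tmp/generated.sh
# baked in: host-value
# deferred: container-value
```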
cat >"${OUTPUT_SCRIPT}" <<EOL
# =================== The following code will be executed inside Docker container ===================
set -eux -o pipefail
retry () {
"\$@" || (sleep 1 && "\$@") || (sleep 2 && "\$@")
}
# Source binary env file here if exists
if [[ -e "${BINARY_ENV_FILE:-/nofile}" ]]; then
source "${BINARY_ENV_FILE:-/nofile}"
fi
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
# Set up Python
@ -23,14 +37,23 @@ fi
EXTRA_CONDA_FLAGS=""
NUMPY_PIN=""
if [[ "\$python_nodot" = *39* ]]; then
PROTOBUF_PACKAGE="defaults::protobuf"
if [[ "\$python_nodot" = *310* ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
# There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
# we set a lower boundary here just to be safe
NUMPY_PIN=">=1.21.2"
PROTOBUF_PACKAGE="protobuf>=3.19.0"
fi
if [[ "\$python_nodot" = *39* ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
# There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
# we set a lower boundary here just to be safe
NUMPY_PIN=">=1.20"
fi
if [[ "$DESIRED_CUDA" == "cu112" ]]; then
if [[ "$DESIRED_CUDA" == "cu112" || "$DESIRED_CUDA" == "cu115" ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
fi
@ -59,7 +82,7 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
ninja \
dataclasses \
typing-extensions \
defaults::protobuf \
${PROTOBUF_PACKAGE} \
six
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
retry conda install -c pytorch -y cpuonly
@ -92,4 +115,4 @@ EOL
echo
echo
echo "The script that will run in the next step is:"
cat /home/circleci/project/ci_test_script.sh
cat "${OUTPUT_SCRIPT}"


@ -14,6 +14,10 @@ chmod +x "$build_script"
# Build
cat >"$build_script" <<EOL
export PATH="$workdir/miniconda/bin:$PATH"
if [[ "$CIRCLE_BRANCH" == "nightly" ]]; then
export USE_PYTORCH_METAL_EXPORT=1
export USE_COREML_DELEGATE=1
fi
if [[ "$PACKAGE_TYPE" == conda ]]; then
"$workdir/builder/conda/build_pytorch.sh"
else


@ -19,39 +19,47 @@ tagged_version() {
fi
}
# We need to write an envfile to persist these variables to following
# steps, but the location of the envfile depends on the circleci executor
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ "$OSTYPE" == "msys" ]]; then
# windows executor (builds and tests)
workdir="/c/w"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
envfile="$workdir/env"
touch "$envfile"
chmod +x "$envfile"
# These are only relevant for CircleCI
# TODO: Remove these later once migrated fully to GHA
if [[ -z ${IS_GHA:-} ]]; then
# We need to write an envfile to persist these variables to following
# steps, but the location of the envfile depends on the circleci executor
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ "$OSTYPE" == "msys" ]]; then
# windows executor (builds and tests)
workdir="/c/w"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
envfile="$workdir/env"
touch "$envfile"
chmod +x "$envfile"
# Parse the BUILD_ENVIRONMENT to package type, python, and cuda
configs=($BUILD_ENVIRONMENT)
export PACKAGE_TYPE="${configs[0]}"
export DESIRED_PYTHON="${configs[1]}"
export DESIRED_CUDA="${configs[2]}"
if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
export DESIRED_DEVTOOLSET=""
export LIBTORCH_CONFIG="${configs[3]:-}"
if [[ "$LIBTORCH_CONFIG" == 'debug' ]]; then
export DEBUG=1
# Parse the BUILD_ENVIRONMENT to package type, python, and cuda
configs=($BUILD_ENVIRONMENT)
export PACKAGE_TYPE="${configs[0]}"
export DESIRED_PYTHON="${configs[1]}"
export DESIRED_CUDA="${configs[2]}"
if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
export DESIRED_DEVTOOLSET=""
export LIBTORCH_CONFIG="${configs[3]:-}"
if [[ "$LIBTORCH_CONFIG" == 'debug' ]]; then
export DEBUG=1
fi
else
export DESIRED_DEVTOOLSET="${configs[3]:-}"
fi
else
export DESIRED_DEVTOOLSET="${configs[3]:-}"
envfile=${BINARY_ENV_FILE:-/tmp/env}
workdir="/pytorch"
fi
if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
export BUILD_PYTHONLESS=1
fi
@ -62,7 +70,7 @@ if [[ -z "$DOCKER_IMAGE" ]]; then
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="pytorch/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="pytorch/manylinux-cuda100"
export DOCKER_IMAGE="pytorch/manylinux-cpu"
else
export DOCKER_IMAGE="pytorch/manylinux-cuda${DESIRED_CUDA:2}"
fi
@ -85,7 +93,7 @@ PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="1.9.0.dev$DATE"
BASE_BUILD_VERSION="1.11.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
@ -131,24 +139,24 @@ if [[ "$PACKAGE_TYPE" == libtorch ]]; then
fi
fi
cat >>"$envfile" <<EOL
cat >"$envfile" <<EOL
# =================== The following code will be executed inside Docker container ===================
export TZ=UTC
echo "Running on $(uname -a) at $(date)"
export PACKAGE_TYPE="$PACKAGE_TYPE"
export DESIRED_PYTHON="$DESIRED_PYTHON"
export DESIRED_PYTHON="${DESIRED_PYTHON:-}"
export DESIRED_CUDA="$DESIRED_CUDA"
export LIBTORCH_VARIANT="${LIBTORCH_VARIANT:-}"
export BUILD_PYTHONLESS="${BUILD_PYTHONLESS:-}"
export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
export DESIRED_DEVTOOLSET="${DESIRED_DEVTOOLSET:-}"
if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
export LIBTORCH_CONFIG="${LIBTORCH_CONFIG:-}"
export DEBUG="${DEBUG:-}"
fi
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.9.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.11.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
@ -156,6 +164,7 @@ export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
# TODO: We don't need this anymore IIUC
export TORCH_PACKAGE_NAME='torch'
export TORCH_CONDA_BUILD_FOLDER='pytorch-nightly'
export ANACONDA_USER='pytorch'
export USE_FBGEMM=1
export JAVA_HOME=$JAVA_HOME
@ -163,23 +172,6 @@ export BUILD_JNI=$BUILD_JNI
export PIP_UPLOAD_FOLDER="$PIP_UPLOAD_FOLDER"
export DOCKER_IMAGE="$DOCKER_IMAGE"
export workdir="$workdir"
export MAC_PACKAGE_WORK_DIR="$workdir"
if [[ "$OSTYPE" == "msys" ]]; then
export PYTORCH_ROOT="$workdir/p"
export BUILDER_ROOT="$workdir/b"
else
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
fi
export MINICONDA_ROOT="$workdir/miniconda"
export PYTORCH_FINAL_PACKAGE_DIR="$workdir/final_pkgs"
export CIRCLE_TAG="${CIRCLE_TAG:-}"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
export USE_GOLD_LINKER="${USE_GOLD_LINKER}"
export USE_GLOO_WITH_OPENSSL="ON"
@ -187,6 +179,42 @@ export USE_WHOLE_CUDNN="${USE_WHOLE_CUDNN}"
# =================== The above code will be executed inside Docker container ===================
EOL
# nproc doesn't exist on darwin
if [[ "$(uname)" != Darwin ]]; then
# Because most Circle executors only have 20 CPUs, using more causes OOMs w/ Ninja and nvcc parallelization
MEMORY_LIMIT_MAX_JOBS=18
NUM_CPUS=$(( $(nproc) - 2 ))
# Defaults here for **binary** linux builds so they can be changed in one place
export MAX_JOBS=${MAX_JOBS:-$(( ${NUM_CPUS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${NUM_CPUS} ))}
cat >>"$envfile" <<EOL
export MAX_JOBS="${MAX_JOBS}"
EOL
fi
if [[ -z "${IS_GHA:-}" ]]; then
cat >>"$envfile" <<EOL
export workdir="$workdir"
export MAC_PACKAGE_WORK_DIR="$workdir"
if [[ "$OSTYPE" == "msys" ]]; then
export PYTORCH_ROOT="$workdir/p"
export BUILDER_ROOT="$workdir/b"
else
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
fi
export MINICONDA_ROOT="$workdir/miniconda"
export PYTORCH_FINAL_PACKAGE_DIR="$workdir/final_pkgs"
export CIRCLE_TAG="${CIRCLE_TAG:-}"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
EOL
fi
echo 'retry () {' >> "$envfile"
echo ' $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)' >> "$envfile"
echo '}' >> "$envfile"
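# Once the envfile is sourced, the three echo lines above yield a retry helper
# with exponential backoff: up to five attempts, sleeping 1/2/4/8 seconds
# between them. A sketch of the generated function with a hypothetical call:
```
retry () {
  $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
# Hypothetical usage; the unquoted $* re-splits arguments, so this variant is
# only safe for commands whose arguments contain no whitespace.
retry curl -fsSL https://example.invalid/flaky-endpoint >/dev/null
```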

View File

@ -63,6 +63,10 @@ s3_upload() {
)
}
# Install dependencies (should be a no-op if previously installed)
conda install -yq anaconda-client
pip install -q awscli
case "${PACKAGE_TYPE}" in
conda)
conda_upload


@ -8,15 +8,45 @@ export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export USE_SCCACHE=1
export SCCACHE_BUCKET=ossci-compiler-cache-windows
export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
export VC_YEAR=2019
if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
export VC_YEAR=2017
else
export VC_YEAR=2019
if [[ "${DESIRED_CUDA}" == *"cu11"* ]]; then
export BUILD_SPLIT_CUDA=ON
fi
if [[ "${DESIRED_CUDA}" == "cu111" ]]; then
export BUILD_SPLIT_CUDA="ON"
echo "Free Space for CUDA DEBUG BUILD"
if [[ "$CIRCLECI" == 'true' ]]; then
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft.NET" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft.NET"
fi
if [[ -d "C:\\Program Files\\dotnet" ]]; then
rm -rf "C:\\Program Files\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\dotnet" ]]; then
rm -rf "C:\\Program Files (x86)\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft SQL Server" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft SQL Server"
fi
if [[ -d "C:\\Program Files (x86)\\Xamarin" ]]; then
rm -rf "C:\\Program Files (x86)\\Xamarin"
fi
if [[ -d "C:\\Program Files (x86)\\Google" ]]; then
rm -rf "C:\\Program Files (x86)\\Google"
fi
fi
set +x
@@ -32,7 +62,8 @@ if [[ "$CIRCLECI" == 'true' && -d "C:\\ProgramData\\Microsoft\\VisualStudio\\Pac
fi
if [[ "$CIRCLECI" == 'true' && -d "C:\\Microsoft" ]]; then
rm -rf "C:\\Microsoft\\Android*"
# don't use quotes here
rm -rf /c/Microsoft/AndroidNDK*
fi
echo "Free space on filesystem before build:"


@@ -4,13 +4,7 @@ set -eux -o pipefail
source "/c/w/env"
export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export VC_YEAR=2017
if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
export VC_YEAR=2017
else
export VC_YEAR=2019
fi
export VC_YEAR=2019
pushd "$BUILDER_ROOT"


@@ -10,18 +10,27 @@ pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "cpp_doc_push_script.sh: Invoked with $*"
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem weird to gather the second argument before gathering the first argument
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
# try and gather it first, just so we don't potentially break people who rely on this script
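A quick demonstration of that fallback chain (values hypothetical):
DOCS_INSTALL_PATH=from_env
set -- from_arg
echo "${1:-${DOCS_INSTALL_PATH:-docs/}}"   # from_arg  -- the positional argument wins
set --
echo "${1:-${DOCS_INSTALL_PATH:-docs/}}"   # from_env  -- the environment variable is next
unset DOCS_INSTALL_PATH
echo "${1:-${DOCS_INSTALL_PATH:-docs/}}"   # docs/     -- the hard-coded default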
# Argument 2: What version of the Python API docs we are building.
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
# Argument 2: What version of the Python API docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
@@ -56,7 +65,6 @@ cp torch/_utils_internal.py tools/shared
# Generate PyTorch files
time python tools/setup_helpers/generate_code.py \
--declarations-path build/aten/src/ATen/Declarations.yaml \
--native-functions-path aten/src/ATen/native/native_functions.yaml \
--nn-path aten/src/
@@ -88,8 +96,12 @@ git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate C++ docs from pytorch/pytorch@$CIRCLE_SHA1" || true
git commit -m "Generate C++ docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin
fi
popd
# =================== The above code **should** be executed inside Docker container ===================


@@ -13,18 +13,27 @@ echo "python_doc_push_script.sh: Invoked with $*"
set -ex
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem weird to gather the second argument before gathering the first argument
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
# try and gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the docs we are building.
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
# Argument 2: What version of the docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
@@ -34,7 +43,7 @@ if [ "$version" == "master" ]; then
fi
# Argument 3: The branch to push to. Usually is "site"
branch="$3"
branch="${3:-${DOCS_BRANCH:-site}}"
if [ -z "$branch" ]; then
echo "error: python_doc_push_script.sh: branch (arg3) not specified"
exit 1
@@ -122,8 +131,12 @@ git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate Python docs from pytorch/pytorch@$CIRCLE_SHA1" || true
git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin "${branch}"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================


@@ -7,6 +7,9 @@ sudo rm -f /etc/apt/heroku.list
sudo rm -f /etc/apt/openjdk-r-ubuntu-ppa-xenial.list
sudo rm -f /etc/apt/partner.list
# To increase the network reliability, let apt decide which mirror is best to use
sudo sed -i -e 's/http:\/\/.*archive/mirror:\/\/mirrors/' -e 's/\/ubuntu\//\/mirrors.txt/' /etc/apt/sources.list
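Concretely, that sed turns a stock EC2 archive entry into the mirror protocol; a sketch with an assumed input line:
echo "deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ xenial main" \
  | sed -e 's/http:\/\/.*archive/mirror:\/\/mirrors/' -e 's/\/ubuntu\//\/mirrors.txt/'
# prints: deb mirror://mirrors.ubuntu.com/mirrors.txt xenial main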
retry () {
$* || $* || $* || $* || $*
}
@@ -29,7 +32,7 @@ if ! command -v aws >/dev/null; then
fi
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-460.39.run"
DRIVER_FN="NVIDIA-Linux-x86_64-495.44.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
@@ -40,9 +43,9 @@ if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update -qq
retry sudo apt-get update -qq
# Necessary to get the `--gpus` flag to function within docker
sudo apt-get install -y nvidia-container-toolkit
retry sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
else
# Explicitly remove nvidia docker apt repositories if not building for cuda
@@ -64,6 +67,7 @@ add_to_env_file() {
}
add_to_env_file IN_CI 1
add_to_env_file CI_MASTER "${CI_MASTER:-}"
add_to_env_file COMMIT_SOURCE "${CIRCLE_BRANCH:-}"
add_to_env_file BUILD_ENVIRONMENT "${BUILD_ENVIRONMENT}"
add_to_env_file CIRCLE_PULL_REQUEST "${CIRCLE_PULL_REQUEST}"


@@ -1,149 +0,0 @@
import glob
import json
import logging
import os
import os.path
import pathlib
import re
import sys
import time
import zipfile
import requests
def get_size(file_dir):
try:
# we should only expect one file, if no, something is wrong
file_name = glob.glob(os.path.join(file_dir, "*"))[0]
return os.stat(file_name).st_size
except Exception:
logging.exception(f"error getting file from: {file_dir}")
return 0
def build_message(size):
pkg_type, py_ver, cu_ver, *_ = os.environ.get("BUILD_ENVIRONMENT", "").split() + [
None,
None,
None,
]
os_name = os.uname()[0].lower()
if os_name == "darwin":
os_name = "macos"
return {
"normal": {
"os": os_name,
"pkg_type": pkg_type,
"py_ver": py_ver,
"cu_ver": cu_ver,
"pr": os.environ.get("CIRCLE_PR_NUMBER"),
"build_num": os.environ.get("CIRCLE_BUILD_NUM"),
"sha1": os.environ.get("CIRCLE_SHA1"),
"branch": os.environ.get("CIRCLE_BRANCH"),
"workflow_id": os.environ.get("CIRCLE_WORKFLOW_ID"),
},
"int": {
"time": int(time.time()),
"size": size,
"commit_time": int(os.environ.get("COMMIT_TIME", "0")),
"run_duration": int(time.time() - os.path.getmtime(os.path.realpath(__file__))),
},
}
def send_message(messages):
access_token = os.environ.get("SCRIBE_GRAPHQL_ACCESS_TOKEN")
if not access_token:
raise ValueError("Can't find access token from environment variable")
url = "https://graph.facebook.com/scribe_logs"
r = requests.post(
url,
data={
"access_token": access_token,
"logs": json.dumps(
[
{
"category": "perfpipe_pytorch_binary_size",
"message": json.dumps(message),
"line_escape": False,
}
for message in messages
]
),
},
)
print(r.text)
r.raise_for_status()
def report_android_sizes(file_dir):
def gen_sizes():
# we should only expect one file, if no, something is wrong
aar_files = list(pathlib.Path(file_dir).rglob("pytorch_android-*.aar"))
if len(aar_files) != 1:
logging.exception(f"error getting aar files from: {file_dir} / {aar_files}")
return
aar_file = aar_files[0]
zf = zipfile.ZipFile(aar_file)
for info in zf.infolist():
# Scan ".so" libs in `jni` folder. Examples:
# jni/arm64-v8a/libfbjni.so
# jni/arm64-v8a/libpytorch_jni.so
m = re.match(r"^jni/([^/]+)/(.*\.so)$", info.filename)
if not m:
continue
arch, lib = m.groups()
# report per architecture library size
yield [arch, lib, info.compress_size, info.file_size]
# report whole package size
yield ["aar", aar_file.name, os.stat(aar_file).st_size, 0]
def gen_messages():
android_build_type = os.environ.get("ANDROID_BUILD_TYPE")
for arch, lib, comp_size, uncomp_size in gen_sizes():
print(android_build_type, arch, lib, comp_size, uncomp_size)
yield {
"normal": {
"os": "android",
# TODO: create dedicated columns
"pkg_type": "{}/{}/{}".format(android_build_type, arch, lib),
"cu_ver": "", # dummy value for derived field `build_name`
"py_ver": "", # dummy value for derived field `build_name`
"pr": os.environ.get("CIRCLE_PR_NUMBER"),
"build_num": os.environ.get("CIRCLE_BUILD_NUM"),
"sha1": os.environ.get("CIRCLE_SHA1"),
"branch": os.environ.get("CIRCLE_BRANCH"),
"workflow_id": os.environ.get("CIRCLE_WORKFLOW_ID"),
},
"int": {
"time": int(time.time()),
"commit_time": int(os.environ.get("COMMIT_TIME", "0")),
"run_duration": int(time.time() - os.path.getmtime(os.path.realpath(__file__))),
"size": comp_size,
"raw_size": uncomp_size,
},
}
send_message(list(gen_messages()))
if __name__ == "__main__":
file_dir = os.environ.get(
"PYTORCH_FINAL_PACKAGE_DIR", "/home/circleci/project/final_pkgs"
)
if len(sys.argv) == 2:
file_dir = sys.argv[1]
print("checking dir: " + file_dir)
if "-android" in os.environ.get("BUILD_ENVIRONMENT", ""):
report_android_sizes(file_dir)
else:
size = get_size(file_dir)
if size != 0:
try:
send_message([build_message(size)])
except Exception:
logging.exception("can't send message")


@@ -1,8 +1,8 @@
# https://developercommunity.visualstudio.com/t/install-specific-version-of-vs-component/1142479
# https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
# Where to find the links: https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
# 16.8.5 BuildTools
$VS_DOWNLOAD_LINK = "https://download.visualstudio.microsoft.com/download/pr/20130c62-1bc8-43d6-b4f0-c20bb7c79113/145a319d79a83376915d8f855605e152ef5f6fa2b2f1d2dca411fb03722eea72/vs_BuildTools.exe"
# BuildTools from S3
$VS_DOWNLOAD_LINK = "https://s3.amazonaws.com/ossci-windows/vs${env:VS_VERSION}_BuildTools.exe"
$COLLECT_DOWNLOAD_LINK = "https://aka.ms/vscollect.exe"
$VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStudio.Workload.VCTools",
"--add Microsoft.Component.MSBuild",
@@ -14,32 +14,45 @@ $VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStud
"--add Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
"--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Win81")
curl.exe --retry 3 -kL $VS_DOWNLOAD_LINK --output vs_installer.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS 2019 Version 16.8.5 installer failed"
exit 1
if (${env:INSTALL_WINDOWS_SDK} -eq "1") {
$VS_INSTALL_ARGS += "--add Microsoft.VisualStudio.Component.Windows10SDK.19041"
}
if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe") {
$existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[16, 17)" -property installationPath
if ($existingPath -ne $null) {
echo "Found existing BuildTools installation in $existingPath"
$VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$existingPath`"", "--quiet","--wait")
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "Original BuildTools uninstall failed with code $exitCode"
exit 1
}
echo "Original BuildTools uninstalled"
$VS_VERSION_major = [int] ${env:VS_VERSION}.split(".")[0]
$existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[${env:VS_VERSION}, ${env:VS_VERSION_major + 1})" -property installationPath
if (($existingPath -ne $null) -and (!${env:CIRCLECI})) {
echo "Found correctly versioned existing BuildTools installation in $existingPath"
exit 0
}
$pathToRemove = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -property installationPath
}
echo "Downloading VS installer from S3."
curl.exe --retry 3 -kL $VS_DOWNLOAD_LINK --output vs_installer.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS 2019 Version ${env:VS_VERSION} installer failed"
exit 1
}
if ($pathToRemove -ne $null) {
echo "Uninstalling $pathToRemove."
$VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$pathToRemove`"", "--quiet","--wait")
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "Original BuildTools uninstall failed with code $exitCode"
exit 1
}
echo "Other versioned BuildTools uninstalled."
}
echo "Installing Visual Studio version ${env:VS_VERSION}."
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_INSTALL_ARGS -NoNewWindow -Wait -PassThru
Remove-Item -Path vs_installer.exe -Force
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "VS 2017 installer exited with code $exitCode, which should be one of [0, 3010]."
echo "VS 2019 installer exited with code $exitCode, which should be one of [0, 3010]."
curl.exe --retry 3 -kL $COLLECT_DOWNLOAD_LINK --output Collect.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS Collect tool failed."
@@ -47,6 +60,6 @@ if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
}
Start-Process "${PWD}\Collect.exe" -NoNewWindow -Wait -PassThru
New-Item -Path "C:\w\build-results" -ItemType "directory" -Force
Copy-Item -Path "C:\Users\circleci\AppData\Local\Temp\vslogs.zip" -Destination "C:\w\build-results\"
Copy-Item -Path "${env:TEMP}\vslogs.zip" -Destination "C:\w\build-results\"
exit 1
}


@@ -1,70 +1,78 @@
#!/bin/bash
set -eux -o pipefail
cuda_major_version=${CUDA_VERSION%.*}
if [[ "$cuda_major_version" == "10" ]]; then
cuda_installer_name="cuda_10.1.243_426.00_win10"
msbuild_project_dir="CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
elif [[ "$cuda_major_version" == "11" ]]; then
if [[ "${CUDA_VERSION}" == "11.1" ]]; then
cuda_installer_name="cuda_11.1.0_456.43_win10"
msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
case ${CUDA_VERSION} in
10.1)
cuda_installer_name="cuda_10.1.243_426.00_win10"
cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
;;
10.2)
cuda_installer_name="cuda_10.2.89_441.22_win10"
cuda_install_packages="nvcc_10.2 cuobjdump_10.2 nvprune_10.2 cupti_10.2 cublas_10.2 cublas_dev_10.2 cudart_10.2 cufft_10.2 cufft_dev_10.2 curand_10.2 curand_dev_10.2 cusolver_10.2 cusolver_dev_10.2 cusparse_10.2 cusparse_dev_10.2 nvgraph_10.2 nvgraph_dev_10.2 npp_10.2 npp_dev_10.2 nvrtc_10.2 nvrtc_dev_10.2 nvml_dev_10.2"
;;
11.1)
cuda_installer_name="cuda_11.1.1_456.81_win10"
cuda_install_packages="nvcc_11.1 cuobjdump_11.1 nvprune_11.1 nvprof_11.1 cupti_11.1 cublas_11.1 cublas_dev_11.1 cudart_11.1 cufft_11.1 cufft_dev_11.1 curand_11.1 curand_dev_11.1 cusolver_11.1 cusolver_dev_11.1 cusparse_11.1 cusparse_dev_11.1 npp_11.1 npp_dev_11.1 nvrtc_11.1 nvrtc_dev_11.1 nvml_dev_11.1"
elif [[ "${CUDA_VERSION}" == "11.3" ]]; then
;;
11.3)
cuda_installer_name="cuda_11.3.0_465.89_win10"
msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="thrust_11.3 nvcc_11.3 cuobjdump_11.3 nvprune_11.3 nvprof_11.3 cupti_11.3 cublas_11.3 cublas_dev_11.3 cudart_11.3 cufft_11.3 cufft_dev_11.3 curand_11.3 curand_dev_11.3 cusolver_11.3 cusolver_dev_11.3 cusparse_11.3 cusparse_dev_11.3 npp_11.3 npp_dev_11.3 nvrtc_11.3 nvrtc_dev_11.3 nvml_dev_11.3"
else
echo "This should not happen! ABORT."
;;
11.5)
cuda_installer_name="cuda_11.5.0_496.13_win10"
cuda_install_packages="thrust_11.5 nvcc_11.5 cuobjdump_11.5 nvprune_11.5 nvprof_11.5 cupti_11.5 cublas_11.5 cublas_dev_11.5 cudart_11.5 cufft_11.5 cufft_dev_11.5 curand_11.5 curand_dev_11.5 cusolver_11.5 cusolver_dev_11.5 cusparse_11.5 cusparse_dev_11.5 npp_11.5 npp_dev_11.5 nvrtc_11.5 nvrtc_dev_11.5 nvml_dev_11.5"
;;
*)
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
fi
;;
esac
if [[ -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "Existing CUDA v${CUDA_VERSION} installation found, skipping install"
else
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
pushd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
# This breaks for some reason if you quote cuda_install_packages
# shellcheck disable=SC2086
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
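The quoting caveat above is ordinary word splitting: left unquoted, the package list expands to one argument per package, while quoting would hand setup.exe the whole list as a single argument. A minimal sketch:
pkgs="nvcc_11.3 cupti_11.3"
printf '<%s>\n' $pkgs     # two arguments: <nvcc_11.3> then <cupti_11.3>
printf '<%s>\n' "$pkgs"   # one argument:  <nvcc_11.3 cupti_11.3>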
if [[ ! -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
)
rm -rf "${tmp_dir}"
fi
if [[ "$cuda_major_version" == "11" && "${JOB_EXECUTOR}" == "windows-with-nvidia-gpu" ]]; then
cuda_install_packages="${cuda_install_packages} Display.Driver"
fi
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
cd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ "${VC_YEAR}" == "2017" ]]; then
cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2017/${VC_PRODUCT}/Common7/IDE/VC/VCTargets/BuildCustomizations/"
if [[ -f "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll" ]]; then
echo "Existing nvtools installation found, skipping install"
else
cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2019/${VC_PRODUCT}/MSBuild/Microsoft/VC/v160/BuildCustomizations/"
# create tmp dir for download
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
)
rm -rf "${tmp_dir}"
fi
if ! ls "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll"
then
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
export NVTOOLSEXT_PATH="C:\\Program Files\\NVIDIA Corporation\\NvToolsExt\\"
fi
if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe"
then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
cd ..
rm -rf ./${cuda_installer_name}
rm -f ./${cuda_installer_name}.exe


@@ -1,28 +1,49 @@
#!/bin/bash
set -eux -o pipefail
cuda_major_version=${CUDA_VERSION%.*}
# This is typically blank but for CUDA 10* it'll be set to 10
windows_version_qualifier=""
if [[ "$cuda_major_version" == "10" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.4.38"
elif [[ "$cuda_major_version" == "11" ]]; then
if [[ "${CUDA_VERSION}" == "11.1" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.0.5.39"
elif [[ "${CUDA_VERSION}" == "11.3" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.2.0.53"
else
echo "This should not happen! ABORT."
case ${CUDA_VERSION} in
10.1)
archive_version="v7.6.4.38"
windows_version_qualifier="10"
;;
10.2)
archive_version="v7.6.5.32"
windows_version_qualifier="10"
;;
11.1)
archive_version="v8.0.5.39"
;;
11.3)
archive_version="v8.2.0.53"
;;
11.5)
archive_version="v8.2.0.53"
;;
*)
echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
exit 1
fi
;;
esac
cudnn_installer_name="cudnn_installer.zip"
cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/cudnn-${CUDA_VERSION}-windows${windows_version_qualifier}-x64-${archive_version}.zip"
cudnn_install_folder="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
if [[ -f "${cudnn_install_folder}/include/cudnn.h" ]]; then
echo "Existing cudnn installation found, skipping install..."
else
echo "CUDNN for CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
tmp_dir=$(mktemp -d)
(
pushd "${tmp_dir}"
curl --retry 3 -o "${cudnn_installer_name}" "$cudnn_installer_link"
7z x "${cudnn_installer_name}" -ocudnn
# Use '${var:?}/*' to avoid potentially expanding to '/*'
# Remove all of the directories before attempting to copy files
rm -rf "${cudnn_install_folder:?}/*"
cp -rf cudnn/cuda/* "${cudnn_install_folder}"
)
rm -rf "${tmp_dir}"
fi
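The ${var:?} expansion used in that rm makes the shell abort when the variable is unset or empty, instead of letting the path collapse toward '/'; a small sketch (variable name illustrative):
folder=""
rm -rf "${folder:?}/stale"   # aborts with 'folder: parameter null or not set'; rm never runs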
cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/${cudnn_installer_name}.zip"
curl --retry 3 -O $cudnn_installer_link
7z x ${cudnn_installer_name}.zip -ocudnn
cp -r cudnn/cuda/* "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
rm -rf cudnn
rm -f ${cudnn_installer_name}.zip


@@ -15,31 +15,17 @@ pytorch_params: &pytorch_params
build_only:
type: string
default: ""
ci_master:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
DOCKER_IMAGE: << parameters.docker_image >>
USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
BUILD_ONLY: << parameters.build_only >>
CI_MASTER: << pipeline.parameters.run_master_build >>
resource_class: << parameters.resource_class >>
pytorch_android_params: &pytorch_android_params
parameters:
build_environment:
type: string
default: ""
op_list:
type: string
default: ""
lite_interpreter:
type: string
default: "1"
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
PYTHON_VERSION: "3.6"
SELECTED_OP_LIST: << parameters.op_list >>
BUILD_LITE_INTERPRETER: << parameters.lite_interpreter >>
pytorch_ios_params: &pytorch_ios_params
parameters:
build_environment:
@@ -60,6 +46,9 @@ pytorch_ios_params: &pytorch_ios_params
lite_interpreter:
type: string
default: "1"
use_coreml:
type: string
default: "0"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
IOS_ARCH: << parameters.ios_arch >>
@@ -67,6 +56,7 @@
SELECTED_OP_LIST: << parameters.op_list >>
USE_PYTORCH_METAL: << parameters.use_metal >>
BUILD_LITE_INTERPRETER: << parameters.lite_interpreter >>
USE_COREML_DELEGATE: << parameters.use_coreml >>
pytorch_windows_params: &pytorch_windows_params
parameters:
@@ -84,7 +74,10 @@ pytorch_windows_params: &pytorch_windows_params
default: "10.1"
python_version:
type: string
default: "3.6"
default: "3.8"
vs_version:
type: string
default: "16.8.6"
vc_version:
type: string
default: "14.16"
@@ -102,6 +95,7 @@
SCCACHE_BUCKET: "ossci-compiler-cache"
CUDA_VERSION: <<parameters.cuda_version>>
PYTHON_VERSION: <<parameters.python_version>>
VS_VERSION: <<parameters.vs_version>>
VC_VERSION: <<parameters.vc_version>>
VC_YEAR: <<parameters.vc_year>>
VC_PRODUCT: <<parameters.vc_product>>


@@ -171,4 +171,4 @@ commands:
cd ~/project
export ANDROID_BUILD_TYPE="<< parameters.build_type >>"
export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 .circleci/scripts/upload_binary_size_to_scuba.py android
python3 -m tools.stats.upload_binary_size_to_scuba android


@@ -17,6 +17,9 @@ parameters:
run_master_build:
type: boolean
default: false
run_slow_gradcheck_build:
type: boolean
default: false
executors:
windows-with-nvidia-gpu:


@@ -1,3 +1,4 @@
jobs:
binary_linux_build:
<<: *binary_linux_build_params
steps:
@@ -22,14 +23,14 @@
command: |
ls -lah /final_pkgs
- run:
name: save binary size
name: upload build & binary data
no_output_timeout: "5m"
command: |
source /env
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -mpip install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 /pytorch/.circleci/scripts/upload_binary_size_to_scuba.py || exit 0
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- persist_to_workspace:
root: /
paths: final_pkgs
@@ -239,7 +240,7 @@
binary_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "12.0"
xcode: "12.5.1"
steps:
- attach_workspace:
at: ~/workspace
@@ -266,7 +267,7 @@
binary_ios_upload:
<<: *pytorch_ios_params
macos:
xcode: "12.0"
xcode: "12.5.1"
steps:
- attach_workspace:
at: ~/workspace


@@ -54,61 +54,3 @@
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
set -x
cd .circleci/docker && ./build_docker.sh
docker_for_ecr_gc_build_job:
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- run:
name: build_docker_image_for_ecr_gc
no_output_timeout: "1h"
command: |
cd .circleci/ecr_gc_docker
docker build . -t 308535385114.dkr.ecr.us-east-1.amazonaws.com/gc/ecr
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
export AWS_REGION=us-east-1
aws ecr get-login-password --region $AWS_REGION|docker login --username AWS \
--password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
set -x
docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/gc/ecr
ecr_gc_job:
parameters:
project:
type: string
default: "pytorch"
tags_to_keep: # comma separate values
type: string
environment:
PROJECT: << parameters.project >>
# TODO: Remove legacy image tags once we feel comfortable with new docker image tags
IMAGE_TAG: << parameters.tags_to_keep >>
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/gc/ecr
aws_auth:
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
steps:
- checkout
- run:
# NOTE: see 'docker_build_job' for how these tags actually get built
name: dynamically generate tags to keep
no_output_timeout: "1h"
command: |
GENERATED_IMAGE_TAG=$(\
git log --oneline --pretty='%H' .circleci/docker \
| xargs -I '{}' git rev-parse '{}:.circleci/docker' \
| paste -sd "," -)
echo "export GENERATED_IMAGE_TAG='${GENERATED_IMAGE_TAG}'" >> ${BASH_ENV}
- run:
name: garbage collecting for ecr images
no_output_timeout: "1h"
command: |
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
set -x
/usr/bin/gc.py --filter-prefix ${PROJECT} --ignore-tags "${IMAGE_TAG},${GENERATED_IMAGE_TAG}"


@@ -27,7 +27,7 @@
pytorch_python_doc_build:
environment:
BUILD_ENVIRONMENT: pytorch-python-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
resource_class: large
machine:
image: ubuntu-2004:202104-01
@@ -41,9 +41,10 @@
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
tag=${CIRCLE_TAG:1:5}
# turn v1.12.0rc3 into 1.12.0
tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9.]*\).*/\1/')
target=${tag:-master}
echo "building for ${target}"
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
@@ -72,7 +73,7 @@
pytorch_cpp_doc_build:
environment:
BUILD_ENVIRONMENT: pytorch-cpp-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
resource_class: large
machine:
image: ubuntu-2004:202104-01
@@ -86,8 +87,10 @@
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
# turn v1.12.0rc3 into 1.12.0
tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9.]*\).*/\1/')
tag=${CIRCLE_TAG:1:5}
target=${tag:-master}
echo "building for ${target}"
@@ -126,6 +129,7 @@
set -e
export IN_CI=1
export CROSS_COMPILE_ARM64=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
@@ -162,6 +166,7 @@
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
@@ -198,6 +203,7 @@
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
@@ -207,13 +213,15 @@
command: |
set -ex
source /Users/distiller/workspace/miniconda3/bin/activate
pip install boto3
export PYTHONPATH="$PWD"
python3 -m pip install boto3==1.19.12
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Using the same IAM user to write stats to our OSS bucket
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports
@@ -235,6 +243,7 @@
set -e
export IN_CI=1
export BUILD_LITE_INTERPRETER=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh
unbuffer ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh 2>&1 | ts
- store_test_results:
@@ -244,7 +253,7 @@
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
PYTHON_VERSION: "3.6"
PYTHON_VERSION: "3.7"
resource_class: large
machine:
image: ubuntu-2004:202104-01
@@ -258,7 +267,7 @@
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32=${docker_image_commit}-android-x86_32
docker_image_libtorch_android_x86_64=${docker_image_commit}-android-x86_64
@@ -333,7 +342,7 @@
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-publish-snapshot
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
PYTHON_VERSION: "3.6"
PYTHON_VERSION: "3.7"
resource_class: large
machine:
image: ubuntu-2004:202104-01
@@ -347,7 +356,7 @@
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32_gradle=${docker_image_commit}-android-x86_32-gradle
@@ -369,7 +378,7 @@
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
PYTHON_VERSION: "3.6"
PYTHON_VERSION: "3.7"
resource_class: large
machine:
image: ubuntu-2004:202104-01
@@ -384,7 +393,7 @@
no_output_timeout: "1h"
command: |
set -e
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
echo "docker_image_libtorch_android_x86_32: "${docker_image_libtorch_android_x86_32}
# x86
@@ -407,47 +416,10 @@
path: ~/workspace/build_android_x86_32_artifacts/artifacts.tgz
destination: artifacts.tgz
pytorch_android_gradle_custom_build_single:
<<: *pytorch_android_params
resource_class: large
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- checkout
- calculate_docker_image_tag
- setup_ci_environment
- run:
name: pytorch android gradle custom build single architecture (for PR)
no_output_timeout: "1h"
command: |
set -e
# Unlike other gradle jobs, it's not worth building libtorch in a separate CI job and share via docker, because:
# 1) Not shareable: it's custom selective build, which is different from default libtorch mobile build;
# 2) Not parallelizable by architecture: it only builds libtorch for one architecture;
echo "DOCKER_IMAGE: ${DOCKER_IMAGE}:${DOCKER_TAG}"
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
git submodule sync && git submodule update -q --init --recursive --depth 1
VOLUME_MOUNTS="-v /home/circleci/project/:/var/lib/jenkins/workspace"
export id=$(docker run --env-file "${BASH_ENV}" ${VOLUME_MOUNTS} --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
export COMMAND='((echo "export GRADLE_OFFLINE=1" && echo "export BUILD_LITE_INTERPRETER=${BUILD_LITE_INTERPRETER}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Skip docker push as this job is purely for size analysis purpose.
# Result binaries are already in `/home/circleci/project/` as it's mounted instead of copied.
- upload_binary_size_for_android_build:
build_type: custom-build-single
pytorch_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "12.0"
xcode: "12.5.1"
steps:
- checkout
- run_brew_for_ios_build
@@ -461,16 +433,17 @@
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo ${IOS_CERT_KEY} >> cert.txt
echo ${IOS_CERT_KEY_2022} >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2021.mobileprovision
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo ${IOS_SIGN_KEY} >> cert.txt
echo ${IOS_SIGN_KEY_2022} >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- run:
@@ -500,7 +473,7 @@
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive --depth 1
git submodule update --init --recursive --depth 1 --jobs 0
# export
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
@@ -511,6 +484,7 @@
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
echo "USE_PYTORCH_METAL": "${USE_METAL}"
echo "BUILD_LITE_INTERPRETER": "${BUILD_LITE_INTERPRETER}"
echo "USE_COREML_DELEGATE": "${USE_COREML_DELEGATE}"
# check the custom build flag
echo "SELECTED_OP_LIST: ${SELECTED_OP_LIST}"
@@ -519,6 +493,7 @@
fi
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
export USE_COREML_DELEGATE=${USE_COREML_DELEGATE}
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
export USE_PYTORCH_METAL=${USE_METAL}
fi
@@ -528,12 +503,8 @@
no_output_timeout: "30m"
command: |
set -e
if [ ${BUILD_LITE_INTERPRETER} == 0 ]; then
echo "Run Build Test is not for full jit, skipping."
exit 0
fi
PROJ_ROOT=/Users/distiller/project
PROFILE=PyTorch_CI_2021
PROFILE=PyTorch_CI_2022
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
@@ -557,21 +528,40 @@
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
echo "not SIMULATOR build, skip it."
exit 0
elif [ ${BUILD_LITE_INTERPRETER} == 0 ]; then
echo "Run Simulator Tests is not for full jit, skipping."
exit 0
fi
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
source ~/anaconda/bin/activate
pip install torch torchvision --progress-bar off
#run unit test
# use the pytorch nightly build to generate models
pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
# generate models for different backends
cd ${PROJ_ROOT}/ios/TestApp/benchmark
python trace_model.py
ruby setup.rb
mkdir -p ../models
if [ ${USE_COREML_DELEGATE} == 1 ]; then
pip install coremltools==5.0b5
pip install six
python coreml_backend.py
else
python trace_model.py
fi
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
echo "Setting up the TestApp for LiteInterpreter"
ruby setup.rb --lite 1
else
echo "Setting up the TestApp for Full JIT"
ruby setup.rb
fi
cd ${PROJ_ROOT}/ios/TestApp
instruments -s -devices
fastlane scan
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
if [ ${USE_COREML_DELEGATE} == 1 ]; then
fastlane scan --only_testing TestAppTests/TestAppTests/testCoreML
else
fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter
fi
else
fastlane scan --only_testing TestAppTests/TestAppTests/testFullJIT
fi
pytorch_linux_bazel_build:
<<: *pytorch_params
machine:
@@ -593,7 +583,7 @@
echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
git submodule sync && git submodule update -q --init --recursive --depth 1
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
@@ -604,7 +594,7 @@
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Augment our output image name with bazel to avoid collisions
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=$output_image
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
@@ -624,7 +614,7 @@
no_output_timeout: "90m"
command: |
set -e
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=$output_image
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
@@ -670,7 +660,7 @@
pytorch_doc_test:
environment:
BUILD_ENVIRONMENT: pytorch-doc-test
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
resource_class: medium
machine:
image: ubuntu-2004:202104-01
@@ -684,7 +674,7 @@
no_output_timeout: "30m"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})


@@ -1,378 +0,0 @@
jobs:
pytorch_linux_build:
<<: *pytorch_params
machine:
image: ubuntu-2004:202104-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- optional_merge_target_branch
- setup_ci_environment
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
if [[ ${BUILD_ENVIRONMENT} == *"pure_torch"* ]]; then
echo 'BUILD_CAFFE2=OFF' >> "${BASH_ENV}"
fi
if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
echo 'ATEN_THREADING=TBB' >> "${BASH_ENV}"
echo 'USE_TBB=1' >> "${BASH_ENV}"
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
echo 'ATEN_THREADING=NATIVE' >> "${BASH_ENV}"
fi
echo "Parallel backend flags: "${PARALLEL_FLAGS}
# Pull Docker image and run build
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}:${DOCKER_TAG}
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
git submodule sync && git submodule update -q --init --recursive --depth 1
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "sudo chown -R jenkins workspace && export CIRCLE_JOB="$CIRCLE_JOB" && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Copy dist folder back
docker cp $id:/var/lib/jenkins/workspace/dist /home/circleci/project/. || echo "Dist folder not found"
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Note [Special build images]
# The xla build uses the same docker image as
# pytorch_linux_bionic_py3_6_clang9_build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-libtorch
elif [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-paralleltbb
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-parallelnative
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_64"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-x86_64
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v7a"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v7a
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v8a"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v8a
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_32"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-x86_32
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-vulkan-x86_32"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-vulkan-x86_32
elif [[ ${BUILD_ENVIRONMENT} == *"vulkan-linux"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-vulkan
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
fi
- store_artifacts:
path: /home/circleci/project/dist
pytorch_linux_test:
<<: *pytorch_params
machine:
image: ubuntu-2004:202104-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Download Docker image
no_output_timeout: "90m"
command: |
set -e
export PYTHONUNBUFFERED=1
if [[ "${DOCKER_IMAGE}" == *rocm3.9* ]]; then
export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6"
fi
# See Note [Special build images]
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-libtorch
elif [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-paralleltbb
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-parallelnative
elif [[ ${BUILD_ENVIRONMENT} == *"vulkan-linux"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-vulkan
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
echo 'ATEN_THREADING=TBB' >> "${BASH_ENV}"
echo 'USE_TBB=1' >> "${BASH_ENV}"
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
echo 'ATEN_THREADING=NATIVE' >> "${BASH_ENV}"
fi
echo "Parallel backend flags: "${PARALLEL_FLAGS}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
# TODO: Make this less painful
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --gpus all --shm-size=2g -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
elif [[ ${BUILD_ENVIRONMENT} == *"rocm"* ]]; then
hostname
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=8g --ipc=host --device /dev/kfd --device /dev/dri --group-add video -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=1g --ipc=host -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
echo "id=${id}" >> "${BASH_ENV}"
- run:
name: Check for no AVX instruction by default
no_output_timeout: "20m"
command: |
set -e
is_vanilla_build() {
if [ "${BUILD_ENVIRONMENT}" == "pytorch-linux-bionic-py3.6-clang9-test" ]; then
return 0
fi
if [ "${BUILD_ENVIRONMENT}" == "pytorch-linux-xenial-py3.6-gcc5.4-test" ]; then
return 0
fi
return 1
}
if is_vanilla_build; then
echo "apt-get update && apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash
echo "cd workspace/build; qemu-x86_64 -g 2345 -cpu Broadwell -E ATEN_CPU_CAPABILITY=default ./bin/basic --gtest_filter=BasicTest.BasicTestCPU & gdb ./bin/basic -ex 'set pagination off' -ex 'target remote :2345' -ex 'continue' -ex 'bt' -ex='set confirm off' -ex 'quit \$_isvoid(\$_exitcode)'" | docker exec -u jenkins -i "$id" bash
else
echo "Skipping for ${BUILD_ENVIRONMENT}"
fi
- run:
name: Run tests
no_output_timeout: "90m"
command: |
set -e
cat >docker_commands.sh \<<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
export CIRCLE_JOB="$CIRCLE_JOB"
${PARALLEL_FLAGS}
cd workspace
EOL
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
echo ".jenkins/pytorch/multigpu-test.sh" >> docker_commands.sh
elif [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then
echo "pip install click mock tabulate networkx==2.0" >> docker_commands.sh
echo "pip -q install --user \"file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx\"" >> docker_commands.sh
echo ".jenkins/caffe2/test.sh" >> docker_commands.sh
else
echo ".jenkins/pytorch/test.sh" >> docker_commands.sh
fi
echo "(cat docker_commands.sh | docker exec -u jenkins -i "$id" bash) 2>&1" > command.sh
unbuffer bash command.sh | ts
if [[ ${BUILD_ENVIRONMENT} == *"coverage"* ]]; then
echo "Retrieving C++ coverage report"
docker cp $id:/var/lib/jenkins/workspace/build/coverage.info ./test
fi
if [[ ${BUILD_ENVIRONMENT} == *"coverage"* || ${BUILD_ENVIRONMENT} == *"onnx"* ]]; then
echo "Retrieving Python coverage report"
docker cp $id:/var/lib/jenkins/workspace/test/.coverage ./test
docker cp $id:/var/lib/jenkins/workspace/test/coverage.xml ./test
python3 -mpip install codecov
python3 -mcodecov
fi
- run:
name: Report results
no_output_timeout: "5m"
command: |
set -e
# Retrieving test results should be done as very first step as command never fails
# But is always executed if previous step fails for some reason
echo "Retrieving test reports"
docker cp $id:/var/lib/jenkins/workspace/test/test-reports ./ || echo 'No test reports found!'
docker stats --all --no-stream
cat >docker_commands.sh \<<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}
export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
export CIRCLE_TAG="${CIRCLE_TAG:-}"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export CIRCLE_JOB="$CIRCLE_JOB"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
cd workspace
export PYTHONPATH="\${PWD}"
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
EOL
echo "(cat docker_commands.sh | docker exec -u jenkins -e LANG=C.UTF-8 -i "$id" bash) 2>&1" > command.sh
unbuffer bash command.sh | ts
when: always
- store_test_results:
path: test-reports
- store_artifacts:
path: test/.coverage
- store_artifacts:
path: test/coverage.xml
pytorch_windows_build:
<<: *pytorch_windows_params
parameters:
executor:
type: string
default: "windows-xlarge-cpu-with-nvidia-cuda"
build_environment:
type: string
default: ""
test_name:
type: string
default: ""
cuda_version:
type: string
default: "10.1"
python_version:
type: string
default: "3.6"
vc_version:
type: string
default: "14.16"
vc_year:
type: string
default: "2019"
vc_product:
type: string
default: "BuildTools"
use_cuda:
type: string
default: ""
executor: <<parameters.executor>>
steps:
- checkout
- run:
name: Install VS2019 toolchain
no_output_timeout: 10m
command: |
powershell .circleci/scripts/vs_install.ps1
- run:
name: Install Cuda
no_output_timeout: 30m
command: |
if [[ "${USE_CUDA}" == "1" ]]; then
.circleci/scripts/windows_cuda_install.sh
fi
- run:
name: Install Cudnn
command : |
if [[ "${USE_CUDA}" == "1" ]]; then
.circleci/scripts/windows_cudnn_install.sh
fi
- run:
name: Build
no_output_timeout: "90m"
command: |
set -e
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
set -x
.jenkins/pytorch/win-build.sh
- persist_to_workspace:
root: "C:/w"
paths: build-results
- store_artifacts:
path: C:/w/build-results
pytorch_windows_test:
<<: *pytorch_windows_params
parameters:
executor:
type: string
default: "windows-medium-cpu-with-nvidia-cuda"
build_environment:
type: string
default: ""
test_name:
type: string
default: ""
cuda_version:
type: string
default: "10.1"
python_version:
type: string
default: "3.6"
vc_version:
type: string
default: "14.16"
vc_year:
type: string
default: "2019"
vc_product:
type: string
default: "BuildTools"
use_cuda:
type: string
default: ""
executor: <<parameters.executor>>
steps:
- checkout
- attach_workspace:
at: c:/users/circleci/workspace
- run:
name: Install VS2019 toolchain
no_output_timeout: 10m
command: |
powershell .circleci/scripts/vs_install.ps1
- run:
name: Install Cuda
no_output_timeout: 30m
command: |
if [[ "${CUDA_VERSION}" != "cpu" ]]; then
if [[ "${CUDA_VERSION}" != "10" || "${JOB_EXECUTOR}" != "windows-with-nvidia-gpu" ]]; then
.circleci/scripts/windows_cuda_install.sh
fi
fi
- run:
name: Install Cudnn
command : |
if [[ "${CUDA_VERSION}" != "cpu" ]]; then
.circleci/scripts/windows_cudnn_install.sh
fi
- run:
name: Test
no_output_timeout: "30m"
command: |
set -e
export IN_CI=1
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
set -x
.jenkins/pytorch/win-test.sh
- run:
name: Report results
no_output_timeout: "5m"
command: |
set -ex
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
export PYTHONPATH="$PWD"
pip install typing_extensions boto3
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports
- store_artifacts:
path: test/coverage.xml


@@ -26,6 +26,7 @@
# (smoke tests and upload jobs do not need the pytorch repo).
binary_checkout: &binary_checkout
name: Checkout pytorch/builder repo
no_output_timeout: "30m"
command: .circleci/scripts/binary_checkout.sh
# Parses circleci arguments in a consistent way, essentially routing to the


@@ -1,34 +0,0 @@
ecr_gc:
triggers:
- schedule:
cron: "45 * * * *"
filters:
branches:
only:
- master
jobs:
- docker_for_ecr_gc_build_job
- ecr_gc_job:
name: ecr_gc_job_for_pytorch
project: pytorch
tags_to_keep: "271,262,256,278,282,291,300,323,327,347,389,401,402,403,405,a8006f9a-272d-4478-b137-d121c6f05c83,6e7b11da-a919-49e5-b2ba-da66e3d4bb0a,f990c76a-a798-42bb-852f-5be5006f8026,e43973a9-9d5a-4138-9181-a08a0fc55e2f,8fcf46ef-4a34-480b-a8ee-b0a30a4d3e59,9a3986fa-7ce7-4a36-a001-3c9bef9892e2,1bc00f11-e0f3-4e5c-859f-15937dd938cd,209062ef-ab58-422a-b295-36c4eed6e906,be76e8fd-44e2-484d-b090-07e0cc3a56f0,fff7795428560442086f7b2bb6004b65245dc11a,ab1632df-fa59-40e6-8c23-98e004f61148"
requires:
- docker_for_ecr_gc_build_job
- ecr_gc_job:
name: ecr_gc_job_for_caffe2
project: caffe2
tags_to_keep: "376,373,369,348,345,336,325,324,315,306,301,287,283,276,273,266,253,248,238,230,213"
requires:
- docker_for_ecr_gc_build_job
- ecr_gc_job:
name: ecr_gc_job_for_translate
project: translate
tags_to_keep: "8"
requires:
- docker_for_ecr_gc_build_job
- ecr_gc_job:
name: ecr_gc_job_for_tensorcomp
project: tensorcomp
tags_to_keep: "34"
requires:
- docker_for_ecr_gc_build_job


@@ -1,195 +0,0 @@
scheduled-ci:
triggers:
- schedule:
# runs every 4 hours on the 45th minute
cron: "45 0,4,8,12,16,20 * * *"
filters:
branches:
only:
- master
jobs:
- docker_build_job:
name: "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_linux_build:
name: periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_linux_test:
name: periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_test
requires:
- periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_build
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
- pytorch_linux_build:
name: periodic_libtorch_xenial_cuda11_3_cudnn8_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-libtorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_windows_build:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
name: periodic_pytorch_windows_cuda11.3_build
python_version: "3.6"
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: periodic_pytorch_windows_cuda11.3_test1
python_version: "3.6"
requires:
- periodic_pytorch_windows_cuda11.3_build
test_name: pytorch-windows-test1
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: periodic_pytorch_windows_cuda11.3_test2
python_version: "3.6"
requires:
- periodic_pytorch_windows_cuda11.3_build
test_name: pytorch-windows-test2
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
# The following allows these jobs to run on ci-all and release branches
debuggable-scheduled-ci:
jobs:
- docker_build_job:
name: "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_build:
name: pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_test:
name: pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_test
requires:
- pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_build:
name: pytorch_libtorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-libtorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_build:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
name: pytorch_windows_vs2019_py36_cuda11.3_build
python_version: "3.6"
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: pytorch_windows_vs2019_py36_cuda11.3_test1
python_version: "3.6"
requires:
- pytorch_windows_vs2019_py36_cuda11.3_build
test_name: pytorch-windows-test1
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: pytorch_windows_vs2019_py36_cuda11.3_test2
python_version: "3.6"
requires:
- pytorch_windows_vs2019_py36_cuda11.3_build
test_name: pytorch-windows-test2
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
# the following clones pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7's tests but enables
# slow tests and sets an environment variable so gradcheck runs with fast_mode=False
slow-gradcheck-scheduled-ci:
triggers:
- schedule:
# runs every 8 hours on the 45th minute
cron: "45 0,8,16 * * *"
filters:
branches:
only:
- master
jobs:
- docker_build_job:
name: "docker-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
- pytorch_linux_build:
name: periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
build_environment: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
- pytorch_linux_test:
name: periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_old_gradcheck_tests
requires:
- periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_build
build_environment: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-old-gradcheck-tests"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
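As the comment above the `slow-gradcheck-scheduled-ci` block notes, these jobs run gradcheck with `fast_mode=False`. At the API level that corresponds to something like the following minimal sketch (a toy check of `torch.sin`, not the actual test-suite wiring):

```python
import torch
from torch.autograd import gradcheck

# gradcheck expects double-precision inputs with requires_grad=True
x = torch.randn(3, 4, dtype=torch.double, requires_grad=True)

# fast_mode=False forces the full (slow) finite-difference Jacobian check,
# which is what this scheduled workflow exercises across the test suite
assert gradcheck(torch.sin, (x,), fast_mode=False)
```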

@@ -9,6 +9,7 @@ bugprone-*,
-bugprone-reserved-identifier,
cppcoreguidelines-*,
-cppcoreguidelines-avoid-magic-numbers,
-cppcoreguidelines-avoid-non-const-global-variables,
-cppcoreguidelines-interfaces-global-init,
-cppcoreguidelines-macro-usage,
-cppcoreguidelines-owning-memory,
@@ -21,6 +22,7 @@ cppcoreguidelines-*,
-cppcoreguidelines-pro-type-union-access,
-cppcoreguidelines-pro-type-vararg,
-cppcoreguidelines-special-member-functions,
-cppcoreguidelines-non-private-member-variables-in-classes,
-facebook-hte-RelativeInclude,
hicpp-exception-baseclass,
hicpp-avoid-goto,
@@ -31,11 +33,13 @@ modernize-*,
-modernize-use-default-member-init,
-modernize-use-using,
-modernize-use-trailing-return-type,
-modernize-use-nodiscard,
performance-*,
-performance-noexcept-move-constructor,
-performance-unnecessary-value-param,
'
HeaderFilterRegex: 'torch/csrc/.*'
HeaderFilterRegex: 'torch/csrc/(?!deploy/interpreter/cpython).*'
AnalyzeTemporaryDtors: false
WarningsAsErrors: '*'
CheckOptions:
...
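The updated `HeaderFilterRegex` adds a negative lookahead so headers under `torch/csrc/deploy/interpreter/cpython` are excluded from analysis. A quick demonstration of the pattern's intent, using Python's `re` purely for illustration (clang-tidy's own regex engine may treat lookahead differently):

```python
import re

# The new HeaderFilterRegex, read with Perl-style regex semantics
pat = re.compile(r"torch/csrc/(?!deploy/interpreter/cpython).*")

assert pat.match("torch/csrc/jit/ir/ir.h")  # still analyzed
assert pat.match("torch/csrc/deploy/interpreter/cpython/Include/object.h") is None  # excluded
```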

@@ -16,7 +16,6 @@ per-file-ignores = __init__.py: F401 torch/utils/cpp_extension.py: B950
optional-ascii-coding = True
exclude =
./.git,
./build_code_analyzer,
./build_test_custom_build,
./build,
./caffe2,

.gitattributes

@@ -1 +1,4 @@
*.bat text eol=crlf
*.bat text eol=crlf
.circleci/config.yml linguist-generated=true
.github/workflows/generated-*.yml linguist-generated=true
.github/generated-* linguist-generated=true
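These `linguist-generated=true` entries mark the generated CI files so GitHub collapses them in diffs. One way to confirm the attribute resolves as intended, assuming you run this from inside a pytorch checkout (`git check-attr` is the plumbing command that evaluates `.gitattributes` for a path):

```python
import subprocess

# Ask git which value of linguist-generated applies to the generated config
out = subprocess.run(
    ["git", "check-attr", "linguist-generated", "--", ".circleci/config.yml"],
    capture_output=True, text=True, check=True,
).stdout
print(out)  # expected: .circleci/config.yml: linguist-generated: true
```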

@@ -1,49 +0,0 @@
---
name: "\U0001F41B Bug Report"
about: Submit a bug report to help us improve PyTorch
---
## 🐛 Bug
<!-- A clear and concise description of what the bug is. -->
## To Reproduce
Steps to reproduce the behavior:
1.
1.
1.
<!-- If you have a code sample, error messages, stack traces, please provide it here as well -->
## Expected behavior
<!-- A clear and concise description of what you expected to happen. -->
## Environment
Please copy and paste the output from our
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).
You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
- PyTorch Version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (`conda`, `pip`, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
## Additional context
<!-- Add any other context about the problem here. -->

.github/ISSUE_TEMPLATE/bug-report.yml

@@ -0,0 +1,56 @@
name: 🐛 Bug Report
description: Create a report to help us reproduce and fix the bug
body:
- type: markdown
attributes:
value: >
#### Before submitting a bug, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/pytorch/pytorch/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
attributes:
label: 🐛 Describe the bug
description: |
Please provide a clear and concise description of what the bug is.
If relevant, add a minimal example so that we can reproduce the error by running the code. It is very important for the snippet to be as succinct (minimal) as possible, so please take time to trim down any irrelevant code to help us debug efficiently. We are going to copy-paste your code and we expect to get the same result as you did: avoid any external data, and include the relevant imports, etc. For example:
```python
# All necessary imports at the beginning
import torch
# A succinct reproducing example trimmed down to the essential parts:
t = torch.rand(5, 10) # Note: the bug is here, we should pass requires_grad=True
t.sum().backward()
```
If the code is too long (hopefully, it isn't), feel free to put it in a public gist and link it in the issue: https://gist.github.com.
Please also paste or describe the results you observe instead of the expected results. If you observe an error, please paste the error message including the **full** traceback of the exception. It may be relevant to wrap error messages in ```` ```triple quotes blocks``` ````.
placeholder: |
A clear and concise description of what the bug is.
```python
# Sample code to reproduce the problem
```
```
The error message you got, with the full traceback.
```
validations:
required: true
- type: textarea
attributes:
label: Versions
description: |
Please run the following and paste the output below.
```sh
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
validations:
required: true
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!

.github/ISSUE_TEMPLATE/ci-sev.md

@@ -0,0 +1,39 @@
---
name: "⚠CI SEV"
about: Tracking incidents for PyTorch's CI infra.
---
> NOTE: Remember to label this issue with "`ci: sev`"
## Current Status
*Status could be: preemptive, ongoing, mitigated, closed. Also tell people if they need to take action to fix it (e.g., rebase)*.
## Error looks like
*Provide some way users can tell that this SEV is causing their issue.*
## Incident timeline (all times pacific)
*Include when the incident began, when it was detected, mitigated, root caused, and finally closed.*
<details>
<summary> Click for example </summary>
e.g.
- 10/30 7:27a incident began
- 10/30 8:30a detected by <method>
- 10/30 9:00 pm root caused as…
- 10/30 9:10 pm mitigated by…
- 10/31 10:00 am closed by…
</details>
## User impact
*How does this affect users of PyTorch CI?*
## Root cause
*What was the root cause of this issue?*
## Mitigation
*How did we mitigate the issue?*
## Prevention/followups
*How do we prevent issues like this in the future?*

.github/ISSUE_TEMPLATE/config.yml

@@ -0,0 +1,5 @@
blank_issues_enabled: true
contact_links:
- name: Questions
url: https://discuss.pytorch.org/
about: Ask questions and discuss with other pytorch community members

@@ -1,9 +0,0 @@
---
name: "\U0001F4DA Documentation"
about: Report an issue related to https://pytorch.org/docs
---
## 📚 Documentation
<!-- A clear and concise description of what content in https://pytorch.org/docs is an issue. If this has to do with the general https://pytorch.org website, please file an issue at https://github.com/pytorch/pytorch.github.io/issues/new/choose instead. If this has to do with https://pytorch.org/tutorials, please file an issue at https://github.com/pytorch/tutorials/issues/new -->

@@ -0,0 +1,20 @@
name: 📚 Documentation
description: Report an issue related to https://pytorch.org/docs/stable/index.html
body:
- type: textarea
attributes:
label: 📚 The doc issue
description: >
A clear and concise description of what content in https://pytorch.org/docs/stable/index.html is an issue. If this has to do with the general https://pytorch.org website, please file an issue at https://github.com/pytorch/pytorch.github.io/issues/new/choose instead. If this has to do with https://pytorch.org/tutorials, please file an issue at https://github.com/pytorch/tutorials/issues/new.
validations:
required: true
- type: textarea
attributes:
label: Suggest a potential alternative/fix
description: >
Tell us how we could improve the documentation in this regard.
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!

@@ -1,24 +0,0 @@
---
name: "\U0001F680Feature Request"
about: Submit a proposal/request for a new PyTorch feature
---
## 🚀 Feature
<!-- A clear and concise description of the feature proposal -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->
## Pitch
<!-- A clear and concise description of what you want to happen. -->
## Alternatives
<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->
## Additional context
<!-- Add any other context or screenshots about the feature request here. -->

@@ -0,0 +1,25 @@
name: 🚀 Feature request
description: Submit a proposal/request for a new pytorch feature
body:
- type: textarea
attributes:
label: 🚀 The feature, motivation and pitch
description: >
A clear and concise description of the feature proposal. Please outline the motivation for the proposal. Is your feature request related to a specific problem? e.g., *"I'm working on X and would like Y to be possible"*. If this is related to another GitHub issue, please link here too.
validations:
required: true
- type: textarea
attributes:
label: Alternatives
description: >
A description of any alternative solutions or features you've considered, if any.
- type: textarea
attributes:
label: Additional context
description: >
Add any other context or screenshots about the feature request.
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!

@@ -1,13 +0,0 @@
---
name: "❓Questions/Help/Support"
about: Do you need support? We have resources.
---
## ❓ Questions and Help
### Please note that this issue tracker is not a help form and this issue will be closed.
We have a set of [listed resources available on the website](https://pytorch.org/resources). Our primary means of support is our discussion forum:
- [Discussion Forum](https://discuss.pytorch.org/)

@@ -1 +1 @@
Fixes #{issue number}
Fixes #ISSUE_NUMBER

.github/actionlint.yaml

@@ -0,0 +1,11 @@
self-hosted-runner:
labels:
- linux.large
- linux.2xlarge
- linux.4xlarge
- linux.4xlarge.nvidia.gpu
- linux.8xlarge.nvidia.gpu
- linux.16xlarge.nvidia.gpu
- windows.4xlarge
- windows.8xlarge.nvidia.gpu
- bm-runner
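This actionlint config declares the self-hosted runner labels that workflows may reference in `runs-on`. A rough sketch of the kind of check actionlint performs with it, assuming PyYAML (`pip install pyyaml`); the function name and the `linux.`/`windows.` prefix heuristic for spotting self-hosted labels are illustrative assumptions, not actionlint's actual implementation:

```python
import yaml  # third-party: pip install pyyaml

ALLOWED = {  # the self-hosted-runner labels declared above
    "linux.large", "linux.2xlarge", "linux.4xlarge", "linux.4xlarge.nvidia.gpu",
    "linux.8xlarge.nvidia.gpu", "linux.16xlarge.nvidia.gpu",
    "windows.4xlarge", "windows.8xlarge.nvidia.gpu", "bm-runner",
}

def unknown_self_hosted_labels(workflow_path: str) -> set:
    """Collect runs-on labels that look self-hosted but are not declared above."""
    with open(workflow_path) as f:
        wf = yaml.safe_load(f)
    found = set()
    for job in (wf.get("jobs") or {}).values():
        runs_on = job.get("runs-on", [])
        found.update([runs_on] if isinstance(runs_on, str) else runs_on)
    # Naive heuristic: treat the repo's label namespaces as self-hosted
    self_hosted = {l for l in found if l.startswith(("linux.", "windows.")) or l == "bm-runner"}
    return self_hosted - ALLOWED
```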

.github/generated-ciflow-ruleset.json

@@ -0,0 +1,266 @@
{
"__comment": "@generated DO NOT EDIT MANUALLY, Generation script: .github/scripts/generate_ci_workflows.py",
"label_rules": {
"ciflow/all": [
"caffe2-linux-xenial-py3.7-gcc5.4",
"docker-builds",
"ios-12-5-1-arm64",
"ios-12-5-1-arm64-coreml",
"ios-12-5-1-arm64-custom-ops",
"ios-12-5-1-arm64-full-jit",
"ios-12-5-1-arm64-metal",
"ios-12-5-1-x86-64",
"ios-12-5-1-x86-64-coreml",
"ios-12-5-1-x86-64-full-jit",
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.7-clang9",
"linux-docs",
"linux-docs-push",
"linux-vulkan-bionic-py3.7-clang9",
"linux-xenial-cuda11.3-py3.7-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
"linux-xenial-cuda11.3-py3.7-gcc7-no-ops",
"linux-xenial-py3-clang5-mobile-build",
"linux-xenial-py3-clang5-mobile-custom-build-static",
"linux-xenial-py3.7-clang7-asan",
"linux-xenial-py3.7-clang7-onnx",
"linux-xenial-py3.7-gcc5.4",
"linux-xenial-py3.7-gcc7",
"linux-xenial-py3.7-gcc7-no-ops",
"macos-10-15-py3-arm64",
"macos-10-15-py3-lite-interpreter-x86-64",
"macos-11-py3-x86-64",
"parallelnative-linux-xenial-py3.7-gcc5.4",
"periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7",
"periodic-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck",
"periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug",
"periodic-win-vs2019-cuda11.1-py3",
"periodic-win-vs2019-cuda11.5-py3",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
"win-vs2019-cpu-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/android": [
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit"
],
"ciflow/bazel": [
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test"
],
"ciflow/binaries": [
"linux-binary-conda",
"linux-binary-libtorch-cxx11-abi",
"linux-binary-libtorch-pre-cxx11",
"linux-binary-manywheel"
],
"ciflow/binaries/conda": [
"linux-binary-conda"
],
"ciflow/binaries/libtorch": [
"linux-binary-libtorch-cxx11-abi",
"linux-binary-libtorch-pre-cxx11"
],
"ciflow/binaries/wheel": [
"linux-binary-manywheel"
],
"ciflow/cpu": [
"caffe2-linux-xenial-py3.7-gcc5.4",
"linux-bionic-py3.7-clang9",
"linux-docs",
"linux-docs-push",
"linux-vulkan-bionic-py3.7-clang9",
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
"linux-xenial-py3.7-clang7-asan",
"linux-xenial-py3.7-clang7-onnx",
"linux-xenial-py3.7-gcc5.4",
"linux-xenial-py3.7-gcc7",
"linux-xenial-py3.7-gcc7-no-ops",
"parallelnative-linux-xenial-py3.7-gcc5.4",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
"win-vs2019-cpu-py3"
],
"ciflow/cuda": [
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7-no-ops",
"periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7",
"periodic-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck",
"periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug",
"periodic-win-vs2019-cuda11.1-py3",
"periodic-win-vs2019-cuda11.5-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/default": [
"linux-bionic-py3.7-clang9",
"linux-docs",
"linux-vulkan-bionic-py3.7-clang9",
"linux-xenial-cuda11.3-py3.7-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
"linux-xenial-py3-clang5-mobile-build",
"linux-xenial-py3-clang5-mobile-custom-build-static",
"linux-xenial-py3.7-clang7-asan",
"linux-xenial-py3.7-clang7-onnx",
"linux-xenial-py3.7-gcc5.4",
"linux-xenial-py3.7-gcc7",
"linux-xenial-py3.7-gcc7-no-ops",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
"win-vs2019-cpu-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/docs": [
"linux-docs"
],
"ciflow/ios": [
"ios-12-5-1-arm64",
"ios-12-5-1-arm64-coreml",
"ios-12-5-1-arm64-custom-ops",
"ios-12-5-1-arm64-full-jit",
"ios-12-5-1-arm64-metal",
"ios-12-5-1-x86-64",
"ios-12-5-1-x86-64-coreml",
"ios-12-5-1-x86-64-full-jit"
],
"ciflow/libtorch": [
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
"periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7"
],
"ciflow/linux": [
"caffe2-linux-xenial-py3.7-gcc5.4",
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.7-clang9",
"linux-docs",
"linux-docs-push",
"linux-vulkan-bionic-py3.7-clang9",
"linux-xenial-cuda11.3-py3.7-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
"linux-xenial-cuda11.3-py3.7-gcc7-no-ops",
"linux-xenial-py3-clang5-mobile-build",
"linux-xenial-py3-clang5-mobile-custom-build-static",
"linux-xenial-py3.7-clang7-asan",
"linux-xenial-py3.7-clang7-onnx",
"linux-xenial-py3.7-gcc5.4",
"linux-xenial-py3.7-gcc7",
"linux-xenial-py3.7-gcc7-no-ops",
"parallelnative-linux-xenial-py3.7-gcc5.4",
"periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7",
"periodic-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck",
"periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit"
],
"ciflow/macos": [
"ios-12-5-1-arm64",
"ios-12-5-1-arm64-coreml",
"ios-12-5-1-arm64-custom-ops",
"ios-12-5-1-arm64-full-jit",
"ios-12-5-1-arm64-metal",
"ios-12-5-1-x86-64",
"ios-12-5-1-x86-64-coreml",
"ios-12-5-1-x86-64-full-jit",
"macos-10-15-py3-arm64",
"macos-10-15-py3-lite-interpreter-x86-64",
"macos-11-py3-x86-64"
],
"ciflow/mobile": [
"linux-xenial-py3-clang5-mobile-build",
"linux-xenial-py3-clang5-mobile-custom-build-static"
],
"ciflow/noarch": [
"linux-bionic-py3.7-clang9"
],
"ciflow/onnx": [
"linux-xenial-py3.7-clang7-onnx"
],
"ciflow/sanitizers": [
"linux-xenial-py3.7-clang7-asan"
],
"ciflow/scheduled": [
"linux-docs-push",
"periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7",
"periodic-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck",
"periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug",
"periodic-win-vs2019-cuda11.1-py3",
"periodic-win-vs2019-cuda11.5-py3"
],
"ciflow/slow": [
"linux-bionic-cuda10.2-py3.9-gcc7",
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck"
],
"ciflow/slow-gradcheck": [
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck"
],
"ciflow/trunk": [
"caffe2-linux-xenial-py3.7-gcc5.4",
"docker-builds",
"ios-12-5-1-arm64",
"ios-12-5-1-arm64-coreml",
"ios-12-5-1-arm64-custom-ops",
"ios-12-5-1-arm64-full-jit",
"ios-12-5-1-arm64-metal",
"ios-12-5-1-x86-64",
"ios-12-5-1-x86-64-coreml",
"ios-12-5-1-x86-64-full-jit",
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.7-clang9",
"linux-docs",
"linux-vulkan-bionic-py3.7-clang9",
"linux-xenial-cuda11.3-py3.7-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
"linux-xenial-cuda11.3-py3.7-gcc7-no-ops",
"linux-xenial-py3-clang5-mobile-build",
"linux-xenial-py3-clang5-mobile-custom-build-static",
"linux-xenial-py3.7-clang7-asan",
"linux-xenial-py3.7-clang7-onnx",
"linux-xenial-py3.7-gcc5.4",
"linux-xenial-py3.7-gcc7",
"linux-xenial-py3.7-gcc7-no-ops",
"macos-10-15-py3-arm64",
"macos-10-15-py3-lite-interpreter-x86-64",
"macos-11-py3-x86-64",
"parallelnative-linux-xenial-py3.7-gcc5.4",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
"win-vs2019-cpu-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/vulkan": [
"linux-vulkan-bionic-py3.7-clang9"
],
"ciflow/win": [
"periodic-win-vs2019-cuda11.1-py3",
"periodic-win-vs2019-cuda11.5-py3",
"win-vs2019-cpu-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/xla": [
"linux-bionic-py3.7-clang9"
]
},
"version": "v1"
}
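The generated ruleset above maps each `ciflow/*` label to the workflows it triggers. A minimal sketch of consuming it, assuming you run this from a pytorch checkout (the helper name `workflows_for` is hypothetical):

```python
import json

# Resolve a ciflow label to its generated workflows via the label_rules mapping above
with open(".github/generated-ciflow-ruleset.json") as f:
    ruleset = json.load(f)

def workflows_for(label: str) -> list:
    return ruleset["label_rules"].get(label, [])

print(workflows_for("ciflow/bazel"))  # ['linux-xenial-cuda11.3-py3.7-gcc7-bazel-test']
```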

Some files were not shown because too many files have changed in this diff.